Incorporating real-time audio on the Web.


Mosaic and the World Wide Web have seen explosive growth over the past year, with an increasing number of users connecting to the Web via dial-up Internet connections. These users, with a typical bandwidth of 9600 bits per second, are unable to take advantage of many of the Web's services, because the amount of data required for many transactions overwhelms their dial-up connections.

In current versions of Mosaic, audio files, like other types of files, are transferred from the server to the client in their entirety, then processed by the appropriate viewer. In the case of audio, a 30 second sound clip stored in mu-law format (the MIME audio/basic type) takes over three minutes to down-load to a dial-up user before the sound even begins to play. Longer audio files, such as a typical 30 minute interview on the Internet Talk Radio Network, would take more than three hours to down-load, assuming enough local disk space and patience were available.
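
The transfer times above can be checked with a short calculation. This is a minimal sketch, assuming that audio/basic is 8-bit mu-law at 8000 samples per second on one channel, that the link delivers its full 9600 bits per second, and that protocol overhead is ignored; the function name is illustrative only.

```python
# Assumptions: audio/basic is 8000 samples per second, 8 bits per sample,
# one channel; the dial-up link delivers its full nominal rate.
MULAW_BITS_PER_SECOND = 8000 * 8  # 64,000 bits of audio per second of speech


def download_seconds(clip_seconds, link_bits_per_second=9600):
    """Time to transfer an uncompressed mu-law clip over the given link."""
    return clip_seconds * MULAW_BITS_PER_SECOND / link_bits_per_second


if __name__ == "__main__":
    clips = [("30 second clip", 30), ("30 minute interview", 30 * 60)]
    for label, seconds in clips:
        t = download_seconds(seconds)
        print(f"{label}: {t:.0f} s ({t / 60:.1f} min)")
```

Under these assumptions the 30 second clip needs 200 seconds and the 30 minute interview needs 12,000 seconds, or about 3.3 hours. Real transfers would be slower once protocol overhead is included.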


Mosaic-Interactive is an enhanced version of Mosaic that permits speech files of arbitrary length to be played over the Web in real time, even over dial-up lines. It incorporates high-quality speech compression technology into Mosaic, permitting the transfer of speech data in real time. In addition, viewers can be started and begin operating as soon as the data transfer begins, instead of waiting for the document to be transferred to the client in its entirety.

The appropriate speech compression technology is critical to the successful implementation of real-time audio on the Web. The most important constraint is the requirement to transfer data in real time over dial-up connections, which implies a compressed data rate of less than 9600 bits per second. In addition to the bandwidth requirement, the decompressed speech should be of high quality, and the decompressor must operate in real time on the wide variety of Web client machines. Mosaic-Interactive uses TrueSpeech(r) speech compression technology to achieve the high quality and low bit rates required.
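
The bandwidth constraint can be stated as a minimum compression ratio. The sketch below assumes the source is 64,000 bit per second mu-law audio (8000 samples per second, 8 bits each) and that the full 9600 bits per second is available to the audio stream; the codec rates passed in the example are hypothetical inputs, not figures from this paper.

```python
SOURCE_BPS = 8000 * 8   # assumed: uncompressed mu-law speech, 64,000 bits/s
LINK_BPS = 9600         # dial-up rate given in the text


def min_compression_ratio(source_bps=SOURCE_BPS, link_bps=LINK_BPS):
    """Smallest compression ratio that lets the stream keep up with playback."""
    return source_bps / link_bps


def fits_in_real_time(codec_bps, link_bps=LINK_BPS):
    """True if a codec producing codec_bps can be delivered as fast as it plays."""
    return codec_bps < link_bps


if __name__ == "__main__":
    print(f"required ratio: at least {min_compression_ratio():.2f} to 1")
    for rate in (64000, 8000):  # hypothetical codec output rates
        print(rate, "bits/s fits:", fits_in_real_time(rate))
```

In other words, the compressor must shrink mu-law speech by a factor of roughly 6.7 or more before real-time playback over a 9600 bit per second line is possible.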

By spawning the audio viewer as soon as the data transfer begins, the user can listen to the speech document while the data is being transferred. Although a half hour audio document will still take a half hour to down-load, the user won't have to wait a half hour before listening, and they needn't reserve any disk space to hold the temporary files.
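
The idea of starting the viewer before the transfer completes can be sketched as follows. This is an illustration under stated assumptions, not the Mosaic-Interactive implementation: the viewer is assumed to read the document from its standard input, and the function name, chunk size, and stand-in viewer command are all hypothetical.

```python
import io
import subprocess
import sys


def stream_to_viewer(source, viewer_cmd, chunk_size=1024):
    """Start the viewer immediately and feed it data as it arrives.

    `source` is any object with a read(n) method, such as a socket file.
    Returns the number of bytes forwarded. Nothing is written to disk.
    """
    viewer = subprocess.Popen(viewer_cmd, stdin=subprocess.PIPE)
    forwarded = 0
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:               # end of the document
                break
            viewer.stdin.write(chunk)   # the viewer can act on this at once
            viewer.stdin.flush()
            forwarded += len(chunk)
    finally:
        viewer.stdin.close()            # signal end of input to the viewer
        viewer.wait()
    return forwarded


if __name__ == "__main__":
    # Stand-in viewer: a child Python process that reads and discards stdin.
    fake_viewer = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]
    fake_document = io.BytesIO(b"\x00" * 5000)
    print(stream_to_viewer(fake_document, fake_viewer), "bytes forwarded")
```

Because each chunk is handed to the viewer as soon as it is read, playback can begin after the first chunk rather than after the last, and no temporary file is needed.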

Although Mosaic-Interactive allows a large class of Web users to take advantage of a significant new resource, real-time audio, the generic capabilities enabled by spawning the viewer as soon as the document transfer begins create a whole new category of Web applications: interactive ones.

Most HTTP servers support the Common Gateway Interface, or CGI, which permits documents to be generated on the fly at the server by running a special program, or CGI script. When the CGI script started by the server is coupled with a Mosaic-Interactive viewer, the program generating the document (the CGI script) can communicate directly with the program consuming the document (the viewer) and create a dynamic, interactive environment for the user.
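
A CGI script that produces a document on the fly can be sketched as below. This is a minimal illustration, not the TrueSpeech CGI script: it assumes the standard CGI convention of a Content-Type header followed by a blank line and then the body, and the generator that supplies the audio bytes is a hypothetical placeholder.

```python
import io
import sys


def write_cgi_response(out, chunks, content_type="audio/basic"):
    """Write a CGI response: header, blank line, then the body in pieces."""
    out.write(f"Content-Type: {content_type}\r\n\r\n".encode("ascii"))
    for chunk in chunks:
        out.write(chunk)
        out.flush()   # push each piece out so the client can use it at once


def generated_audio(n_chunks=3, chunk_size=4):
    """Stand-in for audio produced on the fly (hypothetical placeholder bytes)."""
    for _ in range(n_chunks):
        yield b"\xff" * chunk_size


if __name__ == "__main__":
    # When run by an HTTP server as a CGI script, stdout goes to the client.
    write_cgi_response(sys.stdout.buffer, generated_audio())
```

Flushing after every chunk matters here: it lets a viewer that starts at the beginning of the transfer consume the document while the script is still generating it.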

For example, the TrueSpeech audio viewer can send commands back to the server. If the audio document is being supplied by the TrueSpeech CGI script, the audio playback can be paused or skipped forward or backward. The Mosaic-Interactive user, with a microphone, can even talk back to the server: the CGI script opens a window on the server machine, calls for a person, and enables a conversation in real time.
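
One way a server-side script could react to such commands is sketched below. The command names (PAUSE, RESUME, FORWARD, BACK), the class, and the skip size are all hypothetical; the paper does not describe the actual protocol between the TrueSpeech viewer and its CGI script.

```python
class PlaybackSession:
    """Server-side playback position driven by viewer commands (hypothetical)."""

    def __init__(self, total_bytes, skip_bytes=1000):
        self.total = total_bytes
        self.skip = skip_bytes
        self.position = 0
        self.paused = False

    def handle(self, command):
        """Apply one command received from the viewer."""
        if command == "PAUSE":
            self.paused = True
        elif command == "RESUME":
            self.paused = False
        elif command == "FORWARD":
            self.position = min(self.total, self.position + self.skip)
        elif command == "BACK":
            self.position = max(0, self.position - self.skip)
        else:
            raise ValueError(f"unknown command: {command!r}")

    def next_chunk_range(self, chunk_size):
        """Return (start, end) of the next chunk to send, or None if idle."""
        if self.paused or self.position >= self.total:
            return None
        start = self.position
        self.position = min(self.total, start + chunk_size)
        return (start, self.position)


if __name__ == "__main__":
    session = PlaybackSession(total_bytes=5000)
    print(session.next_chunk_range(500))   # first chunk of the document
    session.handle("FORWARD")              # listener skips ahead
    print(session.next_chunk_range(500))
```

The script would consult the session before sending each chunk, so a command from the viewer takes effect on the very next piece of audio sent.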


Mosaic has been enhanced to permit a viewer to act on a document immediately, without storing the document in a file first. In conjunction with high-quality speech compression, this permits even dial-up Mosaic users to listen to speech documents in real time. When combined with the CGI capability of HTTP servers, this enhancement permits the client and server to operate in concert, providing powerful interactive capabilities on the Web.
Stephen Uhler