In current versions of Mosaic, audio files, like other types of files, are transferred from the server to the client in their entirety and only then processed by the appropriate viewer. For audio, this means that a 30-second sound clip stored in mu-law format (the MIME audio/basic type) takes over three minutes to down-load to a dial-up user before the sound even begins to play. Longer audio files, such as a typical 30-minute interview on the Internet Talk Radio Network, would take more than three hours to down-load, assuming enough local disk space and patience were available.
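These download times follow from simple arithmetic. The sketch below assumes audio/basic's usual 8000 samples per second at 8 bits each (64,000 bits per second) and a 9600 bit/s modem with no protocol overhead. `download_seconds` is a hypothetical helper, not part of Mosaic.

```python
# Rough download-time estimate for uncompressed mu-law audio (MIME audio/basic).
# Assumptions: 8000 samples/s, 8 bits/sample, and a 9600 bit/s dial-up link
# with no protocol overhead, so real transfers would be somewhat slower.

MULAW_BITS_PER_SECOND = 8000 * 8   # 64,000 bit/s of audio
MODEM_BITS_PER_SECOND = 9600       # assumed dial-up line rate


def download_seconds(audio_seconds: float,
                     audio_bps: int = MULAW_BITS_PER_SECOND,
                     link_bps: int = MODEM_BITS_PER_SECOND) -> float:
    """Seconds needed to transfer `audio_seconds` of audio over the link."""
    return audio_seconds * audio_bps / link_bps


if __name__ == "__main__":
    clip = download_seconds(30)             # 30-second clip
    interview = download_seconds(30 * 60)   # 30-minute interview
    print(f"30 s clip:      {clip / 60:.1f} minutes")
    print(f"30 min program: {interview / 3600:.1f} hours")
```

Under these assumptions the 30-second clip needs 200 seconds (about 3.3 minutes) and the 30-minute interview needs 12,000 seconds (about 3.3 hours); any protocol overhead on a real link only adds to these times.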
Choosing the appropriate speech compression technology is critical to implementing real-time audio on the Web successfully. The most important constraint is that data must be transferred in real time over dial-up connections, which requires a compressed data rate below 9600 bits per second. Beyond this bandwidth requirement, the decompressed speech must be of high quality, and the decompressor must run in real time on the wide variety of Web client machines. 'Mosaic-Interactive' uses TrueSpeech® speech compression technology to achieve the high quality and low bit rate required.
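The bandwidth constraint can be restated as a ratio: 64,000 bit/s of mu-law audio must be squeezed below the 9600 bit/s line rate, so the codec needs better than roughly 6.7:1 compression, and its decoder must finish each frame faster than that frame plays. The helper names below, and the 8500 bit/s example rate, are illustrative assumptions rather than TrueSpeech specifications.

```python
# Compression budget for real-time speech over a dial-up line.
# From the text: the compressed stream must stay under 9600 bit/s.
# Assumed: mu-law audio/basic at 64,000 bit/s (8 kHz x 8 bits), and a
# hypothetical codec rate of 8500 bit/s used only as an example.

PCM_MULAW_BPS = 8000 * 8   # uncompressed audio/basic rate
LINK_BPS = 9600            # dial-up ceiling from the text


def min_compression_ratio(source_bps: int = PCM_MULAW_BPS,
                          link_bps: int = LINK_BPS) -> float:
    """Smallest source:compressed ratio that fits the stream onto the link."""
    return source_bps / link_bps


def fits_in_real_time(codec_bps: int, link_bps: int = LINK_BPS) -> bool:
    """True if the codec's output rate is strictly below the link rate."""
    return codec_bps < link_bps


def decoder_keeps_up(decode_seconds_per_frame: float,
                     frame_duration_seconds: float) -> bool:
    """The decompressor runs in real time only if decoding a frame takes
    less time than playing that frame back on the client machine."""
    return decode_seconds_per_frame < frame_duration_seconds


if __name__ == "__main__":
    # 8500 bit/s is a hypothetical example rate, not a quoted specification.
    print(f"{min_compression_ratio():.2f}:1", fits_in_real_time(8500))
```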
Because 'Mosaic-Interactive' spawns the audio viewer as soon as the data transfer begins, the user can listen to the speech document while it is still being transferred. Although a half-hour audio document will still take a half hour to down-load, the user won't have to wait a half hour before listening, and needn't reserve any disk space for temporary files.
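A viewer that plays while the transfer is in progress can be sketched as a loop that decodes and plays each chunk as soon as it is read, keeping nothing on disk. The `decode` and `play` callables are hypothetical stand-ins for the codec and the sound device, and the 1200-byte chunk size (one second of data at 9600 bit/s) is an arbitrary choice.

```python
# Streaming playback sketch: the viewer consumes the document as it arrives
# instead of waiting for the whole file. `decode` and `play` are hypothetical
# stand-ins for the codec and the sound device; nothing is written to disk.
import io
from typing import BinaryIO, Callable


def stream_play(source: BinaryIO,
                decode: Callable[[bytes], bytes],
                play: Callable[[bytes], None],
                chunk_size: int = 1200) -> int:
    """Read `source` chunk by chunk, decoding and playing each chunk as soon
    as it is read. Returns the number of compressed bytes consumed."""
    consumed = 0
    while True:
        chunk = source.read(chunk_size)  # may block until a full chunk or EOF
        if not chunk:
            break                        # transfer finished
        play(decode(chunk))              # playback begins with the first chunk
        consumed += len(chunk)
    return consumed


if __name__ == "__main__":
    received = []
    total = stream_play(io.BytesIO(b"\x00" * 3000),
                        decode=lambda c: c,    # identity "codec" for the demo
                        play=received.append)  # collect instead of playing
    print(total, [len(c) for c in received])   # 3000 [1200, 1200, 600]
```

If the browser hands the document to the viewer through a pipe (an assumption about the plumbing, which the text does not describe), `source` would simply be `sys.stdin.buffer`.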
Real-time audio is a significant new resource that 'Mosaic-Interactive' opens to a large class of Web users. More generally, however, spawning the viewer as soon as the document transfer begins creates a whole new category of Web applications: interactive ones.
Most HTTP servers support the Common Gateway Interface (CGI), which permits documents to be generated on the fly at the server by running a special program, or CGI script. When the CGI script started by the server is coupled with a 'Mosaic-Interactive' viewer, the program generating the document (the CGI script) can communicate directly with the program consuming it (the viewer), creating a dynamic, interactive environment for the user.
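Under CGI, the script writes a header block, a blank line, and then the document body to its standard output, and the server relays that output to the client. The sketch below streams a generated body in chunks, flushing after each one so data can flow toward a viewer that is already running. The `audio/x-truespeech` content type is a hypothetical placeholder, and whether partial output reaches the client immediately depends on the HTTP server's own buffering.

```python
# Minimal CGI-style response: the server runs this program and relays what
# it writes to stdout back to the client. The default content type
# "audio/x-truespeech" is a hypothetical placeholder.
from typing import BinaryIO, Iterable


def write_cgi_response(out: BinaryIO,
                       chunks: Iterable[bytes],
                       content_type: str = "audio/x-truespeech") -> None:
    """Emit the CGI header, a blank line, then the body, flushing after
    each chunk so the viewer can start consuming data immediately."""
    out.write(f"Content-Type: {content_type}\r\n\r\n".encode("ascii"))
    out.flush()
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # push data toward the client as it is generated
```

A script would call it as `write_cgi_response(sys.stdout.buffer, chunks)`, where `chunks` is any iterable of encoded audio blocks produced on the fly.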
For example, the TrueSpeech audio viewer can send commands back to the server. If the audio document is being supplied by the TrueSpeech CGI script, playback can be paused or skipped forward or backward. A 'Mosaic-Interactive' user with a microphone can even talk back to the server: the CGI script opens a window on the server machine, calls for a person, and sets up a real-time conversation.
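On the server side, the CGI script must track where playback stands and react to the viewer's commands. The article does not describe the command format, so the sketch below invents a minimal line protocol (`PAUSE`, `RESUME`, `SKIP <seconds>`); the command names and the `Playback` class are hypothetical.

```python
# Server-side playback control sketch. PAUSE, RESUME and SKIP <seconds> form
# a hypothetical line protocol; the article does not specify how the
# TrueSpeech viewer encodes its commands.
from dataclasses import dataclass


@dataclass
class Playback:
    length: float           # document length in seconds
    position: float = 0.0   # current playback position in seconds
    paused: bool = False

    def handle(self, line: str) -> None:
        """Apply one command line sent back by the viewer."""
        words = line.split()
        if not words:
            return
        cmd = words[0].upper()
        if cmd == "PAUSE":
            self.paused = True
        elif cmd == "RESUME":
            self.paused = False
        elif cmd == "SKIP" and len(words) == 2:
            # Positive values skip forward, negative values skip backward;
            # the result is clamped to the bounds of the document.
            target = self.position + float(words[1])
            self.position = min(max(target, 0.0), self.length)
        else:
            raise ValueError(f"unrecognized command: {line!r}")

    def advance(self, seconds: float) -> None:
        """Move the position forward as audio is sent, unless paused."""
        if not self.paused:
            self.position = min(self.position + seconds, self.length)
```

Clamping skips to the document's bounds keeps a stray `SKIP` from moving the position past either end of the recording.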