
Audio Streaming: large latency before first chunk is played

Open sanchit-gandhi opened this issue 2 months ago • 2 comments

We typically stream audio outputs when latency is a major consideration. For example, if we're generating 10 seconds of audio and want the perceived latency to be as low as possible, we can stream the output in 1-second chunks, so that the user can start playing the audio roughly 10x sooner than if they waited for the full 10-second clip. Here's an example for Parler-TTS.
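To make the arithmetic concrete, here is a rough sketch of time-to-first-audio with and without streaming. The function name, the generation speed (0.5 seconds of compute per second of audio), and the 3.5-second overhead figure are illustrative assumptions, not measurements from any model or from Gradio.

```python
def time_to_first_audio(total_s, chunk_s, gen_s_per_audio_s, overhead_s=0.0):
    """Seconds until playback can begin.

    Assumes generation time scales linearly with audio length, which is a
    simplification made only for this illustration.
    """
    first_chunk_s = min(chunk_s, total_s)
    return first_chunk_s * gen_s_per_audio_s + overhead_s


# Hypothetical model that needs 0.5 s of compute per second of audio.
full = time_to_first_audio(10, 10, 0.5)        # wait for the whole clip
streamed = time_to_first_audio(10, 1, 0.5)     # wait for the first 1 s chunk
# Same streaming setup, plus an assumed fixed 3.5 s player-side overhead.
with_overhead = time_to_first_audio(10, 1, 0.5, overhead_s=3.5)

print(full, streamed, with_overhead)
```

Under these assumptions, streaming cuts the wait by 10x, but a fixed overhead of a few seconds brings it most of the way back to the non-streamed figure.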

When using the Gradio streaming component, we typically have to wait 3-4 seconds after the first chunk is returned before the output starts playing. This fixed overhead negates the latency improvement we expect from streaming. The result is that it's very difficult to showcase streaming outputs using Gradio.

This Space demonstrates the issue in an MWE: https://huggingface.co/spaces/sanchit-gandhi/audio-streaming. We have a 30-second audio clip, which we stream in 2-second chunks. It takes 1 second for the first chunk to be returned, but the audio only starts playing after an additional 3-4 seconds.
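For readers who don't want to open the Space, the chunking described above can be sketched roughly as follows. This is not the Space's actual code: the sample rate, the synthetic sine wave, and the `stream_chunks` helper are all assumptions for illustration, and it uses plain Python lists to stay dependency-free.

```python
import math
from typing import Iterator, List, Tuple

SAMPLE_RATE = 16_000  # assumed sample rate, not taken from the Space


def stream_chunks(
    samples: List[float], sample_rate: int, chunk_seconds: float
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (sample_rate, chunk) pairs of at most chunk_seconds each."""
    chunk_len = int(sample_rate * chunk_seconds)
    for start in range(0, len(samples), chunk_len):
        yield sample_rate, samples[start:start + chunk_len]


# 30 seconds of a 440 Hz sine wave as stand-in audio.
audio = [
    math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(30 * SAMPLE_RATE)
]

# 30 s split into 2 s pieces.
chunks = list(stream_chunks(audio, SAMPLE_RATE, 2.0))
print(len(chunks), len(chunks[0][1]))
```

In a Gradio app, a generator of this shape would typically be the function behind a streaming audio output, yielding NumPy arrays rather than lists. The lag reported here occurs after the first such chunk has already been yielded.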

If we could reduce this to near zero additional overhead, it would make showcasing streaming outputs in Gradio much more feasible.

cc @aliabd @abidlabs @hannahblair @ylacombe

sanchit-gandhi avatar May 01 '24 13:05 sanchit-gandhi

Related to #8177, but the MWE demonstrates that playback does not wait for the full audio to be streamed; rather, there's a fixed lag after the first chunk is received.

sanchit-gandhi avatar May 01 '24 14:05 sanchit-gandhi

Any luck with this @aliabd?

sanchit-gandhi avatar May 15 '24 08:05 sanchit-gandhi