Audio Streaming: large latency before first chunk is played

Open sanchit-gandhi opened this issue 1 year ago • 2 comments

We typically stream audio outputs when latency is a major consideration. For example, if we're generating 10 seconds of audio and want the perceived latency to be as low as possible, we can stream the outputs in 1-second chunks, so that the user can start playing the audio 10x sooner than if they waited for the full 10-second audio. Here's an example for Parler-TTS.

When using the Gradio streaming component, we typically have to wait 3-4 seconds after the first chunk is returned before the output starts playing. This fixed overhead negates the latency improvement we expect from streaming. The result is that it's very difficult to showcase streaming outputs using Gradio.

This Space (https://huggingface.co/spaces/sanchit-gandhi/audio-streaming) demonstrates the issue in an MWE. We have a 30-second audio, which we stream in 2-second chunks. It takes 1 second for the first chunk to be returned, but the audio only starts playing after an additional 3-4 seconds.
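For readers who cannot open the Space, here is a minimal sketch of the kind of chunked generator described above. It is not the Space's actual code: the function name `stream_chunks`, its parameters, and the use of a sleep to stand in for generation time are all assumptions made for illustration. The samples can be any sliceable sequence (for example a NumPy array, which is what Gradio's audio component works with).

```python
import time
from typing import Iterator, Sequence, Tuple


def stream_chunks(
    samples: Sequence,
    sample_rate: int,
    chunk_s: float = 2.0,
    sleep_s: float = 1.0,
) -> Iterator[Tuple[int, Sequence]]:
    """Yield (sample_rate, chunk) pairs of chunk_s seconds each.

    The sleep before each yield is a hypothetical stand-in for the time a
    model would need to generate that chunk.
    """
    chunk_len = int(chunk_s * sample_rate)
    for start in range(0, len(samples), chunk_len):
        time.sleep(sleep_s)  # assumed generation time per chunk
        yield sample_rate, samples[start:start + chunk_len]
```

With a 30-second input and the defaults above, the generator yields 15 chunks, and the first one becomes available after roughly 1 second. Any delay beyond that is spent on the playback side.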

If we could reduce this to near zero additional overhead, it would make showcasing streaming outputs in Gradio much more feasible.

cc @aliabd @abidlabs @hannahblair @ylacombe

sanchit-gandhi avatar May 01 '24 13:05 sanchit-gandhi

Related to #8177, but the MWE demonstrates that playback does not wait for the full audio to be streamed; rather, there is a fixed lag after the first chunk is received.

sanchit-gandhi avatar May 01 '24 14:05 sanchit-gandhi

Any luck with this @aliabd?

sanchit-gandhi avatar May 15 '24 08:05 sanchit-gandhi

Hey @sanchit-gandhi - taking a look at this and our audio streaming approach in general. I think there are things we can improve on the gradio side, but why is there a time.sleep in the audio processing loop of your demo? If you remove it, the first chunk starts playing after < 1 second. I think the browser won't play until a few chunks have been processed. Without the sleep, the entire audio is processed in 1-2 seconds.

freddyaboulton avatar Jul 17 '24 08:07 freddyaboulton

Hi @freddyaboulton, thanks for taking a look into this!

I think the time.sleep was added to emulate processing time, say a model generating audio. In that case, the processing time (i.e. the sleep time) is half the chunk duration, so generation is faster than real time. Ideally, we wouldn't have to wait for a few chunks to be generated before the audio starts playing, which is why @sanchit-gandhi opened the issue!

ylacombe avatar Jul 17 '24 11:07 ylacombe

Hey @freddyaboulton, have you been able to take a look at the above message and the audio streaming latency?

ylacombe avatar Jul 25 '24 09:07 ylacombe

Hi @ylacombe! Sorry I did not get back to you earlier, and thank you for providing more details. Yes, I figured out the issue. The HTML <audio> tag expects a minimum amount of audio before autoplaying (~5 seconds). If you set the chunk length to 6 seconds in your demo, the browser will start autoplaying as soon as the first chunk is processed.
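To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes the only thing delaying playback is the ~5-second buffering threshold mentioned above and that chunks arrive at a fixed interval. The function names and the `min_buffer_s` parameter are hypothetical, and the threshold itself is only an approximate figure.

```python
import math


def chunks_needed(chunk_s: float, min_buffer_s: float = 5.0) -> int:
    """Number of chunks required before at least min_buffer_s of audio is buffered."""
    return max(1, math.ceil(min_buffer_s / chunk_s))


def playback_start_s(
    chunk_s: float,
    gen_s_per_chunk: float,
    min_buffer_s: float = 5.0,
) -> float:
    """Estimated time until playback begins if one chunk arrives every gen_s_per_chunk seconds."""
    return chunks_needed(chunk_s, min_buffer_s) * gen_s_per_chunk
```

Under these assumptions, 2-second chunks need three chunks before playback can begin, so playback starts a couple of chunk intervals after the first chunk is returned. A 6-second chunk satisfies the threshold on its own, so playback can begin as soon as that first chunk arrives.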

The solution is to use a different streaming implementation that gives us more control over when the browser starts playing audio. I should have a PR for that open in the next day or two.

freddyaboulton avatar Jul 25 '24 16:07 freddyaboulton

Closed via https://github.com/gradio-app/gradio/pull/8906. If you'd like to try it out, you can install gradio from this branch: https://github.com/gradio-app/gradio/pull/8843

abidlabs avatar Jul 31 '24 22:07 abidlabs

Great job! I have tried the latest branch from #8843, and the latency problem has been fixed. However, there now seems to be some noise in the streaming audio.

ZaymeShaw avatar Aug 01 '24 06:08 ZaymeShaw

Please share the full demo and audio file so that we can take a look!

freddyaboulton avatar Aug 01 '24 14:08 freddyaboulton

I ran into the same problem. However, even after installing gradio from the #8906 source code, the problem was not solved. There is still a 3-4 s delay, and audio playback is not smooth (it has gaps, as if audio data were missing). This is my demo code:

import gradio as gr
from pydub import AudioSegment
from time import sleep
import numpy as np
import datetime

audio_list = []
def add_to_stream(audio):
    sleep(0.05)
    global audio_list
    audio_list.append(audio)

with gr.Blocks() as demo:
    inp = gr.Audio(sources=["microphone"], streaming=True)
    inp.stream(add_to_stream, [inp], [])

    stream_as_file_btn = gr.Button("Stream as File")
    stream_as_file_output = gr.Audio(streaming=True)
    stream_as_file_output.autoplay = True

    def stream_file():
        global audio_list
        while True:
            while len(audio_list) == 0:
                print('stream out pull data, but no data available now...')
                sleep(0.05)
            chunk = audio_list[0]
            audio_list = audio_list[1:]
            print('yield audio chunk, samples: {}, cached audio chunks: {}, at: {}'.format(len(chunk[1]), len(audio_list), datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")))
            yield chunk

    stream_as_file_btn.click(
        stream_file, [], stream_as_file_output
    )


if __name__ == "__main__":
    demo.launch(server_name='0.0.0.0', server_port=8000)

I measured the audio data output rate from the log, and it matches the sample rate.

Demo usage: 1. Click 'stream_as_file_btn' to start fetching audio data. 2. Click the recording button of the 'inp' audio component to start generating audio data.

After about half a second, you will see 'yield audio chunk...', which means audio data is starting to be output.
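A side note on the demo code: the microphone callback and the output generator share a plain list, and the generator rebinds it with `audio_list = audio_list[1:]` while the callback may be appending to it. One possible way to hand chunks over without that shared rebinding is the standard library's `queue.Queue`, sketched below. This is only an illustration of an alternative buffer and is not claimed to be the cause of, or a fix for, the delay and gaps reported here. The `idle_timeout_s` parameter is a hypothetical addition so the generator can stop when no more data arrives.

```python
import queue
from typing import Any, Iterator, Optional

# Thread-safe FIFO shared by the producer (microphone callback)
# and the consumer (output generator).
audio_queue: queue.Queue = queue.Queue()


def add_to_stream(audio: Any) -> None:
    """Producer: store each incoming microphone chunk."""
    audio_queue.put(audio)


def stream_file(idle_timeout_s: Optional[float] = None) -> Iterator[Any]:
    """Consumer: yield chunks in arrival order, blocking while none are available.

    If idle_timeout_s is set and no chunk arrives within that time, stop.
    With the default of None, the generator waits indefinitely.
    """
    while True:
        try:
            chunk = audio_queue.get(timeout=idle_timeout_s)
        except queue.Empty:
            return
        yield chunk
```

These two functions could be wired to the same `inp.stream(...)` and `stream_as_file_btn.click(...)` events as in the demo above.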

steven8274 avatar Aug 07 '24 06:08 steven8274

Same issue on my side, the audio chunks still accumulate for a few seconds before starting to play

ylacombe avatar Aug 07 '24 08:08 ylacombe

> Same issue on my side, the audio chunks still accumulate for a few seconds before starting to play

Besides, the audio data seems to be consumed too quickly, which makes the audio playback pause constantly.

steven8274 avatar Aug 07 '24 08:08 steven8274

Just to confirm @ylacombe @steven8274 this is after installing gradio with:

pip install https://gradio-pypi-previews.s3.amazonaws.com/ea384210055da2b1e6a2919b9ee4f8f3e137fa81/gradio-4.40.0-py3-none-any.whl

and this happens consistently, with all recorded audio (or does it have to be a particular length, etc.)? cc @freddyaboulton

abidlabs avatar Aug 07 '24 16:08 abidlabs

Hey @abidlabs, it does happen after installing the right version. I've sent an example to @freddyaboulton: the first chunk is played almost right away but there's a big latency before the next chunks are played, even though they're available.

ylacombe avatar Aug 07 '24 19:08 ylacombe

Yes, taking a look - @ylacombe's issue has something to do with using very small chunk lengths

freddyaboulton avatar Aug 07 '24 21:08 freddyaboulton

> Yes taking a look - @ylacombe's issue has something to do with using very small chunk lengths

@freddyaboulton Hi, thanks for paying attention to my problem! In my case, I use a microphone to generate the recorded audio, which is 48 kHz, and I receive an audio chunk of 24000 samples per stream callback, every half a second. Is this chunk length too small? Maybe you can try my demo code to check whether the audio component is working fine.
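As a quick check of the numbers in this comment, the duration of a chunk is its sample count divided by the sample rate. The helper below is a hypothetical illustration, not part of the demo.

```python
def chunk_duration_s(num_samples: int, sample_rate_hz: int) -> float:
    """Duration in seconds of a chunk with num_samples samples at sample_rate_hz."""
    return num_samples / sample_rate_hz


# 24000 samples at 48 kHz, as reported above, is half a second of audio.
duration = chunk_duration_s(24000, 48000)
```

So each stream callback delivers 0.5 seconds of audio, and since callbacks arrive every half second, data is produced at real-time speed.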

steven8274 avatar Aug 08 '24 01:08 steven8274

Hi @steven8274 ! I looked at your issue as well and I think it's a different cause. I'm still investigating but I will be tweaking this over the next couple of weeks and will share a new wheel link for you to try soon.

BTW we'll be making the stream callback frequency configurable in #8941

freddyaboulton avatar Aug 08 '24 16:08 freddyaboulton

> Hi @steven8274 ! I looked at your issue as well and I think it's a different cause. I'm still investigating but I will be tweaking this over the next couple of weeks and will share a new wheel link for you to try soon.
>
> BTW we'll be making the stream callback frequency configurable in #8941

Thank you very much! Looking forward to the good news!

steven8274 avatar Aug 09 '24 01:08 steven8274