Streaming Audio is choppy
Describe the bug
When streaming audio, the reconstructed signal of the streamed chunks sounds choppy.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
To reproduce, run:
import gradio as gr
import numpy as np


def run(audio, state):
    sr, data = audio
    if state is None:
        state = data
    else:
        state = np.concatenate([state, data])
    audio = sr, state
    return audio, state


gr.Interface(
    fn=run,
    inputs=[
        gr.Audio(source="microphone", type="numpy", streaming=True),
        "state"
    ],
    outputs=[
        "audio",
        "state"
    ],
    live=True,
).launch()
Run this on your local machine and listen to the recorded audio. The problem also persists in the audio streaming demo: https://github.com/gradio-app/gradio/tree/main/demo/stream_audio
Screenshot
No response
Logs
-
System Info
Gradio 3.0.24
Ubuntu 20.04.4 LTS
Firefox 101.0.1 (64-bit)
Severity
blocking upgrade to latest gradio version
Thanks for reporting @yannickfunk, it seems like a bug. @aliabid94, could you take a look at what's going on?
Similar issue: #1332
Is there any progress here? A working version of this feature would be much appreciated!
Hi @yannickfunk, thanks for reporting the issue. We haven't had a chance to look into it yet, but we'll take a closer look next week.
If you need some assistance here, I am willing to help!
Hi @abidlabs @FarukOzderim and @aliabid94. I did some research and found out a lot; here are my conclusions:
The choppy audio comes from the start() and stop() calls on the MediaRecorder, because repeatedly stopping and restarting the recorder introduces artifacts at the chunk boundaries. It is the most straightforward way to implement streaming audio, but for ML showcases it is suboptimal, because the resulting audio sounds choppy.
An implementation aligned with the MediaRecorder API would be to make use of the timeslice argument when calling recorder.start(timeslice). With a timeslice of 500, the MediaRecorder fires "dataavailable" every 500 ms and provides the current chunk of captured audio. (See https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder)
> Fires periodically each time timeslice milliseconds of media have been recorded (or when the entire media has been recorded, if timeslice wasn't specified). The event, of type BlobEvent, contains the recorded media in its data property.
The problem with this approach is that only the first yielded chunk contains the codec (or audio format) information, so subsequent chunks are useless without the information from the first chunk (i.e. they cannot be saved as a valid wav file on their own). Prepending the codec information to every chunk would be the obvious solution, but this is not possible for codecs like Opus, since they have variable header lengths etc. (See https://stackoverflow.com/questions/48891897/send-chunks-from-mediarecorder-to-server-and-play-it-back-in-the-browser)
> You can't just needle-drop into the WebM stream. WebM/Matroska require some setup to initialize the track info and what not. After that, you'll have Clusters, and you have to start on a Cluster. Additionally, Chrome is going to require that each Cluster start on a keyframe, which you're not going to be able to guarantee with the data from MediaRecorder. Therefore, server-side transcoding (or at least, some nasty hacking on the VP8 stream) is needed.
The cleanest solution here would be to stream the chunks to the server and do the transcoding on the server side (without saving every audio chunk as a wav file).
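A minimal sketch of what that could look like, assuming the server keeps appending the raw WebM chunks to one growing buffer and ffmpeg is available on the machine (the function here is hypothetical and not part of Gradio):

```python
# Hypothetical sketch: accumulate the WebM chunks coming from MediaRecorder
# and let ffmpeg transcode the stream to WAV on the server side. Assumes the
# first chunk contains the container headers and that ffmpeg is installed.
import subprocess
import tempfile

webm_buffer = bytearray()  # grows as chunks arrive from the browser


def add_chunk_and_transcode(chunk: bytes) -> bytes:
    """Append one WebM chunk and return the whole recording so far as WAV bytes."""
    webm_buffer.extend(chunk)
    with tempfile.NamedTemporaryFile(suffix=".webm") as src:
        src.write(webm_buffer)
        src.flush()
        # -i: input file, -f wav: output format, pipe:1: write the WAV to stdout
        result = subprocess.run(
            ["ffmpeg", "-y", "-i", src.name, "-f", "wav", "pipe:1"],
            capture_output=True,
            check=True,
        )
    return result.stdout
```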
I got a quick and dirty solution to work using the extendable-media-recorder package (see https://github.com/chrisguttandin/extendable-media-recorder). With this package you can ensure that the chunks yielded by the media recorder are PCM (WAV format). The WAV format has a fixed 44-byte header, which can then be manually prepended to every yielded chunk. Every chunk can then be converted to base64 and saved as a wav file.
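To make the header trick concrete, here is a minimal Python sketch of the same idea on the receiving side (the function name is hypothetical; it assumes the incoming chunks are PCM/WAV as produced via extendable-media-recorder):

```python
# Hypothetical sketch of the 44-byte WAV header trick on the receiving side.
WAV_HEADER_SIZE = 44
header = None  # filled in from the first chunk


def chunk_to_wav(chunk: bytes) -> bytes:
    """Return bytes that can be written out as a standalone .wav file."""
    global header
    if header is None:
        # The first chunk starts with the WAV header; remember it.
        header = chunk[:WAV_HEADER_SIZE]
        return chunk
    # Later chunks are raw PCM samples; prepend the stored header.
    # (Strictly speaking, the header's chunk-size fields would also need to be
    # patched to match this chunk's length; omitted here for brevity.)
    return header + chunk
```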
TLDR: There are a few caveats and no obvious solution.
What do you think?
This is so helpful @yannickfunk! Trying it out now
Hey @yannickfunk, I've been trying to implement your suggestion and got a bit stuck; could you help me out? You said you got a quick and dirty version working, could you share that code? I've posted my quick and dirty version, which isn't producing parseable base64, below. (The prepare_audio method is the only relevant part of the code, I believe.)
Also another question: why can't I use recorder = new MediaRecorder(stream, {mimeType: "audio/webm;codecs=pcm"}); instead of this extendable-media-recorder library?
Thanks for the help!
<html>
  <body>
    <h1>test audio streamer</h1>
    <button onclick="record()">record</button>
    <button onclick="stop()">stop</button>
    <hr />
    <audio controls></audio>
    <script>
      let recorder;
      let audio_chunks = [];
      let player;
      let audio_blob;
      let inited = false;
      let recording = false;
      let pending = false;
      let last_chunk_index = 0;
      let header_chunk;

      // Read a Blob and resolve with a base64 data URL.
      function blob_to_data_url(blob) {
        return new Promise((fulfill, reject) => {
          let reader = new FileReader();
          reader.onerror = reject;
          reader.onload = () => fulfill(reader.result);
          reader.readAsDataURL(blob);
        });
      }

      // POST the base64 data URL to the server and play back its response.
      async function post(data) {
        pending = true;
        await fetch("http://localhost:4000/stream", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ data: data })
        })
          .then((r) => r.json())
          .then((r) => {
            document.querySelector("audio").src = r.value;
          });
        pending = false;
      }

      async function prepare_audio() {
        const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        recorder = new MediaRecorder(stream, { mimeType: "audio/webm;codecs=pcm" });
        recorder.addEventListener("dataavailable", async (event) => {
          audio_chunks.push(event.data);
          if (!pending) {
            let chunk_set;
            if (last_chunk_index === 0) {
              // First chunk: remember the 44-byte header and send everything so far.
              let first_chunk = await audio_chunks[0].arrayBuffer();
              header_chunk = first_chunk.slice(0, 44);
              chunk_set = audio_chunks;
            } else {
              // Later chunks: prepend the stored header to the new chunks.
              chunk_set = [header_chunk].concat(audio_chunks.slice(last_chunk_index));
            }
            audio_blob = new Blob(chunk_set, { type: "audio/wav" });
            last_chunk_index = audio_chunks.length;
            const value = await blob_to_data_url(audio_blob);
            post(value);
          }
        });
        inited = true;
      }

      async function record() {
        recording = true;
        audio_chunks = [];
        if (!inited) await prepare_audio();
        recorder.start(500);
      }

      const stop = () => {
        recorder.stop();
        recording = false;
      };
    </script>
  </body>
</html>
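The /stream server on localhost:4000 is not shown in the thread; purely as an illustration, a minimal Flask sketch matching the fetch() call above could look like this (the field names "data" and "value" are taken from the client code, everything else is an assumption):

```python
# Hypothetical Flask counterpart to the fetch() call above.
# It receives {"data": "<base64 data URL>"}, decodes the wav bytes,
# and echoes a data URL back so the <audio> element can play it.
import base64

from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # the test page is served from a different origin than port 4000


@app.route("/stream", methods=["POST"])
def stream():
    data_url = request.get_json()["data"]  # "data:audio/wav;base64,..."
    header, b64 = data_url.split(",", 1)
    wav_bytes = base64.b64decode(b64)      # the chunk as WAV bytes
    # ... process / concatenate wav_bytes here ...
    return jsonify({"value": data_url})    # echoed back for playback


if __name__ == "__main__":
    app.run(port=4000)
```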
Oh, I realize I'm still using webm with the PCM codec; I imagine that's causing the issue. Will try the library you recommended. (Still do send your code, please!)
Please find my code here: https://gist.github.com/yannickfunk/f5724e9af72f2a87b07f04df015e6d66
> Oh, I realize I'm still using webm with the PCM codec; I imagine that's causing the issue. Will try the library you recommended. (Still do send your code, please!)
Yes, I assume webm containers form clusters, and you can only use a chunk as valid data if it is the beginning of a cluster.
@aliabid94 did you manage to get it to work?
Taking a look now, thanks @yannickfunk!
Opened PR https://github.com/gradio-app/gradio/pull/2351, thanks so much @yannickfunk! I just had to tweak your code to support "pending", i.e. the case where the backend function isn't complete yet and we can't dispatch right away.