AbstractLiveClient send() is async without the ability to await

Open tobowers opened this issue 9 months ago • 4 comments

What is the current behavior?

Currently, if you are streaming audio and do something like this pseudocode:

send(data);
send(data);
finalize();

You will almost certainly get no transcription back. This is because send is an async function, so finalize() executes before the data has been sent to Deepgram.

See https://github.com/deepgram/deepgram-js-sdk/blob/b39256d9bd854f7ffebe99d1a3ceec9a43a65ad2/src/packages/AbstractLiveClient.ts#L235-L263

send() is effectively a hidden async function: the callback() it invokes is async, but the promise that callback() produces is not returned, so the caller has nothing to await.
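
To make the reported pattern concrete, here is a minimal sketch. It is not the SDK's code: `sendFireAndForget`, `sendAwaitable`, `finalize`, and the `Log` type are hypothetical stand-ins, and `await Promise.resolve()` stands in for whatever async step the real callback performs.

```typescript
// Simplified stand-ins for the pattern described above; NOT the SDK's actual code.
type Log = string[];

// The async callback's promise is dropped, so the caller cannot wait for it.
function sendFireAndForget(log: Log, data: string): void {
  const callback = async (): Promise<void> => {
    await Promise.resolve(); // placeholder for an async step inside the callback
    log.push(`sent ${data}`);
  };
  void callback(); // promise discarded here
}

// Same work, but the promise is returned so the caller can await it.
function sendAwaitable(log: Log, data: string): Promise<void> {
  const callback = async (): Promise<void> => {
    await Promise.resolve();
    log.push(`sent ${data}`);
  };
  return callback();
}

function finalize(log: Log): void {
  log.push("finalize");
}

async function demo(): Promise<{ dropped: Log; awaited: Log }> {
  const dropped: Log = [];
  sendFireAndForget(dropped, "a");
  finalize(dropped); // runs before the deferred "sent a" is recorded

  const awaited: Log = [];
  await sendAwaitable(awaited, "a");
  finalize(awaited); // runs only after "sent a" is recorded
  return { dropped, awaited };
}
```

In the fire-and-forget variant, "finalize" is logged before "sent a"; in the awaitable variant the order is reversed. This only illustrates the ordering inside the client process and says nothing about what happens once the data is on the wire.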

Steps to reproduce

send(data);
send(data);
finalize();

Expected behavior

Either send is awaitable (simply by returning the callback's promise), or it is not an async function.

Please tell us about your environment

Deepgram SDK v3.11.1

  • Operating System/Version: macOS
  • Language: TypeScript in Bun

tobowers avatar Feb 20 '25 09:02 tobowers

I'm experiencing the same issue when trying to finalize audio. I expect that calling .finalize() after .send()-ing audio will result in a full transcript of the sent audio, but it often misses content toward the end of the audio and incorrectly delivers it in later transcripts. Accurate finalization is important for building push-to-talk experiences.

I noticed that adding some timeout (~400 ms) between sending and finalizing allowed the full transcript to appear. Having a reliable way to do this without introducing unnecessary delays would be helpful.
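
A rough sketch of that delay workaround follows. `LiveConnectionLike`, `sleep`, and `sendThenFinalize` are hypothetical names; the interface only mirrors the `.send()` and `.finalize()` calls mentioned here, and the 400 ms default is simply the value that happened to work above, not a guaranteed bound.

```typescript
// Hypothetical minimal shape of a live connection; only the two calls used here.
interface LiveConnectionLike {
  send(data: Uint8Array): void;
  finalize(): void;
}

// Resolve after roughly `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Send every chunk, wait an arbitrary grace period, then request finalization.
async function sendThenFinalize(
  connection: LiveConnectionLike,
  chunks: Uint8Array[],
  graceMs: number = 400, // empirical value from this comment; tune as needed
): Promise<void> {
  for (const chunk of chunks) {
    connection.send(chunk);
  }
  await sleep(graceMs);
  connection.finalize();
}
```

The cost is that every utterance pays the full grace period, which is exactly the unnecessary delay a push-to-talk experience would like to avoid.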

mattrossman avatar Mar 27 '25 17:03 mattrossman

As a more convoluted workaround :)... you're going to get finalized transcripts back from Deepgram along with the length of the audio they cover, so you can ask "is this close to the amount of audio I have sent to Deepgram?" and, if the answer is yes, call finalize. It's annoying, but it does result in fast transcription.
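
One way that bookkeeping might look, as a sketch under stated assumptions: `AudioProgressTracker` and `FinalSegment` are hypothetical, the segment shape (start and duration in seconds) is an assumption about what the finalized transcripts report, and the bytes-per-second figure assumes uncompressed PCM so that byte counts convert directly to seconds.

```typescript
// Assumed shape of the timing info on a finalized transcript, in seconds.
interface FinalSegment {
  start: number;
  duration: number;
}

// Tracks how much audio was sent versus how much has come back finalized.
class AudioProgressTracker {
  private sentSeconds = 0;
  private transcribedSeconds = 0;
  private readonly bytesPerSecond: number;
  private readonly toleranceSeconds: number;

  // bytesPerSecond assumes raw PCM, e.g. 16 kHz * 2 bytes * 1 channel = 32000.
  constructor(bytesPerSecond: number, toleranceSeconds: number = 0.25) {
    this.bytesPerSecond = bytesPerSecond;
    this.toleranceSeconds = toleranceSeconds;
  }

  // Call whenever a chunk is handed to send().
  recordSent(byteLength: number): void {
    this.sentSeconds += byteLength / this.bytesPerSecond;
  }

  // Call whenever a finalized transcript arrives.
  recordFinal(segment: FinalSegment): void {
    const end = segment.start + segment.duration;
    this.transcribedSeconds = Math.max(this.transcribedSeconds, end);
  }

  // True once the finalized audio is within the tolerance of what was sent.
  caughtUp(): boolean {
    return this.sentSeconds - this.transcribedSeconds <= this.toleranceSeconds;
  }
}
```

A caller would check `caughtUp()` after each finalized transcript and call finalize as soon as it returns true. The tolerance is a guess and would need tuning against real traffic.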

tobowers avatar Mar 27 '25 18:03 tobowers

I suspect the issue isn't solvable with an SDK patch, because even if I go into node_modules, make send() async, and await callback(), I still face the issue. Besides, the only async work in that callback is the Blob arrayBuffer() method, which doesn't run for me since I'm sending a buffer. WebSocket guarantees that messages are delivered in order, so presumably by the time the service receives a "finalize" message it has already received the preceding audio. I imagine this is a bug in Deepgram's service, which may not be waiting for in-flight audio to finish processing before sending the finalized transcript. Alternatively, if clients are expected to wait for in-flight audio to be received and processed before requesting finalization, the service would have to send some kind of event to signal that.

mattrossman avatar Mar 27 '25 20:03 mattrossman

Yes, this isn't solvable in the SDK. Even if we queued the messages, there would still be a race condition in the transmission. The only recourse is to experiment with an arbitrary wait. I will feed this back to the API team, in the hope that they have a better suggestion than that. I'll leave this open in the meantime.

lukeocodes avatar Jun 20 '25 14:06 lukeocodes