Full-duplex streaming for realtime audio transcription

Open aviadr1 opened this issue 1 year ago • 1 comments

I'm looking to perform online realtime transcription (e.g. with something like whisper-streaming). For this we need FULL DUPLEX capabilities, i.e. the client needs to be able to continually stream data to the server.

I see the https://github.com/replicate/replicate-python?tab=readme-ov-file#run-a-model-and-stream-its-output example, which shows that the server can stream results, but the input has to be sent up front and I don't see how the client could keep sending more input data after that.

Is full-duplex streaming supported in Replicate, or can you add support for it?

Sep 08 '24 14:09 aviadr1

Hi @aviadr1. There's nothing about Replicate's platform or client libraries that precludes full-duplex streaming. I'm not aware of any public models doing this currently, but you could accomplish this by building a Whisper model with Cog that takes a URL to a stream of audio as input and outputs cog.ConcatenateIterator[str], yielding transcript chunks as they become available.
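
As a rough sketch of the shape this could take (not a tested Whisper integration): the generator below reads a byte stream in chunks and yields one piece of transcript per chunk. The names iter_audio_chunks, stream_transcripts and fake_transcribe, and the 32000-byte chunk size, are all made up for illustration; fake_transcribe stands in for a real call into a streaming Whisper implementation, and an in-memory buffer stands in for the audio stream behind the URL.

```python
import io
from typing import BinaryIO, Callable, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_bytes: int) -> Iterator[bytes]:
    """Yield chunks of at most chunk_bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:  # empty read means end of stream
            break
        yield chunk


def stream_transcripts(
    stream: BinaryIO,
    transcribe_chunk: Callable[[bytes], str],
    chunk_bytes: int = 32000,  # hypothetical chunk size, tune for your audio format
) -> Iterator[str]:
    """Yield a transcript fragment for each audio chunk as soon as it is read."""
    for chunk in iter_audio_chunks(stream, chunk_bytes):
        text = transcribe_chunk(chunk)
        if text:  # skip chunks that produce no text (e.g. silence)
            yield text


def fake_transcribe(chunk: bytes) -> str:
    """Placeholder for a real model call; only reports the chunk size."""
    return f"[{len(chunk)} bytes] "


if __name__ == "__main__":
    # Stand-in for the audio stream that would be opened from the input URL.
    audio = io.BytesIO(b"\x00" * 70000)
    for piece in stream_transcripts(audio, fake_transcribe):
        print(piece, end="")
```

In a Cog predictor, predict() would declare a return type of ConcatenateIterator[str], open the input URL (for example with urllib.request.urlopen) and yield from a generator like this one. The client keeps writing audio to whatever serves that URL while it reads transcript chunks from the prediction's streamed output, which is how you get the duplex behaviour without sending further input through the Replicate API itself.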

Sep 09 '24 16:09 mattt