Plugin.Maui.Audio

Sending Linear PCM audio chunks efficiently, in near real time, to a cloud service hosting a Whisper deployment for transcription

Open • radrad opened this issue 1 year ago • 1 comment

If we use Linear PCM (LPCM), which has no headers, any segment of the audio can be decoded and played independently (as long as the sample rate, bit depth, and channel count are known), so we can send audio chunks to the backend as they are recorded. With a silence-detection method, we could split the real-time audio into chunks at pauses and send each chunk to OpenAI's Whisper for transcription. This approach could enable near real-time transcription display on the frontend, storage of the transcript, and summarization or other AI tasks.
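A minimal sketch of the chunking idea, assuming 16 kHz, 16-bit little-endian mono LPCM (the constants, `split_on_silence`, and `to_wav` are hypothetical names, not part of Plugin.Maui.Audio). It uses a simple energy (RMS) threshold as a stand-in for silence detection; a trained voice activity detector would be more robust. Hosted Whisper endpoints such as OpenAI's transcription API expect an audio file (for example WAV) rather than bare PCM, so each chunk is also wrapped in a WAV header with Python's standard `wave` module.

```python
import array
import io
import math
import sys
import wave

# Hypothetical capture format; match whatever the recorder actually produces.
SAMPLE_RATE = 16_000  # Hz
SAMPLE_WIDTH = 2      # bytes per sample: signed 16-bit, little-endian
CHANNELS = 1          # mono


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit little-endian mono PCM."""
    usable = len(frame) - len(frame) % SAMPLE_WIDTH  # ignore a dangling half-sample
    samples = array.array("h")
    samples.frombytes(frame[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native byte order; the PCM is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def split_on_silence(pcm: bytes, frame_ms: int = 30, threshold: float = 500.0,
                     min_silence_ms: int = 300):
    """Yield (start_byte, chunk) pairs, closing a chunk after sustained silence.

    A frame is "silent" when its RMS is below `threshold`. Leading silence is
    dropped, and a chunk ends once `min_silence_ms` of consecutive silent
    frames follow speech. Both thresholds are illustrative and need tuning.
    """
    frame_bytes = SAMPLE_RATE * frame_ms // 1000 * SAMPLE_WIDTH * CHANNELS
    silent_frames_needed = max(1, min_silence_ms // frame_ms)
    chunk_start, silent_run, heard_speech = 0, 0, False
    for offset in range(0, len(pcm), frame_bytes):
        end = min(offset + frame_bytes, len(pcm))
        if frame_rms(pcm[offset:end]) >= threshold:
            heard_speech, silent_run = True, 0
        elif not heard_speech:
            chunk_start = end  # still in leading silence: skip it
        else:
            silent_run += 1
            if silent_run >= silent_frames_needed:
                yield chunk_start, pcm[chunk_start:end]
                chunk_start, silent_run, heard_speech = end, 0, False
    if heard_speech:  # flush trailing speech that never reached a long pause
        yield chunk_start, pcm[chunk_start:]


def to_wav(chunk: bytes) -> bytes:
    """Wrap a raw LPCM chunk in a WAV (RIFF) header for upload as an audio file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(chunk)
    return buf.getvalue()
```

Each `to_wav(chunk)` result can then be uploaded as its own transcription request, and `start_byte / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)` gives that chunk's offset in seconds within the full recording. Silence-only stretches are never yielded, which avoids paying to upload audio that contains nothing to transcribe.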

I'm interested in finding out how to send the minimum number of bytes, at sufficient quality, to a cloud-based Whisper deployment for transcription. The transcription could be saved as metadata for the audio recording, and the audio itself backed up to a cloud location. Additionally, generating .srt files with timestamps would let users jump to the audio segment that corresponds to each subtitle.
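On the byte budget: uncompressed LPCM size is fixed by the format alone, sample rate × bytes per sample × channels. Whisper models resample their input to 16 kHz mono, so 16 kHz, 16-bit mono (32,000 bytes per second, about 1.92 MB per minute) is a reasonable ceiling on what the model can use, versus 176,400 bytes per second for 44.1 kHz 16-bit stereo. For the .srt idea, a sketch follows, assuming the service returns per-segment start/end times relative to each uploaded chunk (OpenAI's transcription API does this with `response_format="verbose_json"`). `Segment`, `pcm_bytes_per_second`, `srt_timestamp`, and `to_srt` are hypothetical names; the chunk offsets are shifted in so cues point into the full recording.

```python
from dataclasses import dataclass


def pcm_bytes_per_second(sample_rate: int, sample_width: int, channels: int) -> int:
    """Raw LPCM is uncompressed, so its byte rate depends only on the format."""
    return sample_rate * sample_width * channels


@dataclass
class Segment:
    """Hypothetical transcript segment; times are relative to the uploaded chunk."""
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(chunks) -> str:
    """Build one .srt document from (chunk_offset_seconds, [Segment, ...]) pairs.

    Each chunk's offset is added to its segment times, so the cues point
    into the full recording rather than into the individual chunk.
    """
    lines, index = [], 1
    for offset, segments in chunks:
        for seg in segments:
            lines += [
                str(index),
                f"{srt_timestamp(offset + seg.start)} --> {srt_timestamp(offset + seg.end)}",
                seg.text.strip(),
                "",  # blank line terminates each cue
            ]
            index += 1
    return "\n".join(lines)
```

For example, at 16 kHz 16-bit mono a chunk starting at byte 64,000 begins 2.0 s into the recording, and `to_srt([(2.0, [Segment(0.0, 1.5, "Hello")])])` yields a single cue timed `00:00:02,000 --> 00:00:03,500`, which a player can use to seek straight to that part of the backed-up audio.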

radrad • May 29 '24 04:05

Hey there, sorry we didn't get to this earlier. I'm... not entirely sure what you're asking here? :)

jfversluis • Aug 07 '24 14:08

No response, and streaming is now supported (#160), so I hope that helps you.

jfversluis • Apr 04 '25 16:04