WhisperLiveKit System sound + mic real time transcription

It would be really interesting if you could add system sound + mic real-time transcription. This library records system sound: https://github.com/s0d3s/PyAudioWPatch

Mar 25 '25 07:03 rrodriguezlo

Isn't that a job for whichever client is opening the websocket?

Apr 09 '25 02:04 needabetterusername

On macOS this can be done at the system level using BlackHole. I will check on Windows; if I don't find a similar loopback solution, I will add an option to use PyAudioWPatch.

Apr 09 '25 07:04 QuentinFuxa

Is it going to be implemented in the next release?

Apr 22 '25 07:04 rrodriguezlo

Yes. I haven't looked at it yet, but if that works, I'll try to do a release at the end of the week.

Apr 22 '25 09:04 QuentinFuxa

What we found is that PyAudioWPatch captures audio from the host system. However, if the application runs on a server and is accessed remotely via the web from a different machine, this solution does not work, because it only captures sound from the host system, not from the client's device. Is there any solution for that scenario?

Apr 22 '25 09:04 rrodriguezlo

There is no other choice than launching a Python script on the client side. That would mean, on the client side:

a small script using PyAudioWPatch + websockets (pip install websockets) that sends the audio to the backend
an almost identical HTML frontend that would receive and display the result. The microphone part of the .html would have to be removed
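
A minimal sketch of what such a client-side script could look like, assuming a Windows client with PyAudioWPatch and websockets installed. The endpoint `ws://localhost:8000/asr`, the chunk size, and the helper names `pick_loopback` and `stream_system_audio` are assumptions made for illustration, not something confirmed in this thread. Whether the backend accepts raw 16-bit PCM at the device's sample rate is also an assumption; the audio may need resampling or encoding to match what the server expects.

```python
import asyncio

# Assumed values for illustration only: adjust to your own server.
SERVER_URL = "ws://localhost:8000/asr"  # hypothetical endpoint
CHUNK_FRAMES = 1024                     # frames read per chunk


def pick_loopback(speaker_name, loopback_devices):
    """Return the loopback device whose name contains speaker_name, or None."""
    for device in loopback_devices:
        if speaker_name in device["name"]:
            return device
    return None


async def stream_system_audio(url=SERVER_URL):
    # Imported lazily so the helper above stays usable on non-Windows machines.
    import pyaudiowpatch as pyaudio  # pip install PyAudioWPatch (Windows only)
    import websockets                # pip install websockets

    with pyaudio.PyAudio() as p:
        # Find the WASAPI loopback device that mirrors the default speakers.
        wasapi = p.get_host_api_info_by_type(pyaudio.paWASAPI)
        speakers = p.get_device_info_by_index(wasapi["defaultOutputDevice"])
        device = pick_loopback(
            speakers["name"], p.get_loopback_device_info_generator()
        )
        if device is None:
            raise RuntimeError("No loopback device found for the default output")

        stream = p.open(
            format=pyaudio.paInt16,
            channels=device["maxInputChannels"],
            rate=int(device["defaultSampleRate"]),
            input=True,
            input_device_index=device["index"],
            frames_per_buffer=CHUNK_FRAMES,
        )
        try:
            async with websockets.connect(url) as ws:
                while True:
                    # The blocking read runs in a worker thread so the
                    # event loop stays responsive.
                    chunk = await asyncio.to_thread(stream.read, CHUNK_FRAMES)
                    await ws.send(chunk)
        finally:
            stream.close()
```

On the client machine this would be started with `asyncio.run(stream_system_audio())`. The sketch only sends audio; displaying the transcription is left to the separate HTML frontend described above.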

Apr 22 '25 16:04 QuentinFuxa

@rrodriguezlo What is your use case? How are you implementing the client? System audio must be sent from the client itself. The server has no way to control this.

If you are using JavaScript, web standards offer only limited browser support for this (i.e. you will need to ask the client to use a specific browser): https://caniuse.com/mdn-api_mediadevices_getdisplaymedia_audio_capture_support

Otherwise, you need to have the client install a virtual loopback driver like @QuentinFuxa mentioned. BUT if that's the requirement, then for the sake of user experience you might be better off developing a native desktop/mobile app for your use case.

Apr 23 '25 09:04 needabetterusername

For everyone's information, some fortunate Windows machines have a hidden recording device called "Stereo Mix", which is usually disabled by default. It allows you to capture all system audio. I'm using it to transcribe audio with WhisperLiveKit from a live web TV stream.
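
As a rough sketch of how one might check whether such a device is present, the snippet below lists audio devices with PyAudio and searches them by name. The helper names `find_input_device` and `list_devices` are hypothetical, and the exact device name varies by driver, so the "Stereo Mix" keyword is an assumption. A device that is still disabled in the Windows sound settings will likely not be listed.

```python
def find_input_device(devices, keyword):
    """Return the first input-capable device whose name contains keyword.

    The match is case-insensitive; returns None when nothing matches.
    """
    keyword = keyword.lower()
    for device in devices:
        has_input = device.get("maxInputChannels", 0) > 0
        if has_input and keyword in device["name"].lower():
            return device
    return None


def list_devices():
    """Return the info dict of every audio device PyAudio can see."""
    # Lazy import: PyAudio is only needed when querying a real machine.
    import pyaudio  # pip install pyaudio

    p = pyaudio.PyAudio()
    try:
        return [
            p.get_device_info_by_index(i) for i in range(p.get_device_count())
        ]
    finally:
        p.terminate()
```

Calling `find_input_device(list_devices(), "Stereo Mix")` returns the device's info dict, whose `index` can then be passed as `input_device_index` when opening a recording stream. The same lookup works for other virtual inputs by changing the keyword.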

May 18 '25 17:05 Royalphax

There's also Virtual Audio Cable (VAC) for Windows, which can help. It can redirect or mix audio sources into a new virtual one.

Jun 30 '25 13:06 andrealorenzon

It would be great if the client were built using Flutter or Tauri

Oct 15 '25 08:10 4444TENSEI