WhisperLiveKit

System sound + mic real-time transcription

Open · rrodriguezlo opened this issue 8 months ago · 10 comments

It would be really interesting if you could add system sound + mic real-time transcription. This library records system sound: https://github.com/s0d3s/PyAudioWPatch

rrodriguezlo, Mar 25 '25

Isn't that a job for whichever client is opening the websocket?

needabetterusername, Apr 09 '25

On macOS this can be done at the system level using BlackHole. I will check on Windows; if I don't find a similar loopback solution, I will add an option to use PyAudioWPatch.

QuentinFuxa, Apr 09 '25

Is it going to be implemented in the next release?

rrodriguezlo, Apr 22 '25

Yes. I haven't looked at it yet, but if it works, I'll try to do a release at the end of the week.

QuentinFuxa, Apr 22 '25

We found that PyAudioWPatch captures audio from the host system. However, if the application runs on a server and is accessed remotely via the web from a different machine, this doesn't work: it only captures sound from the host, not from the client's device. Is there any solution for that scenario?

rrodriguezlo, Apr 22 '25

There is no other choice than running a Python script on the client side. On the client side, that would mean:

  • a small script using PyAudioWPatch + websockets (pip install websockets) that sends the audio to the backend
  • an almost identical HTML frontend that receives and displays the result. The microphone part of the .html would have to be removed

QuentinFuxa, Apr 22 '25
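The client-side script described in the comment above could look roughly like the following minimal sketch. It assumes Windows with PyAudioWPatch and `websockets` installed. The server URL `ws://localhost:8000/asr` and the wire format (raw mono 16-bit PCM at the loopback device's native sample rate) are assumptions, not something this thread confirms: the stock frontend sends browser-recorded audio, so check what your WhisperLiveKit version expects and add encoding or resampling if needed. Server responses are simply printed instead of being rendered in an HTML page, and `find_default_loopback`, `downmix_to_mono` and `stream_loopback` are hypothetical helper names.

```python
import array
import asyncio

# Assumption: host, port and path of the WhisperLiveKit websocket; adjust to your deployment.
SERVER_URL = "ws://localhost:8000/asr"
CHUNK_FRAMES = 1024  # frames read per call; arbitrary choice for this sketch


def downmix_to_mono(frames: bytes, channels: int) -> bytes:
    """Average interleaved 16-bit PCM channels into a single mono channel."""
    samples = array.array("h")
    samples.frombytes(frames)
    if channels == 1:
        return samples.tobytes()
    usable = len(samples) - len(samples) % channels  # drop any incomplete trailing frame
    mono = array.array(
        "h",
        (sum(samples[i:i + channels]) // channels for i in range(0, usable, channels)),
    )
    return mono.tobytes()


def find_default_loopback(p, pyaudio):
    """Return the WASAPI loopback device that mirrors the default speakers."""
    wasapi = p.get_host_api_info_by_type(pyaudio.paWASAPI)
    speakers = p.get_device_info_by_index(wasapi["defaultOutputDevice"])
    if speakers["isLoopbackDevice"]:
        return speakers
    for loopback in p.get_loopback_device_info_generator():
        if speakers["name"] in loopback["name"]:
            return loopback
    raise RuntimeError("No WASAPI loopback device found for the default speakers")


async def stream_loopback(url: str = SERVER_URL) -> None:
    """Capture system audio, send it to the backend and print whatever comes back."""
    # Imported lazily: PyAudioWPatch is Windows-only, and both are only needed here.
    import pyaudiowpatch as pyaudio
    import websockets

    p = pyaudio.PyAudio()
    try:
        device = find_default_loopback(p, pyaudio)
        channels = device["maxInputChannels"]
        stream = p.open(
            format=pyaudio.paInt16,
            channels=channels,
            rate=int(device["defaultSampleRate"]),
            input=True,
            input_device_index=device["index"],
            frames_per_buffer=CHUNK_FRAMES,
        )
        try:
            async with websockets.connect(url) as ws:

                async def send_audio() -> None:
                    while True:
                        # stream.read blocks, so run it off the event loop.
                        data = await asyncio.to_thread(
                            stream.read, CHUNK_FRAMES, exception_on_overflow=False
                        )
                        # Assumption: the server accepts raw mono int16 PCM chunks.
                        await ws.send(downmix_to_mono(data, channels))

                async def print_results() -> None:
                    async for message in ws:
                        print(message)

                await asyncio.gather(send_audio(), print_results())
        finally:
            stream.stop_stream()
            stream.close()
    finally:
        p.terminate()
```

On the client machine this would be started with `asyncio.run(stream_loopback())`. The PyAudioWPatch and `websockets` imports sit inside the function so the pure-Python downmix helper can be reused on machines that lack them.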

@rrodriguezlo What is your use case? How are you implementing the client? System audio must be sent from the client itself. The server has no way to control this.

If you are using JavaScript, browser support for capturing system audio through web standards (getDisplayMedia audio capture) is limited, i.e. you will need to ask the client to use a specific browser: https://caniuse.com/mdn-api_mediadevices_getdisplaymedia_audio_capture_support

Otherwise, the client needs to install a virtual loopback driver, as @QuentinFuxa mentioned. But if that's the requirement, then for user experience you might be better off developing a native desktop/mobile app for your use case.

needabetterusername, Apr 23 '25

For everyone's information, some fortunate Windows machines have a hidden recording device called "Stereo Mix", which is usually disabled by default. It allows you to capture all system audio. I'm using it to transcribe audio with WhisperLiveKit from a live web TV stream.

Royalphax, May 18 '25
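Once Stereo Mix is enabled (or BlackHole is installed on macOS, as mentioned earlier in the thread), it appears as an ordinary input device, so a client only has to open it instead of the microphone. Below is a minimal sketch of picking such a device by name, assuming PyAudio (or PyAudioWPatch) is used for enumeration. `pick_input_device` and `list_devices` are hypothetical helpers, and device names vary by driver and system language, so the keywords may need adjusting.

```python
from typing import Iterable, Optional

# Assumption: typical device names; they differ between drivers and localized Windows versions.
DEFAULT_KEYWORDS = ("Stereo Mix", "BlackHole")


def pick_input_device(
    devices: Iterable[dict], keywords: Iterable[str] = DEFAULT_KEYWORDS
) -> Optional[dict]:
    """Return the first device that can record and whose name contains one of the keywords."""
    wanted = [k.lower() for k in keywords]
    for info in devices:
        name = str(info.get("name", "")).lower()
        if info.get("maxInputChannels", 0) > 0 and any(k in name for k in wanted):
            return info
    return None


def list_devices(p) -> list:
    """Collect the device-info dicts exposed by a PyAudio instance."""
    return [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
```

For example, `pick_input_device(list_devices(pyaudio.PyAudio()))` returns the matching device's info dict, whose `index` can then be passed as `input_device_index` to `PyAudio.open`.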

There's Virtual Audio Cable (VAC) for Windows, which can help: it can redirect/mix audio sources into a new virtual one.

andrealorenzon, Jun 30 '25
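A virtual cable (or Stereo Mix) does the mixing at the driver level. If you instead capture the microphone and system audio as two separate streams, combining them before sending amounts to a sample-wise sum with clipping. Below is a minimal sketch, assuming both buffers are mono 16-bit PCM at the same sample rate. `mix_int16` is a hypothetical helper, and keeping two devices' clocks in sync is not handled here.

```python
import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_int16(a: bytes, b: bytes, gain_a: float = 1.0, gain_b: float = 1.0) -> bytes:
    """Mix two mono 16-bit PCM buffers sample by sample, clipping to the int16 range.

    The shorter buffer is treated as if padded with silence (zeros).
    """
    sa, sb = array.array("h"), array.array("h")
    sa.frombytes(a)
    sb.frombytes(b)
    out = array.array("h")
    for i in range(max(len(sa), len(sb))):
        x = sa[i] if i < len(sa) else 0
        y = sb[i] if i < len(sb) else 0
        mixed = round(x * gain_a + y * gain_b)
        # Clip instead of letting the sum wrap around, which would sound like loud crackles.
        out.append(max(INT16_MIN, min(INT16_MAX, mixed)))
    return out.tobytes()
```

Lowering `gain_a` or `gain_b` (e.g. to 0.5 each) reduces how often loud passages hit the clipping limits, at the cost of a quieter mix.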

It would be great if the client were built using Flutter or Tauri.

4444TENSEI, Oct 15 '25