System sound + mic real-time transcription
It would be really interesting if you could add system sound + mic real-time transcription. This library records system sound: https://github.com/s0d3s/PyAudioWPatch
Isn't that a job for whichever client is opening the websocket?
On macOS this can be done at the system level using BlackHole. I will check on Windows; if I don't find a similar loopback solution, I will add an option to use PyAudioWPatch.
Is it going to be implemented in the next release?
Yes. I haven't looked at it yet, but if that works, I'll try to do a release at the end of the week.
What we found is that [PyAudioWPatch] captures audio from the host system. However, if the application is running on a server and accessed remotely via the web from a different machine, this solution does not work, as it only captures sound from the host system, not the client's device. Is there any solution for that scenario?
There is no other choice than launching a Python script on the client side. On the client side, that would mean:
- a small script using PyAudioWPatch + websockets (pip install websockets) that sends the audio to the backend
- an almost identical HTML frontend that receives and displays the result. The microphone part of the .html file would have to be removed
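
To make the first bullet concrete, here is a minimal sketch of such a client script. It is not part of the project: the function names, the WebSocket URL, and the wire format (raw 16-bit mono PCM at the device's native sample rate) are assumptions, so the backend may need a different encoding or resampling. It also assumes a PyAudioWPatch version that provides `get_default_wasapi_loopback()` and therefore only works on Windows.

```python
import asyncio
from array import array


def downmix_to_mono(pcm: bytes, channels: int) -> bytes:
    """Average interleaved 16-bit PCM frames down to a single channel."""
    if channels == 1:
        return pcm
    samples = array("h")
    samples.frombytes(pcm)
    mono = array("h", (
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples) - channels + 1, channels)
    ))
    return mono.tobytes()


async def stream_system_audio(url: str, chunk_frames: int = 4096) -> None:
    """Capture the default output device via WASAPI loopback and send it to `url`."""
    # Imported lazily so the helper above works without these packages installed.
    import pyaudiowpatch as pyaudio  # Windows only
    import websockets

    p = pyaudio.PyAudio()
    try:
        device = p.get_default_wasapi_loopback()
        channels = device["maxInputChannels"]
        stream = p.open(
            format=pyaudio.paInt16,
            channels=channels,
            rate=int(device["defaultSampleRate"]),
            input=True,
            input_device_index=device["index"],
            frames_per_buffer=chunk_frames,
        )
        try:
            async with websockets.connect(url) as ws:

                async def print_results() -> None:
                    # Print whatever the backend sends back (assumed to be text).
                    async for message in ws:
                        print(message)

                printer = asyncio.create_task(print_results())
                try:
                    while True:
                        # stream.read blocks, so run it in a worker thread.
                        data = await asyncio.to_thread(
                            stream.read, chunk_frames, exception_on_overflow=False
                        )
                        await ws.send(downmix_to_mono(data, channels))
                finally:
                    printer.cancel()
        finally:
            stream.close()
    finally:
        p.terminate()
```

It could be started with something like `asyncio.run(stream_system_audio("ws://localhost:8000/asr"))`, where the URL is a placeholder for wherever the backend's WebSocket endpoint actually lives.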
@rrodriguezlo What is your use case? How are you implementing the client? System audio must be sent from the client itself. The server has no way to control this.
If you are using JavaScript, in terms of web standards, there is some limited browser support (i.e. you will need to ask the client to use a specific browser): https://caniuse.com/mdn-api_mediadevices_getdisplaymedia_audio_capture_support
Otherwise, you need to have the client install a virtual loopback driver like @QuentinFuxa mentioned. BUT if that's the requirement, then for user experience, you might be better off developing a native desktop/mobile app for your use case.
For everyone's information, some fortunate Windows machines have a hidden recording device called "Stereo Mix", which is usually disabled by default. It allows you to capture all system audio. I'm using it to transcribe audio with WhisperLiveKit from a live web TV stream.
There's also Virtual Audio Cable (VAC) for Windows, which can help. It can redirect or mix audio sources into a new virtual one.
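
Both "Stereo Mix" and a virtual cable normally show up as ordinary recording devices, so a script can look them up by name. Here is a rough sketch, assuming PyAudioWPatch (or plain PyAudio, which exposes the same device-listing calls). The helper names are made up for illustration, and the exact device name varies by driver.

```python
def find_input_device(devices, keyword):
    """Return the first recording-capable device whose name contains `keyword`."""
    keyword = keyword.lower()
    for info in devices:
        has_inputs = info.get("maxInputChannels", 0) > 0
        if has_inputs and keyword in info.get("name", "").lower():
            return info
    return None


def list_devices():
    """Collect the device info dicts reported by PortAudio."""
    # Imported lazily so find_input_device can be used without the package.
    import pyaudiowpatch as pyaudio

    p = pyaudio.PyAudio()
    try:
        return [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()
```

For example, `find_input_device(list_devices(), "stereo mix")` returns `None` if the device is disabled or missing; otherwise the `"index"` of the returned dict can be passed as `input_device_index` when opening a stream.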
It would be great if the client were built using Flutter or Tauri