bug: audio detection and file upload limitations

Open venturero opened this issue 4 months ago • 2 comments

WhisperLiveKit is experiencing two key limitations:

  1. Computer Audio Detection Failure

    • When attempting to record audio playing from the computer, the application fails to detect system audio
    • Voice recognition works correctly when speaking directly into the microphone
  2. File Upload Restriction

    • Unable to upload voice recordings to localhost

Steps to Reproduce

  1. Play audio from computer
  2. Attempt to record using WhisperLiveKit
  3. Observe lack of audio detection from system sound

Expected Behavior

  • Detect and transcribe audio from computer sources
  • Allow uploading of voice recordings to localhost

Actual Behavior

  • System audio not recognized during recording
  • File upload functionality unavailable

Additional Context

  • Microphone direct input works correctly
  • Requires investigation into system audio capture mechanisms
  • Needs implementation of file upload feature

Potential Solutions

  • Investigate system audio capture libraries
  • Implement file upload endpoint for localhost

venturero, Aug 31 '25 15:08

Hello Venturero. To record system audio, you need some kind of loopback device that presents the output audio as an input device. This is a general OS-level matter, not something specific to WLK.

Some audio interfaces provide this natively. Most professional audio interfaces do, and some consumer devices do as well, under different names such as "Stereo Mix" or "What U Hear", on Windows at least.

If you are running Windows or Mac and lack such an option, look into the excellent VB-Audio Cable. It creates a virtual sound device for this task, and I'm using it successfully with WLK. That said, it would be nice to see a direct stream sink in WLK other than WebSockets. Since FFmpeg is already used internally to convert incoming audio, it should be a relatively small addition to start a custom listening instance on a different port with a command switch. If I had the spare time, I'd look into it myself, but I don't have that luxury at the moment.
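To check whether a loopback device like the ones mentioned above is visible to the browser, a small helper can filter the device list by name. This is only a sketch: `InputDeviceLike`, `LOOPBACK_HINTS`, and `findLoopbackInputs` are hypothetical names, the label fragments are illustrative guesses that vary by driver and OS language, and none of this is part of WLK.

```typescript
// Minimal shape of the entries returned by
// navigator.mediaDevices.enumerateDevices() (only the fields used here).
interface InputDeviceLike {
  kind: string;  // "audioinput", "audiooutput", or "videoinput"
  label: string; // human-readable name; empty until media permission is granted
}

// Assumption: illustrative name fragments only. Actual labels depend on the
// driver, the OS language, and the virtual cable product installed.
const LOOPBACK_HINTS: string[] = ["stereo mix", "what u hear", "cable output"];

// Return the audio input devices whose label looks like a loopback device.
function findLoopbackInputs<T extends InputDeviceLike>(devices: T[]): T[] {
  return devices.filter(function (device) {
    if (device.kind !== "audioinput") {
      return false; // outputs and cameras cannot be recorded as a microphone
    }
    const label = device.label.toLowerCase();
    return LOOPBACK_HINTS.some(function (hint) {
      return label.indexOf(hint) !== -1;
    });
  });
}
```

In a browser, the input would typically come from `await navigator.mediaDevices.enumerateDevices()`, called after microphone permission has been granted so that labels are populated. A matching device can then be selected as the recording input like any other microphone.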

As for your second limitation regarding file upload, the name of the project is quite telling: Whisper LIVE Kit. I would say it makes little sense at this stage to add offline functionality to this project when there are tons of Whisper implementations that do what you ask for, not to mention the original project from OpenAI. IMO the maintainer of and contributors to this project should definitely focus on improving the live aspect, which is its unique strength.

BR Alexander

Alexander-ARTV, Sep 08 '25 12:09

There is a way to have the browser share the computer's content, including audio.

Image
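The browser sharing approach corresponds to the standard Screen Capture API, where `getDisplayMedia` is asked for audio alongside video. The sketch below is not taken from WLK's frontend: `TrackLike`, `StreamLike`, `DisplayCapturer`, `requireSharedAudio`, and `captureSharedAudio` are hypothetical names, and whether audio is actually delivered depends on the browser, the OS, and the user ticking the audio option in the share dialog.

```typescript
// Minimal shapes so the sketch compiles without DOM typings; in a browser,
// navigator.mediaDevices and MediaStream satisfy them structurally.
interface TrackLike {
  kind: string;
  label: string;
}
interface StreamLike {
  getAudioTracks(): TrackLike[];
}
interface DisplayCapturer {
  getDisplayMedia(options: { video: boolean; audio: boolean }): Promise<StreamLike>;
}

// Return the first shared audio track, or fail with a readable message when
// the user shared a screen or tab without audio.
function requireSharedAudio(stream: StreamLike): TrackLike {
  const tracks = stream.getAudioTracks();
  if (tracks.length === 0) {
    throw new Error("No audio was shared; enable the audio option in the share dialog.");
  }
  return tracks[0];
}

// Ask the browser to share a screen, window, or tab together with its audio.
async function captureSharedAudio(capturer: DisplayCapturer): Promise<TrackLike> {
  // audio: true is only a request; the resulting stream may still lack audio.
  const stream = await capturer.getDisplayMedia({ video: true, audio: true });
  return requireSharedAudio(stream);
}
```

In a page this would be called as `captureSharedAudio(navigator.mediaDevices)` from a user gesture such as a button click. How the resulting track is then fed to WLK is left open here, since that depends on the frontend in use.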

XjiangSail, Oct 10 '25 12:10