whisper-ctranslate2

Mic input for live transcription

Open joshoreefe opened this issue 1 year ago • 6 comments

After some upgrades and configuration changes, live transcription stopped working. My setup was working okay, but for some unknown reason it stopped capturing the mic input. So I upgraded the Jetson Orin Nano developer kit 4b to JetPack 5.1.3.

The live input device doesn't seem to capture audio the same way as arecord. If I make a test recording like this:

arecord -D usbmic test.wav
Recording WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 8000 Hz, Mono

the recorded audio is fine. The audio file transcribes correctly.

If I then try live transcription using the same device, like this:

whisper-ctranslate2 --live_transcribe True --live_input_device 27 ....etc

the process starts okay:

Live stream device: usbmic
Listening.. (Ctrl+C to Quit)

But that's all. Nothing happens. It seems the capture works differently from arecord?

joshoreefe, Apr 03 '24 07:04

Is live transcribing working for others? If so, please give some setup hints!

joshoreefe, Apr 09 '24 08:04

Can confirm on Mac as well. Feeding in mp3 works but live stream doesn't, even though the device is detected.

Benjamin-Lee, May 04 '24 01:05

Live transcribing isn't working for me either

965311532, May 24 '24 07:05

The built-in microphone of M-series MacBooks is known to have input-volume problems with various programs. Sometimes a sudo killall coreaudiod helps for a while, but it didn't here. What does work for me is lowering the volume threshold:

whisper-ctranslate2 --live_transcribe True --live_volume_threshold 0.01

pheraph, Sep 02 '24 14:09

This also worked for me on a Linux box. I was getting no output. I named my mic device and it was reported properly, so that wasn't the problem. With the threshold at 0.01 it picked up (for me) crickets farting; I bumped it up to 0.025 and it is better. This is the first faster-whisper setup I have used that actually works, and I am getting no outrageous hallucinations with this version. I have used VAD with other setups, too: I inserted an RMS threshold and all kinds of checks, but still got misread results and crazy nonsense with no mic input at all.

MarsThunder, Feb 14 '25 20:02
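
For anyone wondering why a value like 0.01 versus 0.025 makes such a difference, here is a minimal sketch of how an RMS volume gate behaves. It is only an illustration under assumptions: samples are taken to be floats in [-1.0, 1.0], and the helper names (rms, passes_threshold, sine_block) are hypothetical and not taken from whisper-ctranslate2, whose actual gating logic may differ.

```python
import math


def rms(block):
    """Root-mean-square level of a block of float samples in [-1.0, 1.0]."""
    if not block:
        return 0.0
    return math.sqrt(sum(s * s for s in block) / len(block))


def passes_threshold(block, threshold):
    """Return True if the block is loud enough to be treated as input."""
    return rms(block) > threshold


def sine_block(amplitude, n=1600, freq=440.0, rate=16000):
    """Hypothetical test signal: 0.1 s of a 440 Hz tone at the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


quiet = sine_block(0.02)  # faint input, RMS is amplitude / sqrt(2)
loud = sine_block(0.2)    # clearly audible input

for name, block in (("quiet", quiet), ("loud", loud)):
    for threshold in (0.01, 0.025):
        print(name, threshold, round(rms(block), 4), passes_threshold(block, threshold))
```

In this sketch the faint signal gets through at 0.01 but is rejected at 0.025, while the louder one passes both. That matches the reports above: if the mic's input level is low, a high threshold silently drops everything, and a very low one lets background noise through.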