vosk-api
problem with nodejs microphone recognition
Hello,
I am using the Node.js client and I find that the transcription results are very poor. Did you have the same problem? Is there some microphone configuration to do? Or is the Python client the recommended tool?
Did you have the same problem?
No
Is there some microphone configuration to do?
Hard to tell, you need to dump the microphone audio to a file for analysis and share the file
Or is the Python client the recommended tool?
They must have identical outputs.
In my case, I used the attached code for transcription, and the results are as follows:
With ffmpeg the results are good. test_code.txt
You need to store data to a file to verify the actual contents
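One way to do that from Node.js is to write every chunk to disk at the same point where it is handed to the recognizer, so the file contains exactly the bytes Vosk saw. A minimal sketch, assuming the microphone is exposed as a stream emitting Buffer chunks; `RawDump`, `micInputStream` and `rec` are hypothetical names for illustration, not part of any library:

```javascript
const fs = require('fs');

// Hypothetical helper: appends each chunk to a raw file so the exact bytes
// passed to the recognizer can be inspected afterwards.
class RawDump {
  constructor(filePath) {
    this.fd = fs.openSync(filePath, 'w'); // create or truncate the dump file
    this.bytesWritten = 0;
  }

  write(chunk) {
    // fs.writeSync returns the number of bytes written
    this.bytesWritten += fs.writeSync(this.fd, chunk);
  }

  close() {
    fs.closeSync(this.fd);
  }
}

// Usage sketch (micInputStream and rec are assumed to come from your demo code):
// const dump = new RawDump('output.raw');
// micInputStream.on('data', (data) => {
//   dump.write(data);          // keep a copy of what the recognizer receives
//   rec.acceptWaveform(data);
// });
// process.on('SIGINT', () => dump.close());
```

The resulting output.raw has no header, so whoever analyses it has to be told (or has to work out) the sample rate, channel count and sample width.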
I am also facing this issue. Everything works fine with Python and Java, but with Node.js it gives pretty much random results. I have a suspicion that it uses the wrong device, but I have no idea how to test that, or how to export the recorded audio from the mic.
Any help would be appreciated.
Update: I verified the recorded audio, and there isn't any problem with it. So I don't know why vosk produces such bad results in JS.
It always seems to produce 'the', even though I was completely silent.
Share the audio. Most likely the format is wrong.
play.exe -b 16 -e signed -c 1 -r 16000 .\output.raw
This is what I used, only for testing tho.
The result:
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from D:\Kriszti\Desktop\vosk-api\nodejs\demo\model/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from D:\Kriszti\Desktop\vosk-api\nodejs\demo\model/graph/HCLr.fst D:\Kriszti\Desktop\vosk-api\nodejs\demo\model/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:303) Loading winfo D:\Kriszti\Desktop\vosk-api\nodejs\demo\model/graph/phones/word_boundary.int
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
Stopping
Microhphone stopped
Cleaning up
{ text: 'the' }
recording audioProcess has exited with code = 4294967295
Log Level is set to 1, debug is enabled, everything else is untouched.
File output.raw has 32-bit audio; you need to find a way to force the microphone to record 16-bit audio. It might be a bug in the node mic module.
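One rough way to spot this kind of mismatch yourself, assuming you know roughly how long you recorded, is to compare the file size with what 16-bit audio should occupy. This is only a sketch of the arithmetic and not necessarily how the file above was analysed; both function names are made up for illustration:

```javascript
// Bytes of headerless PCM expected for a recording of known length.
function expectedPcmBytes(seconds, sampleRate, channels, bitsPerSample) {
  return seconds * sampleRate * channels * (bitsPerSample / 8);
}

// Estimate the sample width from the file size and the recorded duration.
// Only meaningful if the duration, rate and channel count are really known.
function estimateBitsPerSample(fileBytes, seconds, sampleRate, channels) {
  return Math.round((fileBytes * 8) / (seconds * sampleRate * channels));
}

// 5 seconds of 16 kHz mono 16-bit audio should be 160000 bytes.
console.log(expectedPcmBytes(5, 16000, 1, 16));

// A 5 second mono 16 kHz file of 320000 bytes points at 32-bit samples.
console.log(estimateBitsPerSample(320000, 5, 16000, 1));
```

A file twice as large as expected can also mean two channels instead of one, so this narrows the problem down rather than proving it.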
I see. Any idea on how to force it from code?
The weird thing is, the mic is set to 16 bit audio.
If you are on Windows check driver properties, maybe there is an option somewhere to enable other formats.
I have 2 options:
- 2 channels, 16 bit, 44100 Hz
- 2 channels, 16 bit, 48000 Hz
Neither of them works.
OK. You can also try running sox from the command line with the same arguments the mic module uses (see its source) and see what happens:
https://github.com/ashishbajaj99/mic/blob/master/lib/mic.js#L48
No problem there, Sox records and plays back the audio correctly.
I tried out other packages for audio streaming from the microphone, and the problem still persists :/
No problem there, Sox records and plays back the audio correctly.
Try to record a file with sox using the same command line as in the code and share the result.
sox.exe -b 16 --endian little -c 1 -r 16000 -e S16_LE -t waveaudio default -p > file.raw
Even a wrong format will play fine, so playback is not a good test. You need to check the format of the recording itself.
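As a rough sanity check that does not rely on listening, you can read the raw bytes as 16-bit little-endian samples and look at the levels. The sketch below is a heuristic only, and `int16Stats` is a made-up helper name: on a recording of silence, genuine 16-bit audio should show a small peak, while 32-bit data misread as 16-bit tends to show large values.

```javascript
// Interpret a raw buffer as signed 16-bit little-endian samples and report
// the sample count, peak absolute value and RMS level.
function int16Stats(buf) {
  const n = Math.floor(buf.length / 2);
  let peak = 0;
  let sumSq = 0;
  for (let i = 0; i < n; i++) {
    const s = buf.readInt16LE(i * 2);
    peak = Math.max(peak, Math.abs(s));
    sumSq += s * s;
  }
  return { samples: n, peak, rms: n ? Math.sqrt(sumSq / n) : 0 };
}

// Usage sketch: record a few seconds of silence, then
// const fs = require('fs');
// console.log(int16Stats(fs.readFileSync('file.raw')));
```

A large peak on a silent recording does not say which format the data really is, only that it is probably not the 16-bit mono PCM the recognizer was configured for.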
file.zip Here's the file, wasn't able to test it properly tho.
I noticed that when I speak, the microphone isn't used continuously; it turns on and off, according to the Windows tray icon. Maybe that is the reason?
I am using Windows 11 if that matters.
I'm also having this problem. The weird thing is that when I record using sox and then run the recording through test_simple, it seems to work. It makes me wonder if there is an issue with how the mic plugin integrates sox. I read through the mic plugin as well, and it's clearly passing the correct settings to boot, so this is quite vexing.
Maybe we can simply move to https://www.npmjs.com/package/node-portaudio for the demo, no need to keep this broken sox around.
Hey there, I slapped this demo together using the library at the following location: https://github.com/Streampunk/naudiodon
It looks like using this fixes the issue as I've confirmed this recognizes my inputs and it works on Windows 10 64 bit. Thanks for providing such a great API. test_naudio.zip
Can confirm, it works now with node-portaudio, although the recognition is a bit choppy. Only tested with the lgraph english model, will test with the larger model.