vosk-api

Problem with Node.js microphone recognition

Open Asma-droid opened this issue 2 years ago • 26 comments

Hello,

I am using the Node.js client, and I found that the transcription results are very bad. Did you have the same problem? Is there some microphone configuration to do? Or is the Python client the recommended tool?

Asma-droid avatar Feb 03 '22 12:02 Asma-droid

Did you have the same problem?

No

Is there some microphone configuration to do?

Hard to tell, you need to dump the microphone audio to a file for analysis and share the file

Or is the Python client the recommended tool?

They must have identical outputs.

nshmyrev avatar Feb 03 '22 13:02 nshmyrev

For my example, I used the attached code for transcription, and the results are as follows:

image

Transcription with ffmpeg gives good results. test_code.txt

Asma-droid avatar Feb 03 '22 13:02 Asma-droid

You need to store data to a file to verify the actual contents
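One way to do this from Node.js is to write the raw chunks that the recognizer receives straight to disk. A minimal sketch, assuming the microphone stream emits Buffer chunks on 'data' events; `dumpAudioToFile` and the file name are hypothetical, not part of vosk or the mic module:

```javascript
const fs = require('fs');

// Hypothetical helper: copies every chunk emitted by audioStream into filePath.
// Returns a stop() function that detaches the listener, closes the file,
// and reports how many bytes were written.
function dumpAudioToFile(audioStream, filePath) {
  const fd = fs.openSync(filePath, 'w');
  let bytesWritten = 0;

  const onData = (chunk) => {
    // writeSync returns the number of bytes actually written
    bytesWritten += fs.writeSync(fd, chunk);
  };
  audioStream.on('data', onData);

  return function stop() {
    audioStream.removeListener('data', onData);
    fs.closeSync(fd);
    return bytesWritten;
  };
}

// Usage (assumption: micInputStream is the same stream passed to the recognizer):
// const stop = dumpAudioToFile(micInputStream, 'output.raw');
// ... later, when recording ends:
// console.log('bytes written:', stop());
```

Call the returned function when recording stops, then inspect or share the resulting .raw file.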

nshmyrev avatar Feb 03 '22 21:02 nshmyrev

I am also facing this issue. Everything works fine in Python and Java, but in Node.js it gives pretty much random results. I suspect that it uses the wrong device, but I have no idea how to test that, or how I should export the recorded audio from the mic.

Any help would be appreciated.

Moenish avatar Jul 19 '22 11:07 Moenish

Update: I verified the recorded audio, and there isn't any problem with it. So I don't know why Vosk produces such bad results in JS.

Moenish avatar Jul 19 '22 12:07 Moenish

It always seems to produce 'the', even though I was completely silent.

Moenish avatar Jul 19 '22 12:07 Moenish

Share the audio. Most likely the format is wrong.

nshmyrev avatar Jul 19 '22 12:07 nshmyrev

output.zip

play.exe -b 16 -e signed -c 1 -r 16000 .\output.raw

This is what I used, only for testing tho.

Moenish avatar Jul 19 '22 12:07 Moenish

The result:

LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from D:\Kriszti\Desktop\vosk-api\nodejs\demo\model/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from D:\Kriszti\Desktop\vosk-api\nodejs\demo\model/graph/HCLr.fst D:\Kriszti\Desktop\vosk-api\nodejs\demo\model/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:303) Loading winfo D:\Kriszti\Desktop\vosk-api\nodejs\demo\model/graph/phones/word_boundary.int
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }
{ partial: '' }

Stopping
Microhphone stopped
Cleaning up
{ text: 'the' }
recording audioProcess has exited with code = 4294967295

Moenish avatar Jul 19 '22 12:07 Moenish

Log Level is set to 1, debug is enabled, everything else is untouched.

Moenish avatar Jul 19 '22 12:07 Moenish

The file output.raw contains 32-bit audio; you need to find a way to force the microphone to record 16-bit audio. It might be a bug in the node mic module.
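If the capture side cannot be changed, the samples could in principle be narrowed in code before they reach the recognizer. A rough sketch, assuming the data is signed 32-bit little-endian integer PCM (that is a guess; 32-bit float data would need a different conversion). `int32ToInt16` is a hypothetical helper:

```javascript
// Hypothetical helper: converts signed 32-bit LE PCM to signed 16-bit LE PCM
// by keeping the 16 most significant bits of each sample.
// Assumption: input is integer PCM, not floating point.
function int32ToInt16(input) {
  const sampleCount = Math.floor(input.length / 4);
  const output = Buffer.alloc(sampleCount * 2);
  for (let i = 0; i < sampleCount; i++) {
    const s32 = input.readInt32LE(i * 4);
    // Arithmetic shift keeps the sign, result is in [-32768, 32767]
    output.writeInt16LE(s32 >> 16, i * 2);
  }
  return output;
}
```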

nshmyrev avatar Jul 19 '22 12:07 nshmyrev

I see. Any idea on how to force it from code?

Moenish avatar Jul 19 '22 12:07 Moenish

The weird thing is, the mic is set to 16 bit audio.

Moenish avatar Jul 19 '22 13:07 Moenish

If you are on Windows check driver properties, maybe there is an option somewhere to enable other formats.

nshmyrev avatar Jul 19 '22 13:07 nshmyrev

I have 2 options:

  • 2 channels, 16 bit, 44100 Hz
  • 2 channels, 16 bit, 48000 Hz
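Since neither option is 16000 Hz mono, the audio would have to be converted somewhere before recognition, assuming the recognizer is set up for 16000 Hz mono as in the commands in this thread. A naive sketch for the 48000 Hz case only, assuming interleaved signed 16-bit little-endian stereo input. It averages both channels over every three frames, which is a crude stand-in for a proper resampling filter, and `stereo48kToMono16k` is a hypothetical name:

```javascript
// Hypothetical helper: 48000 Hz stereo 16-bit LE PCM -> 16000 Hz mono 16-bit LE PCM.
// Averages left and right over each group of 3 frames (48000 / 16000 = 3).
// Assumption: chunks arrive aligned to whole groups; leftover bytes are dropped.
function stereo48kToMono16k(input) {
  const frameBytes = 4; // 2 channels * 2 bytes per sample
  const factor = 3;     // decimation factor
  const frames = Math.floor(input.length / frameBytes);
  const outSamples = Math.floor(frames / factor);
  const output = Buffer.alloc(outSamples * 2);

  for (let i = 0; i < outSamples; i++) {
    let sum = 0;
    for (let j = 0; j < factor; j++) {
      const offset = (i * factor + j) * frameBytes;
      sum += input.readInt16LE(offset) + input.readInt16LE(offset + 2);
    }
    // The mean of int16 values always fits in int16
    output.writeInt16LE(Math.round(sum / (factor * 2)), i * 2);
  }
  return output;
}
```

The 44100 Hz option is not an integer multiple of 16000 Hz, so it would need a real resampler rather than this kind of averaging.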

Moenish avatar Jul 19 '22 13:07 Moenish

Neither of them works.

Moenish avatar Jul 19 '22 13:07 Moenish

OK. You can also try running sox from the command line with the same arguments that the mic module's source uses, and see what happens:

https://github.com/ashishbajaj99/mic/blob/master/lib/mic.js#L48

nshmyrev avatar Jul 19 '22 13:07 nshmyrev

No problem there, Sox records and plays back the audio correctly.

Moenish avatar Jul 19 '22 13:07 Moenish

I tried out other packages for audio streaming from the microphone, and the problem still persists :/

Moenish avatar Jul 20 '22 09:07 Moenish

No problem there, Sox records and plays back the audio correctly.

Try to record a file with sox using the same command line as in the code, and share the result.

   sox.exe -b 16 --endian little -c 1 -r 16000 -e S16_LE -t waveaudio default -p > file.raw

Even a wrong format will play back fine, so playback is not a good test. You need to check the format of the recording.
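One quick format check that does not rely on playback is to record for a known length of time and compare the file size with what the intended format should produce. A small sketch; `pcmDurationSeconds` is a hypothetical helper, and the numbers are illustrative, not measurements from this thread:

```javascript
// Hypothetical helper: how long a headerless PCM file would last
// if it really had the given format.
function pcmDurationSeconds(byteLength, sampleRate, channels, bitsPerSample) {
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  return byteLength / bytesPerSecond;
}

// Illustrative numbers: a 160000-byte file interpreted two ways.
const as16bit = pcmDurationSeconds(160000, 16000, 1, 16); // 16-bit mono at 16000 Hz
const as32bit = pcmDurationSeconds(160000, 16000, 1, 32); // 32-bit mono at 16000 Hz
console.log({ as16bit, as32bit });

// With a real file (assumption: file.raw is headerless PCM):
// const size = require('fs').statSync('file.raw').size;
// console.log(pcmDurationSeconds(size, 16000, 1, 16));
```

If a recording of known length comes out at roughly twice the expected size, the data is probably 32-bit or stereo rather than 16-bit mono.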

nshmyrev avatar Jul 20 '22 12:07 nshmyrev

file.zip Here's the file; I wasn't able to test it properly, though.

Moenish avatar Jul 21 '22 12:07 Moenish

I noticed that when I speak, the microphone isn't used continuously; it turns on and off, according to the Windows tray icon. Maybe that is the reason?

I am using Windows 11 if that matters.

Moenish avatar Jul 22 '22 05:07 Moenish

I'm also having this problem. The weird thing is that when I record using sox and then try the file in test_simple, it seems to work. It makes me wonder if there is an issue with how sox is integrated into the mic plugin. I read through the mic plugin as well, and it's clearly passing the correct settings to boot, so this is quite vexing.

jstopchick avatar Aug 16 '22 22:08 jstopchick

Maybe we can simply move to https://www.npmjs.com/package/node-portaudio for the demo, no need to keep this broken sox around.

nshmyrev avatar Aug 16 '22 22:08 nshmyrev

Hey there, I slapped this demo together using the library at the following location: https://github.com/Streampunk/naudiodon

Using this seems to fix the issue: I've confirmed that it recognizes my input, and it works on 64-bit Windows 10. Thanks for providing such a great API. test_naudio.zip
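The general shape of a demo like this is a 'data' handler that feeds each audio chunk to the recognizer. The following is a sketch, not the attached code: `attachRecognizer` is a hypothetical helper, and it assumes the recognizer exposes acceptWaveform, result, partialResult and finalResult as in the vosk Node.js demos, and that the audio source emits 16-bit mono Buffer chunks at the recognizer's sample rate:

```javascript
// Hypothetical helper: wires any chunk-emitting audio source to a
// vosk-style recognizer and reports results through onEvent.
function attachRecognizer(audioStream, recognizer, onEvent) {
  const onData = (chunk) => {
    // acceptWaveform returns true when an utterance has ended
    if (recognizer.acceptWaveform(chunk)) {
      onEvent(recognizer.result());
    } else {
      onEvent(recognizer.partialResult());
    }
  };
  audioStream.on('data', onData);

  // Call the returned function on shutdown to flush the last result.
  return function detach() {
    audioStream.removeListener('data', onData);
    onEvent(recognizer.finalResult());
  };
}
```

With naudiodon, the audio source would presumably be an AudioIO input configured for one channel, 16-bit samples, and a 16000 Hz sample rate, so that no format conversion happens between the device and the recognizer.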

jstopchick avatar Aug 17 '22 02:08 jstopchick

Can confirm, it works now with node-portaudio, although the recognition is a bit choppy. I have only tested with the lgraph English model so far and will test with the larger model.

Moenish avatar Aug 17 '22 06:08 Moenish