pocketsphinx.js

Poor accuracy on low-CPU computers (Chromebooks, netbooks, etc.)

Open • gpowerone opened this issue 8 years ago • 6 comments

We've been using pocketsphinx.js successfully for some time.

We've recently discovered (in Google Chrome) that accuracy is very poor on computers with comparatively little CPU power, while it works perfectly well on more powerful machines.

Anybody else have this problem and/or a recommendation about how to reduce the CPU usage of pocketsphinx.js? We are using a larger acoustic model than the default rm model, but not as big as the CMU sphinx model.

Here is my current configuration:

    [NAME]            [DEFLT]         [VALUE]
    -agc              none            none
    -agcthresh        2.0             2.000000e+00
    -allphone
    -allphone_ci      no              no
    -alpha            0.97            9.700000e-01
    -ascale           20.0            2.000000e+01
    -aw               1               1
    -backtrace        no              no
    -beam             1e-48           1.000000e-48
    -bestpath         yes             yes
    -bestpathlw       9.5             9.500000e+00
    -ceplen           13              13
    -cmn              live            current
    -cmninit          40,3,-1         40,3,-1
    -compallsen       no              no
    -debug                            0
    -dict                             SRdictionary.dict
    -dictcase         no              no
    -dither           no              no
    -doublebw         no              no
    -ds               1               1
    -fdict
    -feat             1s_c_d_dd       1s_c_d_dd
    -featparams
    -fillprob         1e-8            1.000000e-08
    -frate            100             100
    -fsg
    -fsgusealtpron    yes             yes
    -fsgusefiller     yes             yes
    -fwdflat          yes             yes
    -fwdflatbeam      1e-64           1.000000e-64
    -fwdflatefwid     4               4
    -fwdflatlw        8.5             8.500000e+00
    -fwdflatsfwin     25              25
    -fwdflatwbeam     7e-29           7.000000e-29
    -fwdtree          yes             yes
    -hmm                              en-us
    -input_endian     little          little
    -jsgf
    -keyphrase
    -kws
    -kws_delay        10              10
    -kws_plp          1e-1            1.000000e-01
    -kws_threshold    1               1.000000e+00
    -latsize          5000            5000
    -lda
    -ldadim           0               0
    -lifter           0               22
    -lm                               SRlm.lm
    -lmctl
    -lmname
    -logbase          1.0001          1.000100e+00
    -logfn
    -logspec          no              no
    -lowerf           133.33334       1.300000e+02
    -lpbeam           1e-40           1.000000e-40
    -lponlybeam       7e-29           7.000000e-29
    -lw               6.5             6.500000e+00
    -maxhmmpf         30000           30000
    -maxwpf           -1              -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr        0               0
    -mixw
    -mixwfloor        0.0000001       1.000000e-07
    -mllr
    -mmap             yes             yes
    -ncep             13              13
    -nfft             512             512
    -nfilt            40              25
    -nwpen            1.0             1.000000e+00
    -pbeam            1e-48           1.000000e-48
    -pip              1.0             1.000000e+00
    -pl_beam          1e-10           1.000000e-10
    -pl_pbeam         1e-10           1.000000e-10
    -pl_pip           1.0             1.000000e+00
    -pl_weight        3.0             3.000000e+00
    -pl_window        5               5
    -rawlogdir
    -remove_dc        no              no
    -remove_noise     yes             no
    -remove_silence   yes             no
    -round_filters    yes             yes
    -samprate         16000           1.600000e+04
    -seed             -1              -1
    -sendump
    -senlogdir
    -senmgau
    -silprob          0.005           5.000000e-03
    -smoothspec       no              no
    -svspec                           0-12/13-25/26-38
    -tmat
    -tmatfloor        0.0001          1.000000e-04
    -topn             4               4
    -topn_beam        0               0
    -toprule
    -transform        legacy          dct
    -unit_area        yes             yes
    -upperf           6855.4976       6.800000e+03
    -uw               1.0             1.000000e+00
    -vad_postspeech   50              50
    -vad_prespeech    20              20
    -vad_startspeech  10              10
    -vad_threshold    2.0             2.000000e+00
    -var
    -varfloor         0.0001          1.000000e-04
    -varnorm          no              no
    -verbose          no              no
    -warp_params
    -warp_type        inverse_linear  inverse_linear
    -wbeam            7e-29           7.000000e-29
    -wip              0.65            6.500000e-01
    -wlen             0.025625        2.562500e-02

Thanks

gpowerone • Jan 27 '17 15:01

It depends on the model you use, the grammar you use, and so on. Pocketsphinx optimization is described in the wiki:

http://cmusphinx.sourceforge.net/wiki/pocketsphinxhandhelds

nshmyrev • Jan 27 '17 21:01

I should have mentioned that Pocketsphinx by itself works very well (high accuracy/recognition) when I use it in a plain C++ program with the same model/grammar/dictionary on my local computer. Accuracy goes down significantly when using the pocketsphinx.js module in the browser. I understand that this is the exact same C++ code, which leads me to believe that the browser is the cause.

Is it possible that the browser is somehow limiting the amount of CPU that pocketsphinx.js or the audioRecorder module (I am using the default audioRecorder that comes with pocketsphinx.js) can use? That in turn would cause poorer recognition due to delays in filling/processing the speech buffers. I am no expert in browsers, so this could be very wrong, but I am looking for feedback on whether this is actually a consideration.

gpowerone • Jan 30 '17 17:01

@gpowerone sorry that I did not reply to your first message. I had similar reasoning: the only thing that could make accuracy depend on available CPU is the actual recording of audio. It might just be that a queue becomes full and some buffers get dropped.

I guess the good news is that it should be the easiest to debug.
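
One way to check the dropped-buffer hypothesis, as a minimal sketch: compare the wall-clock time between audio callbacks with the amount of audio actually delivered. `createGapMonitor` is a hypothetical helper, not part of pocketsphinx.js or its audioRecorder, and it assumes the recorder receives audio in fixed-size buffers through some callback.

```javascript
// Hypothetical helper (not part of pocketsphinx.js): estimates how much audio
// went missing by comparing wall-clock time with the audio actually delivered.
function createGapMonitor(sampleRate) {
  let firstMs = null;          // arrival time of the first buffer
  let lastMs = null;           // arrival time of the most recent buffer
  let samplesAfterFirst = 0;   // samples delivered after the first buffer

  return {
    // Call once per audio buffer with its length and the current time in ms.
    onBuffer(numSamples, nowMs) {
      if (firstMs === null) {
        firstMs = nowMs;
      } else {
        samplesAfterFirst += numSamples;
      }
      lastMs = nowMs;
    },
    // Milliseconds of wall-clock time not covered by delivered audio.
    missingMs() {
      if (firstMs === null) return 0;
      const elapsedMs = lastMs - firstMs;
      const deliveredMs = (samplesAfterFirst / sampleRate) * 1000;
      return Math.max(0, elapsedMs - deliveredMs);
    },
  };
}
```

In a browser, this could be fed from the recording callback, for example `monitor.onBuffer(event.inputBuffer.length, performance.now())` inside an `onaudioprocess` handler, with `sampleRate` taken from the `AudioContext`. Callback timing jitters, so only a `missingMs()` value that keeps growing on the slow machine, and stays near zero on the fast one, would support the theory.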

syl22-00 • Jan 30 '17 17:01

I am having the exact same issue

AlecHaring • Mar 01 '17 04:03

Will there be scope for porting sphinx to WebAssembly once it's available? Would there be a significant improvement in performance, accuracy and code maintainability?

I ask because our current use case relies on the Google Speech API, which is expensive for us as developers; even in production it is buggy, sometimes returning nothing.

mscreenie • Mar 19 '17 14:03

@mscreenie I'd love to see how WebAssembly would work out.

However, I am not sure pocketsphinx.js could be viewed as an alternative to the Google Speech API. As new-generation recognizers hosted on servers keep improving, pocketsphinx will lag behind in terms of performance. I'd say pocketsphinx.js is relevant in cases where grammar-based language models can be used, for keyword spotting, or when hosted solutions are not an option.

syl22-00 • Mar 31 '17 07:03