pocketsphinx.js

Example with official Language Models as provided by Pocketsphinx

Open · willemmulder opened this issue 10 years ago · 36 comments

Hi,

I'm just getting 'deeper' into speech recognition and would really benefit from an example that uses the 'official language models' files as they are provided for pocketsphinx here: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/

Right now the examples use simple arrays with lists of words, but it would really help me to see a 'real' example where a real language model + dictionary is used.

Would that be possible?

willemmulder, Oct 10 '14 12:10

Hi, starting from the sample app that uses grammars (webapp/live.html), you can adapt it to use a statistical language model. Note that grammars can be much more elaborate than the ones in this sample app, though. Anyway, the basic steps to use a statistical language model are:

That should be it.

syl22-00, Oct 10 '14 19:10

Thanks for the guide.

Would you mind if I tried setting up an example and made a pull request once it works?

willemmulder, Oct 10 '14 20:10

That'd be awesome. I can also help you get it working if you run into any issues.

syl22-00, Oct 10 '14 21:10

Thanks a lot to you guys!

derhuerst, Oct 10 '14 21:10

So let's document my first steps ;-)

First, I installed Emscripten and read through its docs a little.

Then, I downloaded the English Generic language model here: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/

It does not have a dictionary... So I used this one from here (although I don't have the slightest clue whether you can just mix and match language models and dictionaries): http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20HUB4%20Language%20Model/

Then I ran file_packager.py like this (I assume it just packs a binary file into JS and wraps it in a module definition, so that other scripts can load the script and get hold of its data...?):

python "C:\Program Files\Emscripten\emscripten\1.25.0\tools\file_packager.py" somefile.js --embed cmudict.hub4.06d.dict --js-output=dict.js

and

python "C:\Program Files\Emscripten\emscripten\1.25.0\tools\file_packager.py" somefile.js --embed en-us.lm --js-output=lm.js

where somefile.js doesn't actually exist; I didn't understand why that parameter was necessary anyway. What I get is two rather large files:

  1. dict.js (19 MB)
  2. lm.js (540 MB)

I then spun up a Node.js server, served the webapp folder, and modified live.html so that the initRecognizer function now looks like this:

var initRecognizer = function() {
      // You can pass parameters to the recognizer, such as : {command: 'initialize', data: [["-hmm", "my_model"], ["-fwdflat", "no"]]}
      postRecognizerJob({command: 'initialize'},
        function() {
          postRecognizerJob({command: 'load',
            data: ["../../model-en-us/dict.js", "../../model-en-us/lm.js"]
           }, function() { 
            // Done loading!
            recognizerReady();
           });
        }
      );
  };

When I check the console, I can see it loading the dict.js and lm.js files, and both return 200 OK. Yet the downloaded size is smaller than the actual file size, and the worker comes back with a "NETWORK_ERROR".

Can they not handle these large files?

Does that mean that I really have to embed the lm.js and dict.js files in the pocketsphinx.js file directly (and hope that that file does get loaded correctly)?

Thanks for any help!

willemmulder, Oct 13 '14 20:10

Hi @willemmulder ,

I have seen NETWORK_ERROR returned when JavaScript files loaded by importScripts are not found. Are you sure you have the correct relative path? I think paths are relative to the HTML file that loads the scripts. Also, if you have relative paths that go up in the folder tree, make sure you never go beyond your web server root.
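One way to check where a relative path actually points is to resolve it the way the browser would. For a worker created from a script URL, relative URLs passed to importScripts are resolved against the worker script's own URL, which may differ from the HTML page's location. A minimal sketch, assuming a hypothetical worker URL of http://localhost:8000/js/recognizer.js (substitute the URL your worker is really loaded from):

```javascript
// Resolve a relative importScripts() path against the worker script URL.
// The worker URL used below is a hypothetical example.
function resolveImportPath(relativePath, workerScriptUrl) {
  return new URL(relativePath, workerScriptUrl).href;
}

const workerScriptUrl = "http://localhost:8000/js/recognizer.js";
console.log(resolveImportPath("../../model-en-us/dict.js", workerScriptUrl));
// http://localhost:8000/model-en-us/dict.js
// The second "../" cannot climb above the server root, so it is silently dropped.
```

If nothing is served at the resolved URL, the import fails, which is the kind of failure that surfaces as a network error in the worker.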

I have never tried files that large. I am not sure whether the browser imposes a limit, but you would probably need to increase TOTAL_MEMORY=100663296 in CMakeLists.txt.
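For reference, 100663296 bytes is exactly 96 MiB (6 × 16 MiB). A small sketch for picking a larger value, assuming your Emscripten build (asm.js era) wants TOTAL_MEMORY as a multiple of 16 MiB; how much headroom the decoder needs beyond the model data is not known here, so the 64 MiB in the usage line is an arbitrary placeholder you would have to measure:

```javascript
const MIB = 1024 * 1024;
const STEP = 16 * MIB; // assumed TOTAL_MEMORY granularity for asm.js-era builds

// Round (model data + headroom) up to the next multiple of 16 MiB.
function suggestTotalMemory(modelBytes, headroomBytes) {
  const needed = modelBytes + headroomBytes;
  return Math.ceil(needed / STEP) * STEP;
}

console.log(100663296 / MIB); // 96
// 25 MiB of model data plus a placeholder 64 MiB of headroom = 89 MiB, rounded up to 96 MiB
console.log(suggestTotalMemory(25 * MIB, 64 * MIB)); // 100663296
```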

I would suggest starting with a smaller language model. I do not think embedding it in pocketsphinx.js would make any difference; it would just concatenate the files into one.

Your somefile.js argument should be the pocketsphinx.js file that will eventually load these files. I am not sure whether it matters, though.

Good luck, and let us know how it goes.

syl22-00, Oct 13 '14 20:10

For the generic language model you need to use

http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download

Its size is 25 MB. It should not grow after conversion to JavaScript. That's actually my main concern about pocketsphinx.js: somehow it increases the size of the data files.
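A plausible explanation for the growth, under the assumption (check your Emscripten version's actual output) that file_packager.py's --embed mode writes the file contents as a JavaScript array literal, one decimal number plus a comma per byte, like the fileData0 array passed to FS_createDataFile in the generated code. Each byte then costs 2 to 4 characters; a quick estimate:

```javascript
// Size in characters of a byte buffer written as "b0,b1,b2,..." (assumed embed format).
function embeddedLiteralSize(bytes) {
  let chars = 0;
  for (const b of bytes) {
    chars += String(b).length + 1; // decimal digits plus the separating comma
  }
  return chars;
}

// Average over all 256 byte values, i.e. for uniformly distributed binary data.
const allBytes = Uint8Array.from({ length: 256 }, (_, i) => i);
const charsPerByte = embeddedLiteralSize(allBytes) / allBytes.length;
console.log(charsPerByte); // 3.5703125
console.log((25 * charsPerByte).toFixed(1) + " MB for a 25 MB binary"); // 89.3 MB for a 25 MB binary
```

Real model files are not uniformly distributed, so this is only a rough figure, but it is in the same range as the roughly 4x growth from the 25 MB DMP file to the generated JavaScript file mentioned in the next comment.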

nshmyrev, Oct 13 '14 20:10

Ok, so a DMP file is the binary version of a normal .lm file, right? So it's more compact. Good.

(Can I use any .dict file? The http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/ folder does not provide a dictionary. Is that a problem? How do the language model (lm) and the dictionary (dict) work together?)

The generated JavaScript file is now only 99 MB, and it loads successfully!

Loading page
Initializing web audio and speech recognizer, waiting for approval to access the microphone
Audio recorder ready
Recognizer ready

Excellent.

However, when I press the start button, I get

Error in switchgrammar with code [object Object]
Error in process with code [object Object]
Error in process with code [object Object]
... more of those

Any clues?

willemmulder, Oct 15 '14 18:10

Just a note on what I tried: the recognizer wants to do

recognizer.switchSearch(parseInt(id));

where I assume that id is the identifier at the bottom of the generated JavaScript files:

Module['FS_createDataFile']('/', 'cmusphinx-5.0-en-us.lm.dmp', fileData0, true, true);

So I use cmusphinx-5.0-en-us.lm.dmp as id when I start the recognizer:

var startRecording = function() {
    var id = "cmusphinx-5.0-en-us.lm.dmp";
    if (recorder && recorder.start(id)) displayRecording(true);
  };

But that fails with the above error message...

willemmulder, Oct 15 '14 18:10

It seems like it is trying to switch grammars, which should not happen. You are probably providing an id when calling start, which you should not do: it should not switch to a specific grammar, but rather use the loaded SLM.

syl22-00, Oct 15 '14 18:10

So, as I was just writing, just call recorder.start(). The id is for switching between grammars, and you are not using one.
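Incidentally, the id passed to start goes through parseInt before it reaches switchSearch (per the recognizer.switchSearch(parseInt(id)) line quoted earlier), so a file name could not select anything even in principle. A quick check you can run in any JavaScript console:

```javascript
// Grammar ids are numbers; a file name does not survive parseInt.
const lmFileId = parseInt("cmusphinx-5.0-en-us.lm.dmp");
const grammarId = parseInt("1");
console.log(lmFileId);  // NaN, because the string does not start with a digit
console.log(grammarId); // 1
```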

syl22-00, Oct 15 '14 18:10

(So providing an id means telling the recognizer to load a specific grammarSet?)

Even so, I tried without an id, and now I get the next error:

Error in start with code [object Object]
Error in process with code [object Object]

willemmulder, Oct 15 '14 18:10

Can you post the logs you get in the JavaScript console? You should see a bunch of output from pocketsphinx. It would also help to see what the object returned as the error code contains:

console.log(e.data.code);
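The [object Object] in those messages is what JavaScript produces when an object is concatenated into a string; serializing it with JSON.stringify shows what it actually contains. A minimal sketch, assuming the worker's error replies are shaped like {status, command, code}; describeError is a hypothetical helper, not part of pocketsphinx.js:

```javascript
// Build a readable message from a worker error reply (assumed shape: {status, command, code}).
function describeError(data) {
  // JSON.stringify shows the object's contents instead of "[object Object]"
  return "Error in " + data.command + " with code " + JSON.stringify(data.code);
}

const reply = { status: "error", command: "start", code: {} };
console.log("Error in " + reply.command + " with code " + reply.code); // Error in start with code [object Object]
console.log(describeError(reply)); // Error in start with code {}
```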

syl22-00, Oct 15 '14 19:10

It seems to be an empty object... Here's the full log:

INFO: cmd_ln.c(696): Parsing command line:
\
    -bestpath no \
    -hmm rm1_200 \
    -remove_noise no

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-adcdev
-agc        none        none
-agcthresh  2.0     2.000000e+00
-allphone
-allphone_ci    no      no
-alpha      0.97        9.700000e-01
-argfile
-ascale     20.0        2.000000e+01
-aw     1       1
-backtrace  no      no
-beam       1e-48       1.000000e-48
-bestpath   yes     no
-bestpathlw 9.5     9.500000e+00
-bghist     no      no
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     8.0
-compallsen no      no
-debug              0
-dict
-dictcase   no      no
-dither     no      no
-doublebw   no      no
-ds     1       1
-fdict
-feat       1s_c_d_dd   1s_c_d_dd
-featparams
-fillprob   1e-8        1.000000e-08
-frate      100     100
-fsg
-fsgusealtpron  yes     yes
-fsgusefiller   yes     yes
-fwdflat    yes     yes
-fwdflatbeam    1e-64       1.000000e-64
-fwdflatefwid   4       4
-fwdflatlw  8.5     8.500000e+00
-fwdflatsfwin   25      25
-fwdflatwbeam   7e-29       7.000000e-29
-fwdtree    yes     yes
-hmm                rm1_200
-infile
-input_endian   little      little
-jsgf
-kdmaxbbi   -1      -1
-kdmaxdepth 0       0
-kdtree
-keyphrase
-kws
-kws_plp    1e-1        1.000000e-01
-kws_threshold  1       1.000000e+00
-latsize    5000        5000
-lda
-ldadim     0       0
-lextreedump    0       0
-lifter     0       0
-lm
-lmctl
-lmname
-logbase    1.0001      1.000100e+00
-logfn
-logspec    no      no
-lowerf     133.33334   1.333333e+02
-lpbeam     1e-40       1.000000e-40
-lponlybeam 7e-29       7.000000e-29
-lw     6.5     6.500000e+00
-maxhmmpf   10000       10000
-maxnewoov  20      20
-maxwpf     -1      -1
-mdef
-mean
-mfclogdir
-min_endfr  0       0
-mixw
-mixwfloor  0.0000001   1.000000e-07
-mllr
-mmap       yes     yes
-ncep       13      13
-nfft       512     512
-nfilt      40      40
-nwpen      1.0     1.000000e+00
-pbeam      1e-48       1.000000e-48
-pip        1.0     1.000000e+00
-pl_beam    1e-10       1.000000e-10
-pl_pbeam   1e-5        1.000000e-05
-pl_window  0       0
-rawlogdir
-remove_dc  no      no
-remove_noise   yes     no
-remove_silence yes     yes
-round_filters  yes     yes
-samprate   16000       1.600000e+04
-seed       -1      -1
-sendump
-senlogdir
-senmgau
-silprob    0.005       5.000000e-03
-smoothspec no      no
-svspec
-time       no      no
-tmat
-tmatfloor  0.0001      1.000000e-04
-topn       4       4
-topn_beam  0       0
-toprule
-transform  legacy      legacy
-unit_area  yes     yes
-upperf     6855.4976   6.855498e+03
-usewdphones    no      no
-uw     1.0     1.000000e+00
-vad_postspeech 50      50
-vad_prespeech  10      10
-vad_threshold  2.0     2.000000e+00
-var
-varfloor   0.0001      1.000000e-04
-varnorm    no      no
-verbose    no      no
-warp_params
-warp_type  inverse_linear  inverse_linear
-wbeam      7e-29       7.000000e-29
-wip        0.65        6.500000e-01
-wlen       0.025625    2.562500e-02

INFO: cmd_ln.c(696): Parsing command line:
\
    -nfilt 40 \
    -lowerf 133.3334 \
    -upperf 6855.4976 \
    -feat s2_4x \
    -agc none \
    -cmn current \
    -varnorm no

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     8.0
-dither     no      no
-doublebw   no      no
-feat       1s_c_d_dd   s2_4x
-frate      100     100
-input_endian   little      little
-lda
-ldadim     0       0
-lifter     0       0
-logspec    no      no
-lowerf     133.33334   1.333334e+02
-ncep       13      13
-nfft       512     512
-nfilt      40      40
-remove_dc  no      no
-remove_noise   yes     no
-remove_silence yes     yes
-round_filters  yes     yes
-samprate   16000       1.600000e+04
-seed       -1      -1
-smoothspec no      no
-svspec
-transform  legacy      legacy
-unit_area  yes     yes
-upperf     6855.4976   6.855498e+03
-vad_postspeech 50      50
-vad_prespeech  10      10
-vad_threshold  2.0     2.000000e+00
-varnorm    no      no
-verbose    no      no
-warp_params
-warp_type  inverse_linear  inverse_linear
-wlen       0.025625    2.562500e-02

INFO: acmod.c(252): Parsed model-specific feature parameters from rm1_200/feat.params
INFO: feat.c(715): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(517): Reading model definition: rm1_200/mdef
INFO: mdef.c(530): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: rm1_200/mdef
INFO: bin_mdef.c(516): 45 CI-phone, 30080 CD-phone, 3 emitstate/phone, 135 CI-sen, 335 Sen, 199 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: rm1_200/transition_matrices
INFO: acmod.c(124): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: rm1_200/means
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size:
INFO: ms_gauden.c(294):  256x12
INFO: ms_gauden.c(294):  256x24
INFO: ms_gauden.c(294):  256x3
INFO: ms_gauden.c(294):  256x12
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: rm1_200/variances
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size:
INFO: ms_gauden.c(294):  256x12
INFO: ms_gauden.c(294):  256x24
INFO: ms_gauden.c(294):  256x3
INFO: ms_gauden.c(294):  256x12
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(904): Loading senones from dump file rm1_200/sendump
INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(991): Rows: 256, Columns: 335
INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0 0
INFO: dict.c(320): Allocating 4099 * 20 bytes (80 KiB) for word entries
INFO: dict.c(342): Reading filler dictionary: rm1_200/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(345): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 45^3 * 2 bytes (177 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 24480 bytes (23 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 24480 bytes (23 KiB) for single-phone word triphones
ERROR: "pocketsphinx.c", line 931: No search module is selected, did you forget to specify a language model or grammar?
Object {status: "error", command: "start", code: Object}

willemmulder, Oct 15 '14 19:10

OK, great, so your issue is that the parameters to load the SLM and dictionary are not given, so there is no search module. Maybe you did not pass them when initializing:

{command: 'initialize', data: [["-lm", "your_model"], ["-dict", "your_dictionary"]]}

syl22-00, Oct 15 '14 19:10

We're getting there, but I feel like swimming in the middle of an ocean and you are my only hope :-)

I now call

var initRecognizer = function() {
          // You can pass parameters to the recognizer, such as : {command: 'initialize', data: [["-hmm", "my_model"], ["-fwdflat", "no"]]}
          postRecognizerJob({command: 'initialize', data: [["-lm", "cmusphinx-5.0-en-us.lm.dmp"], ["-dict", "cmudict.hub4.06d.dict"]]},
            function() {
              postRecognizerJob({command: 'load',
                data: ["../../model-en-us/dict.js", "../../model-en-us/lm.js"]
               }, function() { 
                // Done loading!
                recognizerReady();
               });
            }
          );
      };

but it gives me

ERROR: "dict.c", line 275: Failed to open dictionary file 'cmudict.hub4.06d.dict' for reading: No such file or directory 

which makes sense, because those files do not exist. But giving it the paths to the files also doesn't work, and I wonder anyhow why I would want to give it the paths to the actual files, because the files only get loaded 2 lines further down the chain, with the 'load' command... Right?

willemmulder, Oct 15 '14 19:10

Yes, you are very close.

  • The file name you give is correct: it is the name on the virtual file system that Emscripten creates. The packaged JavaScript file adds the file to that virtual file system.
  • As you said, load should come before initialize. Load adds those files to the file system, and they are read when the recognizer is initialized, so swap the calls to "load" and "initialize" (and their callbacks).
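Putting both points together, a minimal sketch of initRecognizer with the two calls swapped. postRecognizerJob and recognizerReady are the webapp/live.html functions quoted in this thread; here postRecognizerJob is passed in as a parameter and replaced by a stub, only so the sketch runs standalone:

```javascript
// 1. 'load' runs the packaged .js files, which write the model files into Emscripten's virtual file system.
// 2. 'initialize' then refers to those files by their virtual names (not by URL).
function initRecognizer(postRecognizerJob, onReady) {
  postRecognizerJob({ command: "load", data: ["../../model-en-us/dict.js", "../../model-en-us/lm.js"] }, function () {
    postRecognizerJob({
      command: "initialize",
      data: [["-lm", "cmusphinx-5.0-en-us.lm.dmp"], ["-dict", "cmudict.hub4.06d.dict"]]
    }, onReady);
  });
}

// Stub standing in for the real worker: records each command and replies immediately.
const sent = [];
initRecognizer(function (message, callback) {
  sent.push(message.command);
  callback();
}, function () {
  console.log(sent.join(" -> ")); // load -> initialize
});
```

In live.html itself you would call it with the real postRecognizerJob and pass recognizerReady as onReady.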

syl22-00, Oct 15 '14 19:10

Yes, that solves it! :-)

Now, a last step to get the LM and the Dictionary aligned, I guess...? I get this error now

INFO: acmod.c(252): Parsed model-specific feature parameters from rm1_200/feat.params
INFO: feat.c(715): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(517): Reading model definition: rm1_200/mdef
INFO: mdef.c(530): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: rm1_200/mdef
INFO: bin_mdef.c(516): 45 CI-phone, 30080 CD-phone, 3 emitstate/phone, 135 CI-sen, 335 Sen, 199 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: rm1_200/transition_matrices
INFO: acmod.c(124): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: rm1_200/means
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size:
INFO: ms_gauden.c(294):  256x12
INFO: ms_gauden.c(294):  256x24
INFO: ms_gauden.c(294):  256x3
INFO: ms_gauden.c(294):  256x12
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: rm1_200/variances
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size:
INFO: ms_gauden.c(294):  256x12
INFO: ms_gauden.c(294):  256x24
INFO: ms_gauden.c(294):  256x3
INFO: ms_gauden.c(294):  256x12
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(904): Loading senones from dump file rm1_200/sendump
INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(991): Rows: 256, Columns: 335
INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0 0
INFO: dict.c(320): Allocating 135415 * 20 bytes (2644 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: cmudict.hub4.06d.dict
INFO: dict.c(213): Allocated 992 KiB for strings, 1633 KiB for phones
INFO: dict.c(336): 131316 words read
INFO: dict.c(342): Reading filler dictionary: rm1_200/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(345): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 45^3 * 2 bytes (177 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 24480 bytes (23 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 24480 bytes (23 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=19794, 2=1377200, 3=3178194
INFO: ngram_model_dmp.c(242):    19794 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288):  1377200 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314):  3178194 = LM.trigrams read
INFO: ngram_model_dmp.c(339):    57155 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359):    10935 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379):    34843 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407):     2690 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463):    19794 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 799 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 77 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 77 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128
ERROR: "ngram_search_fwdtree.c", line 336: No word from the language model has pronunciation in the dictionary
INFO: ngram_search_fwdtree.c(339): after: 0 root, 0 non-root channels, 3 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25

willemmulder, Oct 15 '14 19:10

So it seems the dictionary is in uppercase (which seems normal to me), but the SLM is in lowercase. I don't know whether there is a canonical dictionary for your SLM, but you could try adjusting the case of one of the files (for instance with tr '[:upper:]' '[:lower:]' < input.txt > output.txt).

The tricky part is that:

  • The phonemes must be uppercase, so if you convert the dictionary file to lowercase, the phonemes will be wrong.
  • Your SLM is binary, so you can't process it with text utilities.

So what I'd do is start from the .lm file, convert it to uppercase, then regenerate the binary version. It is explained there: http://cmusphinx.sourceforge.net/wiki/tutoriallm

sphinx_lm_convert -i model.lm -o model.dmp
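If you do go the .lm route, note that running tr over the whole file would also uppercase the ARPA section markers (\data\, \1-grams:, \end\) and the <s>/</s> sentence markers, which are assumed here to need to stay as written. A minimal sketch that changes only the word columns, assuming the standard ARPA layout of one entry per line (log10 probability, the n words, optional back-off weight); it also normalizes field separators to tabs. The sample model is made up for illustration:

```javascript
// Uppercase only the word fields of an ARPA-format language model.
function uppercaseArpaWords(text) {
  const keep = new Set(["<s>", "</s>", "<unk>"]); // special tokens left untouched
  let order = 0; // n-gram order of the current section; 0 outside n-gram sections
  return text.split("\n").map((line) => {
    const section = line.match(/^\\(\d+)-grams:\s*$/);
    if (section) {
      order = Number(section[1]);
      return line;
    }
    if (line.startsWith("\\")) {
      order = 0; // \data\ or \end\
      return line;
    }
    if (order === 0 || line.trim() === "") return line; // header ("ngram N=...") or blank
    const fields = line.trim().split(/\s+/);
    // fields[0] is the log10 probability, fields[1..order] are the words
    for (let i = 1; i <= order && i < fields.length; i++) {
      if (!keep.has(fields[i])) fields[i] = fields[i].toUpperCase();
    }
    return fields.join("\t");
  }).join("\n");
}

const sample = ["\\data\\", "ngram 1=2", "ngram 2=1", "", "\\1-grams:", "-1.0\t<s>\t-0.5",
  "-2.0\thello\t-0.3", "", "\\2-grams:", "-0.7\t<s> hello", "", "\\end\\"].join("\n");
console.log(uppercaseArpaWords(sample));
```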

@nshmyrev do you have more input?

syl22-00, Oct 15 '14 20:10

The approach above is not recommended. For US English we distribute cmu07a.dic as part of the pocketsphinx sources, which you can use with the en-us generic acoustic model:

https://github.com/cmusphinx/pocketsphinx/blob/master/model/lm/en_US/cmu07a.dic

nshmyrev, Oct 15 '14 20:10

Thanks @nshmyrev, that could be the final step to make it work. I had always used cmudict/cmudict.0.7a from the Subversion tree, which is in uppercase.

syl22-00, Oct 15 '14 20:10

@nshmyrev @syl22-00 I'm going to try that tonight! Will let you know how it goes :-)

willemmulder, Oct 16 '14 07:10

All right then... I got it working!

The only drawback is the terrible recognition ;-) Is there a recommended set of files (LM + DIC) that I could load that is sort of 'proven' to work?

And what is an acoustic model? What does it do? And how could I create one for say, Dutch?

What I understand right now is:

  • A dictionary defines a set of written words and their pronunciation (i.e. in a phonetic alphabet)
  • A language model defines which words are likely to occur after one another
  • An acoustic model eh... defines which phonetics resemble each other? (guessing here...)
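To make the second bullet concrete, here is a toy bigram model estimated by plain counting: P(next | previous) is the fraction of times previous is followed by next. This is only an illustration under simple maximum-likelihood estimation; the models distributed for Sphinx also use smoothing and back-off, which this sketch leaves out:

```javascript
// Toy bigram model: P(next | previous) = count(previous next) / count(previous as a history).
function bigramModel(sentences) {
  const pairCounts = new Map();
  const historyCounts = new Map();
  for (const sentence of sentences) {
    // <s> and </s> mark sentence start and end
    const words = ["<s>", ...sentence.split(/\s+/).filter(Boolean), "</s>"];
    for (let i = 0; i + 1 < words.length; i++) {
      const pair = words[i] + " " + words[i + 1];
      pairCounts.set(pair, (pairCounts.get(pair) || 0) + 1);
      historyCounts.set(words[i], (historyCounts.get(words[i]) || 0) + 1);
    }
  }
  return function probability(previous, next) {
    const history = historyCounts.get(previous) || 0;
    return history === 0 ? 0 : (pairCounts.get(previous + " " + next) || 0) / history;
  };
}

const p = bigramModel(["go left", "go right", "go left now"]);
console.log(p("go", "left")); // 2 of the 3 occurrences of "go" are followed by "left"
console.log(p("go", "up"));   // 0: never seen, which is why real models smooth
```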

willemmulder, Oct 23 '14 18:10

Congratulations for having it finally working!

I should have said it earlier, but the acoustic model provided in the repository is very small (200 senones, very few parameters, so not very accurate), and built on the RM1 corpus which is small. So if you want to make something that has good performance you should use a better one. @nshmyrev will probably know much better, but I guess you should try one of these: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/

Using them is similar to what you did for the dictionary and language model. The main difference is that one acoustic model has a bunch of files. You can place them inside one folder and give the folder name as parameter with -hmm, or give the individual files as parameters (-mdef, -variances, etc.).

Also I would say your understanding is quite correct. Just to clarify dictionary and acoustic model: There is a set of phonemes (40-something for English), the dictionary maps words to phoneme sequences, and the acoustic model describes the statistical distributions of the way people speak these phonemes.
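The word-to-phonemes mapping can be illustrated with a small parser for the CMU-style dictionary format: one entry per line, the word followed by its phonemes separated by whitespace, with alternative pronunciations written as word(2), word(3), and so on. A minimal sketch; the entries below are illustrative, not copied from cmu07a.dic. Lookups are case-sensitive, which is why an uppercase dictionary cannot serve a lowercase language model:

```javascript
// Parse CMU-style dictionary text into word -> list of pronunciations (each an array of phonemes).
function parseDictionary(text) {
  const entries = new Map();
  for (const line of text.split("\n")) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 2) continue; // skip blank or malformed lines
    const word = fields[0].replace(/\(\d+\)$/, ""); // "read(2)" is a second pronunciation of "read"
    if (!entries.has(word)) entries.set(word, []);
    entries.get(word).push(fields.slice(1));
  }
  return entries;
}

const dict = parseDictionary("hello HH AH L OW\nread R IY D\nread(2) R EH D\n");
console.log(JSON.stringify(dict.get("read"))); // [["R","IY","D"],["R","EH","D"]]
console.log(dict.has("HELLO")); // false
```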

So if you want to recognize Dutch, you'll need dictionary, language and acoustic models for Dutch. I believe some already exist (http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Dutch%20Voxforge/), or you can train them yourself. There is abundant documentation on how to do that on http://cmusphinx.org.

syl22-00, Oct 23 '14 19:10

Thanks :-)

Quick iteration to verify that I understand correctly:

  • a waveform is translated into senones, based on the acoustic model (which describes, among other things, how phones form certain senones in different contexts, and how, given a waveform and its context, one can infer which senone was likely 'meant')
  • senone sequences are mapped to words using the dictionary, whereby the search for possible words is limited (and thus quickened) using the statistical language model

A restricted acoustic model will cause the recognizer to make mistakes in mapping my actual speech (phones) to the right senones, and thus, the wrong words will be presented on the screen. Correct?

Let's see if I can get the -hmm option to work..!

Oh, and is it correct that the pocketsphinx.js file already includes an acoustic model? Can I somehow get a 'raw' pocketsphinx.js file without an embedded acoustic model?

willemmulder, Oct 23 '14 20:10

Hmm, not really; you seem to be confusing phonemes and senones. "Senones" basically just refers to the number of parameters in your model. Since there is a lot of variation in the way people pronounce one phoneme, you need a lot of parameters to capture the distributions, but you also need a lot of data to learn all these parameters.

But in short, you should have as much data as possible to train your acoustic model, and you need many senones to take advantage of that.
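A rough worked example of why more senones means more parameters, and therefore more training data. This is a sketch under assumptions not stated in the thread: a continuous model where each senone has its own mixture of diagonal-covariance Gaussians, 39-dimensional features and 16 Gaussians per senone. Semi-continuous models such as hub4wsj_sc_8k share Gaussians across senones, so the formula does not apply to them.

```javascript
// Rough parameter count for a continuous acoustic model, assuming each senone
// has its own mixture of `gaussians` diagonal-covariance Gaussians over
// `dims`-dimensional features. Each Gaussian contributes one mean and one
// variance per dimension, plus one mixture weight. Transition matrices are ignored.
function continuousModelParams(senones, gaussians, dims) {
  return senones * gaussians * (2 * dims + 1);
}

// Illustrative numbers only: 39-dimensional features (e.g. 13 MFCCs plus
// deltas and delta-deltas) and 16 Gaussians per senone.
for (const senones of [1000, 5000]) {
  console.log(senones, continuousModelParams(senones, 16, 39));
}
```

With these assumptions, going from 1,000 to 5,000 senones raises the count from 1,264,000 to 6,320,000 parameters, each of which has to be estimated from training data.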

syl22-00, Oct 23 '14 20:10

I can't get the acoustic model to load.

Whatever I pass to the init function, it comes back saying:

 ERROR: "acmod.c", line 90: Folder 'model-en-us/acoustic' does not contain acoustic model definition 'mdef' 

What I try at the moment is

    var initRecognizer = function() {
      // You can pass parameters to the recognizer, such as : {command: 'initialize', data: [["-hmm", "my_model"], ["-fwdflat", "no"]]}
      postRecognizerJob({command: 'load', data: ["../../model-en-us/dict.js", "../../model-en-us/lm.js"]}, function() {
        postRecognizerJob({command: 'initialize', data: [["-lm", "cmusphinx-5.0-en-us.lm.dmp"], ["-dict", "cmu07a.dic"], ["-hmm", "./model-en-us/acoustic"]]}, function() {
          // Done loading!
          recognizerReady();
        });
      });
    };

I was thinking I might need to convert all these files (mdef, variances, etc.) to JavaScript files with the file packager. But that would load them into the 'virtual disk' that the init function reads files from, so how would I then specify a folder for the -hmm parameter?

willemmulder, Oct 23 '14 20:10

I guess that should be quite clear from the docs: https://github.com/syl22-00/pocketsphinx.js#ii-package-model-files-outside-the-main-javascript

In the case of that example, you'd have "-hmm", "hub4wsj_sc_8k".
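Applied to the init code from earlier in the thread, the idea is that the packaged acoustic model is loaded together with the dictionary and language model, and `-hmm` names the virtual folder. A minimal sketch, assuming a hypothetical package file `hub4wsj_sc_8k.js` produced by the file packager with `--embed hub4wsj_sc_8k/<file>`. `postRecognizerJob` is replaced by a hypothetical stub that only records messages, so the sketch runs on its own.

```javascript
// Hypothetical stand-in for the webapp's postRecognizerJob helper. The real one
// posts the message to the recognizer worker; this one only records messages
// and calls back right away.
const sentMessages = [];
function postRecognizerJob(message, callback) {
  sentMessages.push(message);
  if (callback) callback();
}

function initRecognizer(onReady) {
  postRecognizerJob({
    command: "load",
    // The acoustic model package goes in the same list as the dictionary and
    // language model. "hub4wsj_sc_8k.js" is a hypothetical file name.
    data: [
      "../../model-en-us/dict.js",
      "../../model-en-us/lm.js",
      "../../model-en-us/hub4wsj_sc_8k.js",
    ],
  }, function () {
    postRecognizerJob({
      command: "initialize",
      // -hmm names the virtual folder created by the package, not a URL.
      data: [
        ["-lm", "cmusphinx-5.0-en-us.lm.dmp"],
        ["-dict", "cmu07a.dic"],
        ["-hmm", "hub4wsj_sc_8k"],
      ],
    }, onReady);
  });
}

initRecognizer(function () {
  console.log("Recognizer ready after", sentMessages.length, "messages");
});
```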

syl22-00, Oct 23 '14 20:10

(Yes, I was confused. I thought that a senone was a diphone or triphone, i.e. multiple phones together. I tried googling for 'senone' but there is not a single simple explanation; the one on the CMU site is pretty vague... What you are saying is that a phone(me) is the smallest significant linguistic unit, and a senone describes the likelihood that a certain stretch of speech is a certain phoneme, based on its features and on its context (including the phonemes to the left and the right). The acoustic model contains a set of senones, split across the mdef, variances, etc. files. The more senones, the better speech can be mapped to phonemes. Right? Right?)

willemmulder, Oct 23 '14 21:10

Imho that is not really clear from the docs (although I love the extensive docs! :-)), since the docs don't describe what exactly the file packager is doing, how it creates a virtual file structure, and how you would use that virtual file structure in your init function to load files.

It thus does not explain how to generate a virtual 'folder' that you can then use in the init function. But I think I figured it out:

It appears to be impossible to generate .js files with a 'virtual folder' from within the actual folder where the source files are stored. I have to go one folder up, then run the file packager commands pointing to the files in the folder; then the folder structure is preserved in the generated .js files.

I will generate all the .js files, load them (hope the browser doesn't crash), init them, and report back here!

willemmulder, Oct 23 '14 21:10

That's what --embed is for: If you run the packager with --embed hub4wsj_sc_8k/variances, the generated JavaScript file will create a virtual variances file inside a virtual hub4wsj_sc_8k/ folder.

syl22-00, Oct 23 '14 21:10

Exactly, but, as far as I can see, --embed also functions as the pointer to the file that you want to embed, and thus, you cannot call the file packager from within the source directory itself, but only from one directory up (since you need to point 'into' the directory, which is only possible from the outside).
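The point about running the packager from the parent folder can be sketched as a small Node script that builds the file_packager.py command line for every file of a model, so that each path (and therefore each virtual path) starts with the model folder name. Assumptions: the command runs from the parent directory of the model folder, the `.data`/`.js` output names are hypothetical, and the file list is illustrative (in practice you could read it with `fs.readdirSync`). `--embed` and `--js-output=` are the file packager options discussed above.

```javascript
const path = require("path");

// Sketch: build the argument list for Emscripten's file_packager.py so that
// every model file is embedded under a virtual folder with the model's name.
// Must be run from the PARENT directory of the model folder (see above).
function packagerArgs(modelFolder, fileNames, jsOutput) {
  // Paths are relative to the current directory, so "hub4wsj_sc_8k/mdef"
  // is embedded as the virtual file "hub4wsj_sc_8k/mdef".
  const embedded = fileNames.map((name) => path.posix.join(modelFolder, name));
  return [
    "file_packager.py",
    modelFolder + ".data", // hypothetical target name
    "--embed",
    ...embedded,
    "--js-output=" + jsOutput,
  ];
}

// Illustrative file list for a model folder.
const files = ["feat.params", "mdef", "means", "noisedict", "sendump", "transition_matrices", "variances"];
console.log(["python", ...packagerArgs("hub4wsj_sc_8k", files, "hub4wsj_sc_8k.js")].join(" "));
```

The resulting .js file is what would go into the `load` list, with `-hmm` set to the folder name.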

willemmulder, Oct 24 '14 06:10

@willemmulder Hello. I'd like to know whether these steps worked out. Would you mind sharing this source code with me? I'd like to compare my work to this example. Thank you! =)

yjc0703, Nov 19 '15 07:11

Hi @yjc0703, I have been searching, but can't find it anymore... We also didn't go with this solution in the end, so it probably got lost somewhere. I'm sorry!

willemmulder, Nov 20 '15 09:11

@willemmulder Oh, I see. Thank you for your kind reply. Thank you!

yjc0703, Nov 27 '15 00:11

Hi, I am learning CMUSphinx, but I am in trouble: ERROR: "ngram_search_fwdtree.c", line 336: No word from the language model has pronunciation in the dictionary
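As the message says, none of the language model's words were found in the dictionary, which can happen when the two come from different sources. A minimal sketch for checking this, assuming the language model is in the text ARPA format (binary .lm.dmp files are not handled) and the dictionary is CMU-style; the sample data is made up. Tokens such as `<s>` and `</s>` are skipped because they usually come from the acoustic model's noise dictionary rather than the main dictionary.

```javascript
// Sketch: list the unigrams of an ARPA-format language model that have no
// entry in a CMU-style pronunciation dictionary.
function arpaUnigrams(arpaText) {
  const words = [];
  let inUnigrams = false;
  for (const rawLine of arpaText.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("\\")) {
      // Section headers look like \data\, \1-grams:, \2-grams:, \end\
      inUnigrams = line === "\\1-grams:";
      continue;
    }
    if (!inUnigrams || line === "") continue;
    // Unigram lines: log10 probability, the word, optional backoff weight.
    words.push(line.split(/\s+/)[1]);
  }
  return words;
}

function dictionaryWords(dictText) {
  const words = new Set();
  for (const rawLine of dictText.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    // Drop "(2)"-style suffixes of alternative pronunciations.
    words.add(line.split(/\s+/)[0].replace(/\(\d+\)$/, ""));
  }
  return words;
}

function missingPronunciations(arpaText, dictText) {
  const known = dictionaryWords(dictText);
  // Skip <s>, </s>, <unk>, etc.; the comparison is case-sensitive.
  return arpaUnigrams(arpaText).filter((w) => !w.startsWith("<") && !known.has(w));
}

const lm = [
  "\\data\\", "ngram 1=4", "",
  "\\1-grams:", "-1.0 <s> -0.3", "-1.0 </s>", "-0.7 hello -0.2", "-0.9 wurld -0.2", "",
  "\\end\\",
].join("\n");
const dictText = ["hello HH AH L OW", "world W ER L D"].join("\n");
console.log(missingPronunciations(lm, dictText)); // [ 'wurld' ]
```

If this reports (almost) every word, the language model and dictionary likely use different vocabularies or different letter case.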

Haji9969, Apr 18 '19 17:04