pocketsphinx.js
Example with official Language Models as provided by Pocketsphinx
Hi,
I'm just getting 'deeper' into speech-recognition and would really benefit from an example using the 'official language models' files as they are provided for pocketsphinx here: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
Right now the examples use simple arrays with lists of words, but it would really help me to see a 'real' example where a real language model + dictionary is used.
Would that be possible?
Hi,
From the sample app that uses grammars (webapp/live.html), you can adjust it to use a statistical language model. Note that grammars can be much more elaborate than what we have in this sample app, though. Anyway, the basic steps to use a statistical language model are:
- Find a language model and a dictionary. You can find some on the page you linked, in the models folder in the pocketsphinx svn repository, or on voxforge.org. They should be compatible with your acoustic model, so if you use the provided acoustic model, make sure you use a dictionary and a language model for English!
- Convert these two files into JavaScript files that emscripten can access.
- Ask recognizer.js to load the model and dictionary.
- Take out any call to addWords and addGrammar.
That should be it.
Thanks for the guide.
Would you mind if I tried setting up an example and made a pull request once it works?
That'd be awesome. I can also help getting it working if you find any issue.
Thanks a lot to you guys!
So let's document my first steps ;-)
First, I installed Emscripten and read through its docs a little.
Then, I downloaded the English Generic language model here: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/
It does not have a dictionary... So I used this one from here (although I don't have the slightest clue whether you can just mix and match language models and dictionaries): http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20HUB4%20Language%20Model/
Then I called the file_packager.py (which I assume just packs a binary file in JS format and wraps it in a module definition so other scripts can load the script and get hold of its data...?) like this:
python "C:\Program Files\Emscripten\emscripten\1.25.0\tools\file_packager.py" somefile.js --embed cmudict.hub4.06d.dict --js-output=dict.js
and
python "C:\Program Files\Emscripten\emscripten\1.25.0\tools\file_packager.py" somefile.js --embed en-us.lm --js-output=lm.js
where somefile.js doesn't actually exist, but I didn't understand why that parameter was necessary anyways. What I get is two rather large files:
- dict.js (19 MB)
- lm.js (540 MB)
I then spun up a Node.js server, served the webapp folder, and modified the live.html file so that the initRecognizer function now looks like:
var initRecognizer = function() {
  // You can pass parameters to the recognizer, such as : {command: 'initialize', data: [["-hmm", "my_model"], ["-fwdflat", "no"]]}
  postRecognizerJob({command: 'initialize'},
    function() {
      postRecognizerJob({command: 'load',
        data: ["../../model-en-us/dict.js", "../../model-en-us/lm.js"]
      }, function() {
        // Done loading!
        recognizerReady();
      });
    }
  );
};
If I check the console, I see that it is loading the dict.js and lm.js files, and both return a 200 OK status. Yet the size reported for each is smaller than its actual size, and the worker comes back with a "NETWORK_ERROR".
Can they not handle these large files?
Does that mean that I really have to embed the lm.js and dict.js files in the pocketsphinx.js file directly (and hope that that file does get loaded correctly)?
Thanks for any help!
Hi @willemmulder ,
I observed NETWORK_ERROR being returned when JavaScript files loaded by importScripts are not found. Are you sure you have the correct relative path? I think paths are relative to the HTML file that loads the scripts. Also, if you have relative paths that go up in the folder tree, make sure you never get beyond your web server root.
I never tried files that large. I am not sure whether the browser imposes a limit, but you would probably need to increase TOTAL_MEMORY=100663296 in CMakeLists.txt.
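To put that number in perspective, here is a small sketch that converts the TOTAL_MEMORY value to MiB and compares it with the file sizes mentioned earlier in the thread. The helper names are hypothetical, and using the generated .js file sizes as a stand-in for the memory the models need is only a rough assumption, since how much of the data ends up in the heap depends on how the files are read.

```javascript
// Hypothetical helper: convert a byte count (such as TOTAL_MEMORY) to MiB.
function bytesToMiB(bytes) {
  return bytes / (1024 * 1024);
}

// Rough check (assumption: the whole payload must fit in the heap at once).
function fitsInHeap(payloadBytes, totalMemoryBytes) {
  return payloadBytes <= totalMemoryBytes;
}

var TOTAL_MEMORY = 100663296; // value quoted in the post above

console.log(bytesToMiB(TOTAL_MEMORY));                     // 96
console.log(fitsInHeap(19 * 1024 * 1024, TOTAL_MEMORY));   // true  (dict.js, 19 MB)
console.log(fitsInHeap(540 * 1024 * 1024, TOTAL_MEMORY));  // false (lm.js, 540 MB)
```

Under that assumption, the 540 MB language model could not fit in a 96 MiB heap, which is one more reason to start with a smaller model.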
I would suggest starting with a smaller language model, and I do not think embedding it in pocketsphinx.js will make any difference; it would just concatenate the files into one.
Your argument somefile.js should be the pocketsphinx.js file that will eventually have to load these files. I am not sure if it matters, though.
Good luck, and let us know how it goes.
For the generic language model you need to use
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download
Its size is 25 MB. It should not grow after conversion to JavaScript; that's actually my main concern about pocketsphinx.js: somehow it increases the size of the data files.
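One plausible explanation for the growth, offered here as an assumption rather than a confirmed description of file_packager.py: if the embedded file is written out as a JavaScript array literal of decimal byte values, every byte costs between two and four characters (digits plus a separator). The sketch below estimates that expansion; the function name and the single-comma separator are illustrative choices.

```javascript
// Length in characters of an array literal such as "[0,10,100]" for the
// given bytes. Assumption: one comma between values and no extra whitespace.
function arrayLiteralLength(bytes) {
  if (bytes.length === 0) return 2; // "[]"
  var chars = 2 + (bytes.length - 1); // two brackets plus the commas
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    chars += b < 10 ? 1 : b < 100 ? 2 : 3; // decimal digits of this byte
  }
  return chars;
}

// Expansion factor when every byte value 0..255 occurs equally often.
var allByteValues = Uint8Array.from({ length: 256 }, function (_, i) { return i; });
var factor = arrayLiteralLength(allByteValues) / allByteValues.length;
console.log(factor.toFixed(2)); // 3.57
```

Under these assumptions a binary file grows by a factor of roughly 3.5 once it is embedded as text, and more if the separator includes whitespace.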
Ok, so a DMP file is the binary version of a normal .lm file, right? So it's more compact. Good.
( Can I use any .dict file? Since the http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/ folder does not provide a dictionary. Is that a problem? How does the combination of language model (lm) and dictionary (dict) work? )
The generated JavaScript file is now only 99 MB, and it loads successfully!
"Loading page Initializing web audio and speech recognizer, waiting for approval to access the microphone Audio recorder ready Recognizer ready"
Excellent.
However, when I press the start button, I get
Error in switchgrammar with code [object Object]
Error in process with code [object Object]
Error in process with code [object Object]
... more of those
Any clues?
Just a note on what I tried: the recognizer wants to do
recognizer.switchSearch(parseInt(id));
where I assume that id is the identifier at the bottom of the generated Javascript files:
Module['FS_createDataFile']('/', 'cmusphinx-5.0-en-us.lm.dmp', fileData0, true, true);
So I use cmusphinx-5.0-en-us.lm.dmp as id when I start the recognizer:
var startRecording = function() {
  var id = "cmusphinx-5.0-en-us.lm.dmp";
  if (recorder && recorder.start(id)) displayRecording(true);
};
But that fails with the above error message...
It seems like it is trying to switch grammar, which should not happen. You are probably providing an id when calling start, which you should not (it should not attempt to switch to a specific grammar, but rather use the loaded SLM).
So, as I was just writing, just call recorder.start(); id is for switching between grammars, and you are not using one.
(So providing an id means telling the recognizer to load a specific grammarSet?)
Even so, I tried without an id, and it jumps into the next error:
Error in start with code [object Object]
Error in process with code [object Object]
Can you post the logs that you get in the JavaScript console? You should have a bunch of output from pocketsphinx. Also, it would be useful if you could find out what that object returned as the error code is.
console.log(e.data.code);
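The "[object Object]" text in the messages above is what JavaScript produces when an object is concatenated into a string. A minimal sketch of a more informative message, assuming the worker reply has the {status, command, code} shape seen in this thread (the function name is hypothetical):

```javascript
// Build a readable error line from a worker reply of the assumed shape
// {status: "error", command: "...", code: ...}.
function formatRecognizerError(data) {
  var code;
  try {
    code = JSON.stringify(data.code); // shows the object's own properties
  } catch (err) {
    code = String(data.code); // fallback, e.g. for circular structures
  }
  return "Error in " + data.command + " with code " + code;
}

console.log(formatRecognizerError({ status: "error", command: "start", code: {} }));
// prints: Error in start with code {}
```

An empty object then shows up as {} instead of [object Object], which makes it obvious when the error code carries no information.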
It seems to be an empty object... Here's the full log:
INFO: cmd_ln.c(696): Parsing command line: js/pocketsphinx.js:1
\ js/pocketsphinx.js:1
-bestpath no \ js/pocketsphinx.js:1
-hmm rm1_200 \ js/pocketsphinx.js:1
-remove_noise no js/pocketsphinx.js:1
js/pocketsphinx.js:1
Current configuration: js/pocketsphinx.js:1
[NAME] [DEFLT] [VALUE] js/pocketsphinx.js:1
-adcdev js/pocketsphinx.js:1
-agc none none js/pocketsphinx.js:1
-agcthresh 2.0 2.000000e+00 js/pocketsphinx.js:1
-allphone js/pocketsphinx.js:1
-allphone_ci no no js/pocketsphinx.js:1
-alpha 0.97 9.700000e-01 js/pocketsphinx.js:1
-argfile js/pocketsphinx.js:1
-ascale 20.0 2.000000e+01 js/pocketsphinx.js:1
-aw 1 1 js/pocketsphinx.js:1
-backtrace no no js/pocketsphinx.js:1
-beam 1e-48 1.000000e-48 js/pocketsphinx.js:1
-bestpath yes no js/pocketsphinx.js:1
-bestpathlw 9.5 9.500000e+00 js/pocketsphinx.js:1
-bghist no no js/pocketsphinx.js:1
-ceplen 13 13 js/pocketsphinx.js:1
-cmn current current js/pocketsphinx.js:1
-cmninit 8.0 8.0 js/pocketsphinx.js:1
-compallsen no no js/pocketsphinx.js:1
-debug 0 js/pocketsphinx.js:1
-dict js/pocketsphinx.js:1
-dictcase no no js/pocketsphinx.js:1
-dither no no js/pocketsphinx.js:1
-doublebw no no js/pocketsphinx.js:1
-ds 1 1 js/pocketsphinx.js:1
-fdict js/pocketsphinx.js:1
-feat 1s_c_d_dd 1s_c_d_dd js/pocketsphinx.js:1
-featparams js/pocketsphinx.js:1
-fillprob 1e-8 1.000000e-08 js/pocketsphinx.js:1
-frate 100 100 js/pocketsphinx.js:1
-fsg js/pocketsphinx.js:1
-fsgusealtpron yes yes js/pocketsphinx.js:1
-fsgusefiller yes yes js/pocketsphinx.js:1
-fwdflat yes yes js/pocketsphinx.js:1
-fwdflatbeam 1e-64 1.000000e-64 js/pocketsphinx.js:1
-fwdflatefwid 4 4 js/pocketsphinx.js:1
-fwdflatlw 8.5 8.500000e+00 js/pocketsphinx.js:1
-fwdflatsfwin 25 25 js/pocketsphinx.js:1
-fwdflatwbeam 7e-29 7.000000e-29 js/pocketsphinx.js:1
-fwdtree yes yes js/pocketsphinx.js:1
-hmm rm1_200 js/pocketsphinx.js:1
-infile js/pocketsphinx.js:1
-input_endian little little js/pocketsphinx.js:1
-jsgf js/pocketsphinx.js:1
-kdmaxbbi -1 -1 js/pocketsphinx.js:1
-kdmaxdepth 0 0 js/pocketsphinx.js:1
-kdtree js/pocketsphinx.js:1
-keyphrase js/pocketsphinx.js:1
-kws js/pocketsphinx.js:1
-kws_plp 1e-1 1.000000e-01 js/pocketsphinx.js:1
-kws_threshold 1 1.000000e+00 js/pocketsphinx.js:1
-latsize 5000 5000 js/pocketsphinx.js:1
-lda js/pocketsphinx.js:1
-ldadim 0 0 js/pocketsphinx.js:1
-lextreedump 0 0 js/pocketsphinx.js:1
-lifter 0 0 js/pocketsphinx.js:1
-lm js/pocketsphinx.js:1
-lmctl js/pocketsphinx.js:1
-lmname js/pocketsphinx.js:1
-logbase 1.0001 1.000100e+00 js/pocketsphinx.js:1
-logfn js/pocketsphinx.js:1
-logspec no no js/pocketsphinx.js:1
-lowerf 133.33334 1.333333e+02 js/pocketsphinx.js:1
-lpbeam 1e-40 1.000000e-40 js/pocketsphinx.js:1
-lponlybeam 7e-29 7.000000e-29 js/pocketsphinx.js:1
-lw 6.5 6.500000e+00 js/pocketsphinx.js:1
-maxhmmpf 10000 10000 js/pocketsphinx.js:1
-maxnewoov 20 20 js/pocketsphinx.js:1
-maxwpf -1 -1 js/pocketsphinx.js:1
-mdef js/pocketsphinx.js:1
-mean js/pocketsphinx.js:1
-mfclogdir js/pocketsphinx.js:1
-min_endfr 0 0 js/pocketsphinx.js:1
-mixw js/pocketsphinx.js:1
-mixwfloor 0.0000001 1.000000e-07 js/pocketsphinx.js:1
-mllr js/pocketsphinx.js:1
-mmap yes yes js/pocketsphinx.js:1
-ncep 13 13 js/pocketsphinx.js:1
-nfft 512 512 js/pocketsphinx.js:1
-nfilt 40 40 js/pocketsphinx.js:1
-nwpen 1.0 1.000000e+00 js/pocketsphinx.js:1
-pbeam 1e-48 1.000000e-48 js/pocketsphinx.js:1
-pip 1.0 1.000000e+00 js/pocketsphinx.js:1
-pl_beam 1e-10 1.000000e-10 js/pocketsphinx.js:1
-pl_pbeam 1e-5 1.000000e-05 js/pocketsphinx.js:1
-pl_window 0 0 js/pocketsphinx.js:1
-rawlogdir js/pocketsphinx.js:1
-remove_dc no no js/pocketsphinx.js:1
-remove_noise yes no js/pocketsphinx.js:1
-remove_silence yes yes js/pocketsphinx.js:1
-round_filters yes yes js/pocketsphinx.js:1
-samprate 16000 1.600000e+04 js/pocketsphinx.js:1
-seed -1 -1 js/pocketsphinx.js:1
-sendump js/pocketsphinx.js:1
-senlogdir js/pocketsphinx.js:1
-senmgau js/pocketsphinx.js:1
-silprob 0.005 5.000000e-03 js/pocketsphinx.js:1
-smoothspec no no js/pocketsphinx.js:1
-svspec js/pocketsphinx.js:1
-time no no js/pocketsphinx.js:1
-tmat js/pocketsphinx.js:1
-tmatfloor 0.0001 1.000000e-04 js/pocketsphinx.js:1
-topn 4 4 js/pocketsphinx.js:1
-topn_beam 0 0 js/pocketsphinx.js:1
-toprule js/pocketsphinx.js:1
-transform legacy legacy js/pocketsphinx.js:1
-unit_area yes yes js/pocketsphinx.js:1
-upperf 6855.4976 6.855498e+03 js/pocketsphinx.js:1
-usewdphones no no js/pocketsphinx.js:1
-uw 1.0 1.000000e+00 js/pocketsphinx.js:1
-vad_postspeech 50 50 js/pocketsphinx.js:1
-vad_prespeech 10 10 js/pocketsphinx.js:1
-vad_threshold 2.0 2.000000e+00 js/pocketsphinx.js:1
-var js/pocketsphinx.js:1
-varfloor 0.0001 1.000000e-04 js/pocketsphinx.js:1
-varnorm no no js/pocketsphinx.js:1
-verbose no no js/pocketsphinx.js:1
-warp_params js/pocketsphinx.js:1
-warp_type inverse_linear inverse_linear js/pocketsphinx.js:1
-wbeam 7e-29 7.000000e-29 js/pocketsphinx.js:1
-wip 0.65 6.500000e-01 js/pocketsphinx.js:1
-wlen 0.025625 2.562500e-02 js/pocketsphinx.js:1
js/pocketsphinx.js:1
INFO: cmd_ln.c(696): Parsing command line: js/pocketsphinx.js:1
\ js/pocketsphinx.js:1
-nfilt 40 \ js/pocketsphinx.js:1
-lowerf 133.3334 \ js/pocketsphinx.js:1
-upperf 6855.4976 \ js/pocketsphinx.js:1
-feat s2_4x \ js/pocketsphinx.js:1
-agc none \ js/pocketsphinx.js:1
-cmn current \ js/pocketsphinx.js:1
-varnorm no js/pocketsphinx.js:1
js/pocketsphinx.js:1
Current configuration: js/pocketsphinx.js:1
[NAME] [DEFLT] [VALUE] js/pocketsphinx.js:1
-agc none none js/pocketsphinx.js:1
-agcthresh 2.0 2.000000e+00 js/pocketsphinx.js:1
-alpha 0.97 9.700000e-01 js/pocketsphinx.js:1
-ceplen 13 13 js/pocketsphinx.js:1
-cmn current current js/pocketsphinx.js:1
-cmninit 8.0 8.0 js/pocketsphinx.js:1
-dither no no js/pocketsphinx.js:1
-doublebw no no js/pocketsphinx.js:1
-feat 1s_c_d_dd s2_4x js/pocketsphinx.js:1
-frate 100 100 js/pocketsphinx.js:1
-input_endian little little js/pocketsphinx.js:1
-lda js/pocketsphinx.js:1
-ldadim 0 0 js/pocketsphinx.js:1
-lifter 0 0 js/pocketsphinx.js:1
-logspec no no js/pocketsphinx.js:1
-lowerf 133.33334 1.333334e+02 js/pocketsphinx.js:1
-ncep 13 13 js/pocketsphinx.js:1
-nfft 512 512 js/pocketsphinx.js:1
-nfilt 40 40 js/pocketsphinx.js:1
-remove_dc no no js/pocketsphinx.js:1
-remove_noise yes no js/pocketsphinx.js:1
-remove_silence yes yes js/pocketsphinx.js:1
-round_filters yes yes js/pocketsphinx.js:1
-samprate 16000 1.600000e+04 js/pocketsphinx.js:1
-seed -1 -1 js/pocketsphinx.js:1
-smoothspec no no js/pocketsphinx.js:1
-svspec js/pocketsphinx.js:1
-transform legacy legacy js/pocketsphinx.js:1
-unit_area yes yes js/pocketsphinx.js:1
-upperf 6855.4976 6.855498e+03 js/pocketsphinx.js:1
-vad_postspeech 50 50 js/pocketsphinx.js:1
-vad_prespeech 10 10 js/pocketsphinx.js:1
-vad_threshold 2.0 2.000000e+00 js/pocketsphinx.js:1
-varnorm no no js/pocketsphinx.js:1
-verbose no no js/pocketsphinx.js:1
-warp_params js/pocketsphinx.js:1
-warp_type inverse_linear inverse_linear js/pocketsphinx.js:1
-wlen 0.025625 2.562500e-02 js/pocketsphinx.js:1
js/pocketsphinx.js:1
INFO: acmod.c(252): Parsed model-specific feature parameters from rm1_200/feat.params js/pocketsphinx.js:1
INFO: feat.c(715): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none' js/pocketsphinx.js:1
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0 js/pocketsphinx.js:1
INFO: mdef.c(517): Reading model definition: rm1_200/mdef js/pocketsphinx.js:1
INFO: mdef.c(530): Found byte-order mark BMDF, assuming this is a binary mdef file js/pocketsphinx.js:1
INFO: bin_mdef.c(336): Reading binary model definition: rm1_200/mdef js/pocketsphinx.js:1
INFO: bin_mdef.c(516): 45 CI-phone, 30080 CD-phone, 3 emitstate/phone, 135 CI-sen, 335 Sen, 199 Sen-Seq js/pocketsphinx.js:1
INFO: tmat.c(206): Reading HMM transition probability matrices: rm1_200/transition_matrices js/pocketsphinx.js:1
INFO: acmod.c(124): Attempting to use SCHMM computation module js/pocketsphinx.js:1
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: rm1_200/means js/pocketsphinx.js:1
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size: js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x12 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x24 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x3 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x12 js/pocketsphinx.js:1
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: rm1_200/variances js/pocketsphinx.js:1
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size: js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x12 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x24 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x3 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x12 js/pocketsphinx.js:1
INFO: ms_gauden.c(354): 0 variance values floored js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(904): Loading senones from dump file rm1_200/sendump js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(991): Rows: 256, Columns: 335 js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0 0 js/pocketsphinx.js:1
INFO: dict.c(320): Allocating 4099 * 20 bytes (80 KiB) for word entries js/pocketsphinx.js:1
INFO: dict.c(342): Reading filler dictionary: rm1_200/noisedict js/pocketsphinx.js:1
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones js/pocketsphinx.js:1
INFO: dict.c(345): 3 words read js/pocketsphinx.js:1
INFO: dict2pid.c(396): Building PID tables for dictionary js/pocketsphinx.js:1
INFO: dict2pid.c(406): Allocating 45^3 * 2 bytes (177 KiB) for word-initial triphones js/pocketsphinx.js:1
INFO: dict2pid.c(132): Allocated 24480 bytes (23 KiB) for word-final triphones js/pocketsphinx.js:1
INFO: dict2pid.c(196): Allocated 24480 bytes (23 KiB) for single-phone word triphones js/pocketsphinx.js:1
ERROR: "pocketsphinx.c", line 931: No search module is selected, did you forget to specify a language model or grammar? js/pocketsphinx.js:1
Object {status: "error", command: "start", code: Object}
  code: Object (no own properties, only the inherited Object.prototype members)
  command: "start"
  status: "error"
  __proto__: Object
OK, great, so your issue is that the parameters to load the SLM and dictionary are not given, so it does not have any search module. Maybe you did not give them when initializing:
{command: 'initialize', data: [["-lm", "your_model"], ["-dict", "your_dictionary"]]}
We're getting there, but I feel like swimming in the middle of an ocean and you are my only hope :-)
I now call
var initRecognizer = function() {
  // You can pass parameters to the recognizer, such as : {command: 'initialize', data: [["-hmm", "my_model"], ["-fwdflat", "no"]]}
  postRecognizerJob({command: 'initialize', data: [["-lm", "cmusphinx-5.0-en-us.lm.dmp"], ["-dict", "cmudict.hub4.06d.dict"]]},
    function() {
      postRecognizerJob({command: 'load',
        data: ["../../model-en-us/dict.js", "../../model-en-us/lm.js"]
      }, function() {
        // Done loading!
        recognizerReady();
      });
    }
  );
};
but it gives me
ERROR: "dict.c", line 275: Failed to open dictionary file 'cmudict.hub4.06d.dict' for reading: No such file or directory
which makes sense, because those files do not exist. But giving it the paths to the files also doesn't work, and I wonder anyhow why I would want to give it the paths to the actual files, because the files only get loaded 2 lines further down the chain, with the 'load' command... Right?
Yes, you are very close.
- The file name you give is correct: this is the name on the virtual file system that emscripten creates. The JavaScript file adds the file to the virtual file system.
- As you said, load should come before initialize. Load adds those files to the virtual file system, and they are accessed when the recognizer is initialized, so try swapping the calls to "load" and "initialize" and their callbacks.
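The reordering described above can be sketched as follows. To keep the snippet runnable on its own, postRecognizerJob is replaced by a stand-in that only records the commands and calls back immediately; the real function in the demo talks to the recognizer worker, so this illustrates the call order only.

```javascript
// Stand-in for the demo's postRecognizerJob(message, callback): it records
// the command name and invokes the callback right away (assumption).
var sent = [];
function postRecognizerJob(message, callback) {
  sent.push(message.command);
  callback();
}

function initRecognizer(onReady) {
  // 1. Load the packaged files so they exist on the virtual file system.
  postRecognizerJob(
    { command: "load", data: ["../../model-en-us/dict.js", "../../model-en-us/lm.js"] },
    function () {
      // 2. Only then initialize, referring to the files by their virtual names.
      postRecognizerJob(
        { command: "initialize",
          data: [["-lm", "cmusphinx-5.0-en-us.lm.dmp"], ["-dict", "cmudict.hub4.06d.dict"]] },
        onReady
      );
    }
  );
}

initRecognizer(function () {
  console.log(sent.join(" -> ")); // load -> initialize
});
```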
Yes, that solves it! :-)
Now, a last step to get the LM and the Dictionary aligned, I guess...? I get this error now
INFO: acmod.c(252): Parsed model-specific feature parameters from rm1_200/feat.params js/pocketsphinx.js:1
INFO: feat.c(715): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none' js/pocketsphinx.js:1
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0 js/pocketsphinx.js:1
INFO: mdef.c(517): Reading model definition: rm1_200/mdef js/pocketsphinx.js:1
INFO: mdef.c(530): Found byte-order mark BMDF, assuming this is a binary mdef file js/pocketsphinx.js:1
INFO: bin_mdef.c(336): Reading binary model definition: rm1_200/mdef js/pocketsphinx.js:1
INFO: bin_mdef.c(516): 45 CI-phone, 30080 CD-phone, 3 emitstate/phone, 135 CI-sen, 335 Sen, 199 Sen-Seq js/pocketsphinx.js:1
INFO: tmat.c(206): Reading HMM transition probability matrices: rm1_200/transition_matrices js/pocketsphinx.js:1
INFO: acmod.c(124): Attempting to use SCHMM computation module js/pocketsphinx.js:1
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: rm1_200/means js/pocketsphinx.js:1
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size: js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x12 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x24 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x3 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x12 js/pocketsphinx.js:1
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: rm1_200/variances js/pocketsphinx.js:1
INFO: ms_gauden.c(292): 1 codebook, 4 feature, size: js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x12 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x24 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x3 js/pocketsphinx.js:1
INFO: ms_gauden.c(294): 256x12 js/pocketsphinx.js:1
INFO: ms_gauden.c(354): 0 variance values floored js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(904): Loading senones from dump file rm1_200/sendump js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(928): BEGIN FILE FORMAT DESCRIPTION js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(991): Rows: 256, Columns: 335 js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(1023): Using memory-mapped I/O for senones js/pocketsphinx.js:1
INFO: s2_semi_mgau.c(1294): Maximum top-N: 4 Top-N beams: 0 0 0 0 js/pocketsphinx.js:1
INFO: dict.c(320): Allocating 135415 * 20 bytes (2644 KiB) for word entries js/pocketsphinx.js:1
INFO: dict.c(333): Reading main dictionary: cmudict.hub4.06d.dict js/pocketsphinx.js:1
INFO: dict.c(213): Allocated 992 KiB for strings, 1633 KiB for phones js/pocketsphinx.js:1
INFO: dict.c(336): 131316 words read js/pocketsphinx.js:1
INFO: dict.c(342): Reading filler dictionary: rm1_200/noisedict js/pocketsphinx.js:1
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones js/pocketsphinx.js:1
INFO: dict.c(345): 3 words read js/pocketsphinx.js:1
INFO: dict2pid.c(396): Building PID tables for dictionary js/pocketsphinx.js:1
INFO: dict2pid.c(406): Allocating 45^3 * 2 bytes (177 KiB) for word-initial triphones js/pocketsphinx.js:1
INFO: dict2pid.c(132): Allocated 24480 bytes (23 KiB) for word-final triphones js/pocketsphinx.js:1
INFO: dict2pid.c(196): Allocated 24480 bytes (23 KiB) for single-phone word triphones js/pocketsphinx.js:1
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(196): ngrams 1=19794, 2=1377200, 3=3178194 js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(242): 19794 = LM.unigrams(+trailer) read js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(288): 1377200 = LM.bigrams(+trailer) read js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(314): 3178194 = LM.trigrams read js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(339): 57155 = LM.prob2 entries read js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(359): 10935 = LM.bo_wt2 entries read js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(379): 34843 = LM.prob3 entries read js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(407): 2690 = LM.tseg_base entries read js/pocketsphinx.js:1
INFO: ngram_model_dmp.c(463): 19794 = ascii word strings read js/pocketsphinx.js:1
INFO: ngram_search_fwdtree.c(99): 799 unique initial diphones js/pocketsphinx.js:1
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 77 single-phone words js/pocketsphinx.js:1
INFO: ngram_search_fwdtree.c(186): Creating search tree js/pocketsphinx.js:1
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 77 single-phone words js/pocketsphinx.js:1
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128 js/pocketsphinx.js:1
ERROR: "ngram_search_fwdtree.c", line 336: No word from the language model has pronunciation in the dictionary js/pocketsphinx.js:1
INFO: ngram_search_fwdtree.c(339): after: 0 root, 0 non-root channels, 3 single-phone words js/pocketsphinx.js:1
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
So it seems like the dictionary is in uppercase (which seems normal to me), but the SLM is in lowercase. I don't know whether there is a canonical dictionary for your SLM, but you could try to adjust the case of one of the files (such as tr '[:upper:]' '[:lower:]' < input.txt > output.txt).
The tricky part is that:
- The phonemes must be uppercase, so if you convert the dictionary file to lowercase, the phonemes will be wrong
- your SLM is binary so you can't process it with text utilities.
So what I'd do is start from the .lm file, convert it to uppercase, then regenerate the binary version. It is explained there: http://cmusphinx.sourceforge.net/wiki/tutoriallm
sphinx_lm_convert -i model.lm -o model.dmp
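As an alternative to re-casing the language model, the dictionary could be adjusted while leaving the phonemes alone. The sketch below lowercases only the first field of each line, assuming the usual layout of one entry per line with the word first and its phonemes after whitespace; the function name and the sample pronunciations are illustrative.

```javascript
// Lowercase the word (first field) of each dictionary line and keep the
// rest of the line, i.e. the phonemes, exactly as it was.
function lowercaseDictionaryWords(dictText) {
  return dictText
    .split("\n")
    .map(function (line) {
      var match = /^(\S+)(\s[\s\S]*)$/.exec(line);
      if (!match) return line; // blank line or a word without phonemes
      return match[1].toLowerCase() + match[2];
    })
    .join("\n");
}

console.log(lowercaseDictionaryWords("HELLO HH AH L OW\nHELLO(2) HH EH L OW"));
// hello HH AH L OW
// hello(2) HH EH L OW
```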
@nshmyrev do you have more input?
All above is not recommended. For US English we distribute cmu07a.dic as part of pocketsphinx sources which you can use with en-us generic acoustic model:
https://github.com/cmusphinx/pocketsphinx/blob/master/model/lm/en_US/cmu07a.dic
Thanks @nshmyrev, that could be the final step to make it work. I had always used cmudict/cmudict.0.7a from the subversion tree, which is in uppercase.
@nshmyrev @syl22-00 I'm going to try that tonight! Will let you know how it goes :-)
All right then... I got it working!
The only drawback is the terrible recognition ;-) Is there a recommended set of files (LM + DIC) that I could load that is sort of 'proven' to work?
And what is an acoustic model? What does it do? And how could I create one for say, Dutch?
What I understand right now is:
- A dictionary defines a set of written words and their pronunciation (i.e. with a phonetic alphabet)
- A language model defines which words are likely to occur after one another
- An acoustic model eh... defines which phonetics resemble each other? (guessing here...)
Congratulations on finally having it working!
I should have said it earlier, but the acoustic model provided in the repository is very small (200 senones, very few parameters, so not very accurate), and built on the RM1 corpus which is small. So if you want to make something that has good performance you should use a better one. @nshmyrev will probably know much better, but I guess you should try one of these: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/
Using them is similar to what you did for the dictionary and language model. The main difference is that one acoustic model has a bunch of files. You can place them inside one folder and give the folder name as parameter with -hmm, or give the individual files as parameters (-mdef, -variances, etc.).
Also I would say your understanding is quite correct. Just to clarify dictionary and acoustic model: There is a set of phonemes (40-something for English), the dictionary maps words to phoneme sequences, and the acoustic model describes the statistical distributions of the way people speak these phonemes.
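To make the dictionary part concrete, here is a small sketch that reads dictionary text into a lookup from word to phoneme sequences. It assumes one entry per line, the word first and phonemes after whitespace, with alternative pronunciations marked by a suffix such as (2); the function name and the three sample entries are illustrative.

```javascript
// Parse dictionary text into a Map: word -> array of pronunciations,
// where each pronunciation is an array of phoneme strings.
function parseDictionary(dictText) {
  var pronunciations = new Map();
  dictText.split("\n").forEach(function (rawLine) {
    var fields = rawLine.trim().split(/\s+/);
    if (fields.length < 2) return; // skip blank or malformed lines
    var word = fields[0].replace(/\(\d+\)$/, ""); // drop "(2)"-style suffix
    if (!pronunciations.has(word)) pronunciations.set(word, []);
    pronunciations.get(word).push(fields.slice(1));
  });
  return pronunciations;
}

var dict = parseDictionary(
  "hello HH AH L OW\nhello(2) HH EH L OW\nworld W ER L D"
);
console.log(dict.get("hello").length);       // 2
console.log(dict.get("world")[0].join(" ")); // W ER L D
```

The acoustic model then only has to describe how each of those phonemes sounds; it knows nothing about words.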
So if you want to recognize Dutch, you'll need dictionary, language and acoustic models for Dutch. I believe some already exist (http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Dutch%20Voxforge/), or you can train them yourself. There is abundant documentation on how to do that on http://cmusphinx.org.
Thanks :-)
Quick iteration to verify that I understand correctly:
- a waveform is translated into senones, based on the acoustic model (which describes, among other things, how phones form certain senones in different contexts and how, from a waveform and its context, one can work out which senone was likely 'meant')
- senone sequences are mapped to words using the dictionary, whereby the search for possible words is limited (and thus quickened) using the statistical language model
A restricted acoustic model will cause the recognizer to make mistakes in mapping my actual speech (phones) to the right senones, and thus, the wrong words will be presented on the screen. Correct?
Let's see if I can get the -hmm option to work..!
Oh, and is it correct that the pocketsphinx.js file already includes an acoustic model? Can I somehow get a 'raw' pocketsphinx.js file without an embedded acoustic model?
Hum, not really, you seem to be confusing phonemes and senones. Senones basically just refer to the number of parameters in your model. Since there is a lot of variation in the way people pronounce one phoneme, you need a lot of parameters to capture the distributions, but you also need a lot of data to learn all these parameters.
But in short, you should have as much data as possible to train your acoustic model, and you need many senones to take advantage of that.
I can't get the acoustic model to load.
Whatever I try giving to the init function, it will get back to me saying that
ERROR: "acmod.c", line 90: Folder 'model-en-us/acoustic' does not contain acoustic model definition 'mdef'
What I try at the moment is
var initRecognizer = function() {
// You can pass parameters to the recognizer, such as : {command: 'initialize', data: [["-hmm", "my_model"], ["-fwdflat", "no"]]}
postRecognizerJob({command: 'load', data: ["../../model-en-us/dict.js", "../../model-en-us/lm.js"]}, function() {
postRecognizerJob({command: 'initialize', data: [["-lm", "cmusphinx-5.0-en-us.lm.dmp"], ["-dict", "cmu07a.dic"], ["-hmm", "./model-en-us/acoustic"]]}, function() {
// Done loading!
recognizerReady();
});
});
};
I was thinking I might need to convert all these files (mdef, variances etc) to Javascript files with the file packager, but then again, that would load them in the 'virtual disk' where the init function can read files from, but how would I then specify a folder to the -hmm parameter?
I guess that should be quite clear from the docs: https://github.com/syl22-00/pocketsphinx.js#ii-package-model-files-outside-the-main-javascript
In the case of that example, you'd have "-hmm", "hub4wsj_sc_8k".
(Yes, I was confused. I thought that a senone was a diphone or triphone, i.e. multiple phones together. I tried googling for 'senone' but there is not a single simple explanation. The one on the CMU site is pretty vague... What you are saying is that a phone(me) is the smallest significant linguistic unit, and a senone describes the likelihood that a certain part of speech is a certain phoneme, based on its features and on its context (including phonemes to the left and the right). The acoustic model contains a set of senones, split across the mdef, variances etc. files. The more senones, the better speech can be mapped to phonemes. Right? Right?)
Imho that is not really clear from the docs (although I love the extensive docs! :-)), since the docs don't describe what exactly the file packager is doing, how it creates a virtual file structure, and how you would use that virtual file structure in your init function to load files.
It thus does not explain how to generate a virtual 'folder' that you can then use in the init function. But I think I figured it out:
It appears to be impossible to generate .js files in a 'virtual folder' from within the actual folder where the source files are stored. I have to go one folder up, thén run the file packager commands, point to the files ín the folder, and then the folder structure is preserved in the generated .js files.
I will generate all the .js files, load them (hope the browser doesn't crash), init them, and report back here!
That's what --embed is for: If you run the packager with --embed hub4wsj_sc_8k/variances, the generated JavaScript file will create a virtual variances file inside a virtual hub4wsj_sc_8k/ folder.
Exactly, but, as far as I can see, --embed also functions as the pointer to the file that you want to embed, and thus, you cannot call the file packager from within the source directory itself, but only from one directory up (since you need to point 'into' the directory, which is only possible from the outside).
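As a rough illustration of that workflow, the sketch below builds one packager command line per acoustic model file, meant to be run from the directory that contains the model folder so that the folder name becomes part of the virtual path. It only uses the --embed and --js-output options that appear in this thread; the function name, the output file naming, and the shortened packager path are assumptions.

```javascript
// Build a file_packager.py command for one file inside a model folder.
// Run the resulting command from the PARENT directory of `folder`.
function packagerCommand(packagerPath, targetJs, folder, fileName) {
  return [
    "python",
    '"' + packagerPath + '"',
    targetJs,
    "--embed", folder + "/" + fileName,
    "--js-output=" + fileName + ".js" // hypothetical naming scheme
  ].join(" ");
}

// Two of the acoustic model files mentioned in this thread.
["mdef", "variances"].forEach(function (fileName) {
  console.log(packagerCommand("file_packager.py", "pocketsphinx.js", "hub4wsj_sc_8k", fileName));
});
// python "file_packager.py" pocketsphinx.js --embed hub4wsj_sc_8k/mdef --js-output=mdef.js
// python "file_packager.py" pocketsphinx.js --embed hub4wsj_sc_8k/variances --js-output=variances.js
```

After loading the generated files, the recognizer would be initialized with "-hmm", "hub4wsj_sc_8k", matching the virtual folder name.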
@willemmulder Hello. I want to know whether these steps worked out. Would you mind sharing the source code with me? I would like to compare my work to this example. Thank you! =)
Hi @yjc0703, I have been searching, but I can't find it anymore... We didn't go for this solution in the end either, so it probably got lost somewhere. I'm sorry!
@willemmulder oh I see. Thank you for your kind reply. Thank you!
Hi, I am learning CMUSphinx, but I am in trouble:
ERROR: "ngram_search_fwdtree.c", line 336: No word from the language model has pronunciation in the dictionary