pocketsphinx-python

LiveSpeech - Specify Language

Open jhoelzl opened this issue 9 years ago • 19 comments

Hello,

I want to build a simple offline hotword detector and tried your example script:

from pocketsphinx import LiveSpeech

speech = LiveSpeech(lm=False, keyphrase='forward', kws_threshold=1e+20)
for phrase in speech:
     print(phrase.segments(detailed=True))

It works, but which language model is used here, given that lm=False? How can I run the hotword detection in German, for example:

speech = LiveSpeech(lm=False, keyphrase='guten tag', kws_threshold=1e+20)

jhoelzl avatar Nov 30 '16 10:11 jhoelzl

You need to add hmm='de-acoustic-model-path', dict='de-dictionary'

nshmyrev avatar Nov 30 '16 10:11 nshmyrev

Okay, thanks. Should I also set a model path for the lm parameter, or leave it as lm=False?

jhoelzl avatar Nov 30 '16 11:11 jhoelzl

The lm value should be False.
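
A minimal sketch of the configuration described in these replies, assuming a German acoustic model and pronunciation dictionary have already been downloaded. The directory and file names are hypothetical placeholders, `keyphrase_config` and `listen` are illustrative helpers rather than part of pocketsphinx, and `dic` is the keyword spelling used in the longer example later in this thread. Every word of the keyphrase presumably has to appear in the dictionary.

```python
import os


def keyphrase_config(hmm_dir, dict_path, keyphrase, kws_threshold):
    """Collect LiveSpeech keyword arguments for keyphrase spotting."""
    return {
        "lm": False,                     # keyphrase search, no language model
        "hmm": hmm_dir,                  # acoustic model directory
        "dic": dict_path,                # pronunciation dictionary
        "keyphrase": keyphrase,
        "kws_threshold": kws_threshold,  # value taken from the question above
    }


def listen(config):
    """Yield detections from the microphone (needs pocketsphinx installed)."""
    from pocketsphinx import LiveSpeech  # imported lazily on purpose
    for phrase in LiveSpeech(**config):
        yield phrase.segments(detailed=True)


# Hypothetical locations: point these at the unpacked German model.
MODEL_ROOT = os.path.join("models", "de")
config = keyphrase_config(
    hmm_dir=os.path.join(MODEL_ROOT, "acoustic"),
    dict_path=os.path.join(MODEL_ROOT, "german.dic"),
    keyphrase="guten tag",
    kws_threshold=1e+20,
)
```

Calling `for segments in listen(config): print(segments)` would then start listening. Keeping the configuration in a plain dictionary makes it easy to print and check the paths before the decoder is created.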

nshmyrev avatar Nov 30 '16 11:11 nshmyrev

I'm trying to use this example instead:

import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)

for phrase in speech:
    print(phrase)

but I would like to try it in Italian, if possible. If I use it-acoustic-model-path and it-dictionary it does not work, and it does not work with "de" either. Do I need to download something?

DeveloperITA avatar Dec 04 '16 13:12 DeveloperITA

You need to explain what you mean by "does not work". Does it print anything in the output? You need to be more specific. The verbose argument should probably be removed.

nshmyrev avatar Dec 05 '16 18:12 nshmyrev

The output is

C:\Python27\python.exe C:/Users/Lorenzo/PycharmProjects/charlie/main.py
Traceback (most recent call last):
  File "C:/Users/Lorenzo/PycharmProjects/charlie/main.py", line 14, in <module>
    dic=os.path.join(model_path, 'cmudict-it-it.dict')
  File "C:\Python27\lib\site-packages\pocketsphinx\__init__.py", line 208, in __init__
    super(LiveSpeech, self).__init__(**kwargs)
  File "C:\Python27\lib\site-packages\pocketsphinx\__init__.py", line 90, in __init__
    super(Pocketsphinx, self).__init__(config)
  File "C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py", line 277, in __init__
    this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
Allocating 32 buffers of 2500 samples each

Process finished with exit code 1

DeveloperITA avatar Jan 01 '17 15:01 DeveloperITA

You need to remove verbose='False' from the code and post the complete output. The current output says you did not specify the paths to the files properly.

nshmyrev avatar Jan 02 '17 11:01 nshmyrev

The output still looks the same

C:\Python27\python.exe C:/Users/Lorenzo/PycharmProjects/charlie/main.py
Traceback (most recent call last):
  File "C:/Users/Lorenzo/PycharmProjects/charlie/main.py", line 13, in <module>
    dic=os.path.join(model_path, 'cmudict-it-it.dict')
  File "C:\Python27\lib\site-packages\pocketsphinx\__init__.py", line 208, in __init__
    super(LiveSpeech, self).__init__(**kwargs)
  File "C:\Python27\lib\site-packages\pocketsphinx\__init__.py", line 90, in __init__
    super(Pocketsphinx, self).__init__(config)
  File "C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py", line 277, in __init__
    this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
Allocating 32 buffers of 2500 samples each

Process finished with exit code 1

DeveloperITA avatar Jan 02 '17 11:01 DeveloperITA

Put verbose=True then.

nshmyrev avatar Jan 02 '17 11:01 nshmyrev

C:\Python27\python.exe C:/Users/Lorenzo/PycharmProjects/charlie/main.py
Traceback (most recent call last):
  File "C:/Users/Lorenzo/PycharmProjects/charlie/main.py", line 14, in <module>
    dic=os.path.join(model_path, 'cmudict-it-it.dict')
  File "C:\Python27\lib\site-packages\pocketsphinx\__init__.py", line 208, in __init__
    super(LiveSpeech, self).__init__(**kwargs)
  File "C:\Python27\lib\site-packages\pocketsphinx\__init__.py", line 90, in __init__
    super(Pocketsphinx, self).__init__(config)
  File "C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py", line 277, in __init__
    this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live live
-cmninit 40,3,-1 40,3,-1
-compallsen no no
-debug 0
-dict C:\Python27\lib\site-packages\pocketsphinx\model\cmudict-it-it.dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
Allocating 32 buffers of 2500 samples each
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm C:\Python27\lib\site-packages\pocketsphinx\model\it-it
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm C:\Python27\lib\site-packages\pocketsphinx\model\it-it.lm.bin
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='live', VARNORM='no', AGC='none'
ERROR: "acmod.c", line 83: Folder 'C:\Python27\lib\site-packages\pocketsphinx\model\it-it' does not contain acoustic model definition 'mdef'

Process finished with exit code 1

DeveloperITA avatar Jan 02 '17 11:01 DeveloperITA

This error says the model is missing:

ERROR: "acmod.c", line 83: Folder 'C:\Python27\lib\site-packages\pocketsphinx\model\it-it' does not contain acoustic model definition 'mdef'
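
A minimal sketch of a pre-flight check for this situation, so that a missing model is reported before the decoder is constructed. `missing_model_files` is a hypothetical helper, not part of pocketsphinx. It checks only for the `mdef` file named in the error plus the optional language model and dictionary paths, so an empty result does not guarantee that the model is complete.

```python
import os


def missing_model_files(hmm_dir, lm_path=None, dict_path=None):
    """Return the paths the decoder would look for that do not exist."""
    missing = []
    # The error above complains about 'mdef' inside the acoustic model folder.
    mdef = os.path.join(hmm_dir, "mdef")
    if not os.path.isfile(mdef):
        missing.append(mdef)
    # lm_path may be None, for example when lm=False is used.
    for path in (lm_path, dict_path):
        if path and not os.path.isfile(path):
            missing.append(path)
    return missing
```

With the paths from the traceback, `missing_model_files(os.path.join(model_path, 'it-it'), os.path.join(model_path, 'it-it.lm.bin'), os.path.join(model_path, 'cmudict-it-it.dict'))` would list whatever still has to be copied into the model folder.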

nshmyrev avatar Jan 02 '17 12:01 nshmyrev

What do I need to do?

DeveloperITA avatar Jan 02 '17 13:01 DeveloperITA

Put the model in the folder

nshmyrev avatar Jan 02 '17 13:01 nshmyrev

Where can I download it? I couldn't find it.

DeveloperITA avatar Jan 02 '17 13:01 DeveloperITA

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Italian/cmusphinx-it-5.2.tar.gz/download

nshmyrev avatar Jan 02 '17 13:01 nshmyrev

Thank you very much!

DeveloperITA avatar Jan 02 '17 13:01 DeveloperITA

In the archive I can't find the it-it.lm.bin and cmudict-it-it.dict files D:
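
A rough sketch for locating the relevant pieces inside a downloaded model archive when the files are not named as expected. `find_model_parts` is a hypothetical helper, and the matching rules (a folder containing `mdef`, names ending in `.dic` or `.dict`, names containing `.lm`) are assumptions about how such archives are usually laid out, not a description of this particular download.

```python
import posixpath
import tarfile


def find_model_parts(archive_path):
    """Group archive members by the role they most likely play."""
    parts = {"hmm_dirs": [], "dictionaries": [], "language_models": []}
    with tarfile.open(archive_path) as archive:  # compression is auto-detected
        for name in archive.getnames():
            base = posixpath.basename(name).lower()
            if base == "mdef":
                # A directory holding 'mdef' is what the hmm argument expects.
                parts["hmm_dirs"].append(posixpath.dirname(name))
            elif base.endswith((".dic", ".dict")):
                parts["dictionaries"].append(name)
            elif ".lm" in base:
                parts["language_models"].append(name)
    return parts
```

The returned member names can then be extracted and passed to hmm, dic and lm directly, or copied and renamed to match the paths already used in the script.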

DeveloperITA avatar Jan 02 '17 13:01 DeveloperITA

@nshmyrev Hi, I tried the example script below, but it doesn't work. It seems unable to get data from the microphone, yet I can read the microphone using PyAudio, so I am confused.

import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()

speech = LiveSpeech(
    verbose=True,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)

for phrase in speech:
    print("phrase:", phrase)
    print(phrase.segments(detailed=True))

The verbose output is as follows:

INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci yes yes
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live batch
-cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
-compallsen no no
-dict /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/cmudict-en-us.dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1e-30 1.000000e-30
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us.lm.bin
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec 0-12/13-25/26-38
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 3.0 3.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us/means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us/variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(475): Loading senones from dump file /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us/sendump
INFO: ptm_mgau.c(499): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(562): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(594): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(837): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138824 * 32 bytes (4338 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/cmudict-en-us.dict
INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
INFO: dict.c(336): 134723 words read
INFO: dict.c(358): Reading filler dictionary: /usr/local/var/pyenv/versions/3.6.3/lib/python3.6/site-packages/pocketsphinx/model/en-us/noisedict
INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25

Thank you!

jdHehe avatar Apr 08 '19 09:04 jdHehe

Your problem is not related to the topic of this issue. I recommend that you open a new issue and delete this post.

nshmyrev avatar Apr 08 '19 10:04 nshmyrev