pocketsphinx-python incorrect time coordinates of each word


Open alemol opened this issue 5 years ago • 0 comments

There is a bug when getting utterances.

I want to get the time coordinates of each word in my 'audio.wav' using Python pocketsphinx 0.1.15. I reproduced the official example code from the project page https://pypi.org/project/pocketsphinx/, which works well for 'goforward.raw':

# ----------------------------
# | start |  end  |   word   |
# ----------------------------
# |  0.0s | 0.24s | <s>      |
# | 0.25s | 0.45s | <sil>    |
# | 0.46s | 0.63s | go       |
# | 0.64s | 1.16s | forward  |
# | 1.17s | 1.52s | ten      |
# | 1.53s | 2.11s | meters   |
# | 2.12s |  2.6s | </s>     |
# ----------------------------
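In the official example, the start and end columns are frame indices divided by the frame rate (frate, 100 frames per second by default), so each frame is 10 ms. The following is a minimal sketch of that conversion and of the example's table layout. It assumes each segment is available as a (word, start_frame, end_frame) tuple. frame_to_seconds, format_row and format_table are hypothetical helper names.

```python
def frame_to_seconds(frame, fps=100):
    """Convert a pocketsphinx frame index to seconds (fps is the frate value)."""
    return frame / fps


def format_row(word, start_frame, end_frame, fps=100):
    """Format one row with the same format string as the official example."""
    return '| %4ss | %4ss | %8s |' % (
        frame_to_seconds(start_frame, fps),
        frame_to_seconds(end_frame, fps),
        word,
    )


def format_table(segments, fps=100):
    """Build the whole table from (word, start_frame, end_frame) tuples."""
    rule = '-' * 28
    lines = [rule, '| %5s |  %3s  |   %4s   |' % ('start', 'end', 'word'), rule]
    lines += [format_row(word, start, end, fps) for word, start, end in segments]
    lines.append(rule)
    return '\n'.join(lines)


# Frame values chosen to match the go/forward rows of the example table
print(format_table([('go', 46, 63), ('forward', 64, 116)]))
```

Because `%4s` pads short values, 2.6 prints as " 2.6", which is why some rows in the example read `|  2.6s |`.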

When I use my 'audio.wav', the output of ps.segments(detailed=True) is not so bad. When I use the AudioFile class (as in the official example), however, the result is very inaccurate. It is not even close to correct in the time coordinates (the audio is only 2.52 s long) or in the number of segments.

What is wrong? What should I do to get correct time coordinates?

rate 16000 frames 40371 2.5231875

[('<s>', 1, 84, 91), ('que', 1, 92, 166),
 ('<sil>', -1443, 167, 169), ('la', -355, 170, 180),
 ('voz', -323, 181, 201), ('del', -3028, 202, 216),
 ('postulante', 0, 217, 279), ('en', -1, 280, 323),
 ('</s>', 0, 324, 327)]
----------------------------
| start |  end  |   word   |
----------------------------
| 0.07s | 0.14s |      <s> |
| 0.15s | 0.79s |      que |
|  0.8s | 0.82s |     </s> |
----------------------------
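You can quantify the comparison by converting the detailed segments to seconds (frame / frate, assuming the default frate of 100) and listing every segment that ends after the file's duration. The following is a minimal sketch. segments_past_end is a hypothetical helper, and the input values are copied from the output above.

```python
def segments_past_end(segments, duration, fps=100):
    """Return (word, end_seconds) for every segment that ends after `duration`.

    `segments` uses the ps.segments(detailed=True) layout:
    (word, prob, start_frame, end_frame).
    """
    return [(word, end / fps)
            for word, _prob, _start, end in segments
            if end / fps > duration]


# Values copied from the output above
duration = 40371 / 16000  # frames / rate = 2.5231875 s
detailed = [('<s>', 1, 84, 91), ('que', 1, 92, 166), ('<sil>', -1443, 167, 169),
            ('la', -355, 170, 180), ('voz', -323, 181, 201), ('del', -3028, 202, 216),
            ('postulante', 0, 217, 279), ('en', -1, 280, 323), ('</s>', 0, 324, 327)]

print(segments_past_end(detailed, duration))
```

With these values at 100 frames per second, the check flags postulante, en and </s>, which end at 2.79 s, 3.23 s and 3.27 s respectively.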

Here is my python code:

import os.path
# This is just to have audio info
import wave
import contextlib

from pocketsphinx import (Pocketsphinx, AudioFile, LiveSpeech)
# my own pocketsphinx model and other resources
from utils.utilities import (get_mexconf, get_data_path)

# get the file and print audio properties
wav = os.path.join(get_data_path(), 'audio.wav')
with contextlib.closing(wave.open(wav, 'r')) as f:
    rate = f.getframerate()
    frames = f.getnframes()
    duration = frames / float(rate)
    print('rate', rate, 'frames', frames, 'duration', duration)

# This part seems to work for getting segments
# (get_segments is a helper not shown here; it wraps ps.segments(detailed=True))
segments = get_segments(wav)
print(segments)

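# set up my ASR models and my audio
config = get_mexconf()
config['audio_file'] = wav

# Frames per second. It must be in config *before* AudioFile is created:
# AudioFile(**config) receives a copy of the dict, so keys added later never
# reach the decoder (the default frate is also 100).
fps = 100
config['frate'] = fps
audio = AudioFile(**config)

# The loop below is copied from the official example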
for phrase in audio:
    print('-' * 28)
    print('| %5s |  %3s  |   %4s   |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)
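The log below shows the AudioFile decoder producing two hypotheses ('que' and 'la voz del postulante no'), so the file is decoded as more than one phrase. The loop can collect rows instead of printing them, which keeps every phrase's words together with the phrase they came from. The following is a minimal sketch. collect_segments, FakeSeg and FakePhrase are hypothetical names, and the two fakes only stand in for pocketsphinx objects so the helper can be tried without audio. The helper relies only on phrase.seg() and on the .word, .start_frame and .end_frame attributes that the example already uses.

```python
from collections import namedtuple


def collect_segments(phrases, fps=100):
    """Gather (phrase_index, word, start_s, end_s) from every phrase.

    `phrases` is any iterable whose items have a .seg() method yielding
    objects with .word, .start_frame and .end_frame attributes.
    """
    rows = []
    for index, phrase in enumerate(phrases):
        for s in phrase.seg():
            rows.append((index, s.word, s.start_frame / fps, s.end_frame / fps))
    return rows


# Stand-ins for pocketsphinx segment/phrase objects, for offline testing only
FakeSeg = namedtuple('FakeSeg', 'word start_frame end_frame')


class FakePhrase:
    def __init__(self, segs):
        self._segs = segs

    def seg(self):
        return iter(self._segs)


# With pocketsphinx installed, the real call would be:
#   rows = collect_segments(AudioFile(**config), fps=fps)
rows = collect_segments([FakePhrase([FakeSeg('que', 15, 79)]),
                         FakePhrase([FakeSeg('la', 87, 99)])])
for row in rows:
    print(row)
```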

This is the config:

config = {
    # model_path: directory that holds my model files (modelSphinx in the log below)
    'hmm': os.path.join(model_path, 'LKE_T29.cd_cont_6000'),
    'lm': os.path.join(model_path, 'LKE_T29.lm.bin'),
    'dict': os.path.join(model_path, 'LKE_T29.dic'),
    'verbose': True,
    'backtrace': True
}

Running the code with this config produces the following output:

/home/amolina/repo/audiotranscriptor/data/audio.wav
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/feat.params
Current configuration:
[NAME]          [DEFLT]     [VALUE]
-agc            none        none
-agcthresh      2.0     2.000000e+00
-allphone               
-allphone_ci        yes     yes
-alpha          0.97        9.700000e-01
-ascale         20.0        2.000000e+01
-aw         1       1
-backtrace      no      yes
-beam           1e-48       1.000000e-48
-bestpath       yes     yes
-bestpathlw     9.5     9.500000e+00
-ceplen         13      13
-cmn            live        batch
-cmninit        40,3,-1     40,3,-1
-compallsen     no      no
-dict                   /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.dic
-dictcase       no      no
-dither         no      no
-doublebw       no      no
-ds         1       1
-fdict                  
-feat           1s_c_d_dd   1s_c_d_dd
-featparams             
-fillprob       1e-8        1.000000e-08
-frate          100     100
-fsg                    
-fsgusealtpron      yes     yes
-fsgusefiller       yes     yes
-fwdflat        yes     yes
-fwdflatbeam        1e-64       1.000000e-64
-fwdflatefwid       4       4
-fwdflatlw      8.5     8.500000e+00
-fwdflatsfwin       25      25
-fwdflatwbeam       7e-29       7.000000e-29
-fwdtree        yes     yes
-hmm                    /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000
-input_endian       little      little
-jsgf                   
-keyphrase              
-kws                    
-kws_delay      10      10
-kws_plp        1e-1        1.000000e-01
-kws_threshold      1e-30       1.000000e-30
-latsize        5000        5000
-lda                    
-ldadim         0       0
-lifter         0       22
-lm                 /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.lm.bin
-lmctl                  
-lmname                 
-logbase        1.0001      1.000100e+00
-logfn                  
-logspec        no      no
-lowerf         133.33334   1.300000e+02
-lpbeam         1e-40       1.000000e-40
-lponlybeam     7e-29       7.000000e-29
-lw         6.5     6.500000e+00
-maxhmmpf       30000       30000
-maxwpf         -1      -1
-mdef                   
-mean                   
-mfclogdir              
-min_endfr      0       0
-mixw                   
-mixwfloor      0.0000001   1.000000e-07
-mllr                   
-mmap           yes     yes
-ncep           13      13
-nfft           512     512
-nfilt          40      25
-nwpen          1.0     1.000000e+00
-pbeam          1e-48       1.000000e-48
-pip            1.0     1.000000e+00
-pl_beam        1e-10       1.000000e-10
-pl_pbeam       1e-10       1.000000e-10
-pl_pip         1.0     1.000000e+00
-pl_weight      3.0     3.000000e+00
-pl_window      5       5
-rawlogdir              
-remove_dc      no      no
-remove_noise       yes     yes
-remove_silence     yes     yes
-round_filters      yes     yes
-samprate       16000       1.600000e+04
-seed           -1      -1
-sendump                
-senlogdir              
-senmgau                
-silprob        0.005       5.000000e-03
-smoothspec     no      no
-svspec                 
-tmat                   
-tmatfloor      0.0001      1.000000e-04
-topn           4       4
-topn_beam      0       0
-toprule                
-transform      legacy      dct
-unit_area      yes     yes
-upperf         6855.4976   6.800000e+03
-uw         1.0     1.000000e+00
-vad_postspeech     50      50
-vad_prespeech      20      20
-vad_startspeech    10      10
-vad_threshold      3.0     3.000000e+00
-var                    
-varfloor       0.0001      1.000000e-04
-varnorm        no      no
-verbose        no      no
-warp_params                
-warp_type      inverse_linear  inverse_linear
-wbeam          7e-29       7.000000e-29
-wip            0.65        6.500000e-01
-wlen           0.025625    2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: mdef.c(518): Reading model definition: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/mdef
INFO: bin_mdef.c(181): Allocating 79833 * 8 bytes (623 KiB) for CD tree
INFO: tmat.c(149): Reading HMM transition probability matrices: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: ptm_mgau.c(803): Number of codebooks exceeds 256: 6090
INFO: acmod.c(115): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: acmod.c(117): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 6090 senones: 1 features x 32 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(144): The value of topn: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 270112 * 32 bytes (8441 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.dic
INFO: dict.c(213): Dictionary size 266013, allocated 2217 KiB for strings, 4260 KiB for phones
INFO: dict.c(336): 266013 words read
INFO: dict.c(358): Reading filler dictionary: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/noisedict
INFO: dict.c(213): Dictionary size 266016, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 30^3 * 2 bytes (52 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21840 bytes (21 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21840 bytes (21 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 675 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 580308
INFO: ngram_search_fwdtree.c(333): Created 675 root, 580180 non-root channels, 75 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn_live.c(120): Update from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_live.c(138): Update to   < 48.64  8.26  9.46 15.53  0.73 10.14 -16.38 -7.88  2.97 -27.40 19.44 -3.48 -3.37 >
INFO: ngram_search_fwdtree.c(1550):     2069 words recognized (8/fr)
INFO: ngram_search_fwdtree.c(1552):   660566 senones evaluated (2696/fr)
INFO: ngram_search_fwdtree.c(1556):  3277888 channels searched (13379/fr), 106535 1st, 54408 last
INFO: ngram_search_fwdtree.c(1559):     4296 words for which last channels evaluated (17/fr)
INFO: ngram_search_fwdtree.c(1561):   254434 candidate words for entering last phone (1038/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 2.07 CPU 0.844 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 2.07 wall 0.844 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 118 words
INFO: ngram_search_fwdflat.c(948):     1097 words recognized (4/fr)
INFO: ngram_search_fwdflat.c(950):    95729 senones evaluated (391/fr)
INFO: ngram_search_fwdflat.c(952):    72888 channels searched (297/fr)
INFO: ngram_search_fwdflat.c(954):     6586 words searched (26/fr)
INFO: ngram_search_fwdflat.c(957):     6611 word transitions (26/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.18 CPU 0.074 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.18 wall 0.074 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.240
INFO: ngram_search.c(1276): Eliminated 0 nodes before end node
INFO: ngram_search.c(1381): Lattice has 228 nodes, 280 links
INFO: ps_lattice.c(1374): Bestpath score: -8058
INFO: ps_lattice.c(1378): Normalizer P(O) = alpha(</s>:240:243) = -661594
INFO: ps_lattice.c(1435): Joint P(O,S) = -678760 P(S|O) = -17166
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.000 xRT
INFO: pocketsphinx.c(1170): que la voz del postulante en (-8196)
word                 start end   pprob ascr       lscr       lback
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
<s>                  84    91    1.000 -306176    0          1  
que                  92    166   1.000 -1845248   -223       2  
<sil>                167   169   0.866 -306176    -524288    2  
la                   170   180   0.965 -253952    -320       1  
voz                  181   201   0.968 -272384    -406       2  
del                  202   216   0.739 -475136    -185       3  
postulante           217   279   1.000 -1587200   -168       3  
en                   280   323   1.000 -807936    -170       2  
</s>                 324   327   1.000 -807936    -189       2  
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
[('<s>', 1, 84, 91), ('que', 1, 92, 166), ('<sil>', -1443, 167, 169), ('la', -355, 170, 180), ('voz', -323, 181, 201), ('del', -3028, 202, 216), ('postulante', 0, 217, 279), ('en', -1, 280, 323), ('</s>', 0, 324, 327)]
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/feat.params
Current configuration:
[NAME]          [DEFLT]     [VALUE]
-agc            none        none
-agcthresh      2.0     2.000000e+00
-allphone               
-allphone_ci        yes     yes
-alpha          0.97        9.700000e-01
-ascale         20.0        2.000000e+01
-aw         1       1
-backtrace      no      yes
-beam           1e-48       1.000000e-48
-bestpath       yes     yes
-bestpathlw     9.5     9.500000e+00
-ceplen         13      13
-cmn            live        batch
-cmninit        40,3,-1     40,3,-1
-compallsen     no      no
-dict                   /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.dic
-dictcase       no      no
-dither         no      no
-doublebw       no      no
-ds         1       1
-fdict                  
-feat           1s_c_d_dd   1s_c_d_dd
-featparams             
-fillprob       1e-8        1.000000e-08
-frate          100     100
-fsg                    
-fsgusealtpron      yes     yes
-fsgusefiller       yes     yes
-fwdflat        yes     yes
-fwdflatbeam        1e-64       1.000000e-64
-fwdflatefwid       4       4
-fwdflatlw      8.5     8.500000e+00
-fwdflatsfwin       25      25
-fwdflatwbeam       7e-29       7.000000e-29
-fwdtree        yes     yes
-hmm                    /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000
-input_endian       little      little
-jsgf                   
-keyphrase              
-kws                    
-kws_delay      10      10
-kws_plp        1e-1        1.000000e-01
-kws_threshold      1e-30       1.000000e-30
-latsize        5000        5000
-lda                    
-ldadim         0       0
-lifter         0       22
-lm                 /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.lm.bin
-lmctl                  
-lmname                 
-logbase        1.0001      1.000100e+00
-logfn                  
-logspec        no      no
-lowerf         133.33334   1.300000e+02
-lpbeam         1e-40       1.000000e-40
-lponlybeam     7e-29       7.000000e-29
-lw         6.5     6.500000e+00
-maxhmmpf       30000       30000
-maxwpf         -1      -1
-mdef                   
-mean                   
-mfclogdir              
-min_endfr      0       0
-mixw                   
-mixwfloor      0.0000001   1.000000e-07
-mllr                   
-mmap           yes     yes
-ncep           13      13
-nfft           512     512
-nfilt          40      25
-nwpen          1.0     1.000000e+00
-pbeam          1e-48       1.000000e-48
-pip            1.0     1.000000e+00
-pl_beam        1e-10       1.000000e-10
-pl_pbeam       1e-10       1.000000e-10
-pl_pip         1.0     1.000000e+00
-pl_weight      3.0     3.000000e+00
-pl_window      5       5
-rawlogdir              
-remove_dc      no      no
-remove_noise       yes     yes
-remove_silence     yes     yes
-round_filters      yes     yes
-samprate       16000       1.600000e+04
-seed           -1      -1
-sendump                
-senlogdir              
-senmgau                
-silprob        0.005       5.000000e-03
-smoothspec     no      no
-svspec                 
-tmat                   
-tmatfloor      0.0001      1.000000e-04
-topn           4       4
-topn_beam      0       0
-toprule                
-transform      legacy      dct
-unit_area      yes     yes
-upperf         6855.4976   6.800000e+03
-uw         1.0     1.000000e+00
-vad_postspeech     50      50
-vad_prespeech      20      20
-vad_startspeech    10      10
-vad_threshold      3.0     3.000000e+00
-var                    
-varfloor       0.0001      1.000000e-04
-varnorm        no      no
-verbose        no      no
-warp_params                
-warp_type      inverse_linear  inverse_linear
-wbeam          7e-29       7.000000e-29
-wip            0.65        6.500000e-01
-wlen           0.025625    2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: mdef.c(518): Reading model definition: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/mdef
INFO: bin_mdef.c(181): Allocating 79833 * 8 bytes (623 KiB) for CD tree
INFO: tmat.c(149): Reading HMM transition probability matrices: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: ptm_mgau.c(803): Number of codebooks exceeds 256: 6090
INFO: acmod.c(115): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: acmod.c(117): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size: 
INFO: ms_gauden.c(244):  32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 6090 senones: 1 features x 32 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(144): The value of topn: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 270112 * 32 bytes (8441 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.dic
INFO: dict.c(213): Dictionary size 266013, allocated 2217 KiB for strings, 4260 KiB for phones
INFO: dict.c(336): 266013 words read
INFO: dict.c(358): Reading filler dictionary: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/noisedict
INFO: dict.c(213): Dictionary size 266016, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 30^3 * 2 bytes (52 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21840 bytes (21 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21840 bytes (21 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 675 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 580308
INFO: ngram_search_fwdtree.c(333): Created 675 root, 580180 non-root channels, 75 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn_live.c(120): Update from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_live.c(138): Update to   < 51.42  6.44  5.63 41.21  5.63  4.06 -30.89 -11.26 11.47 -38.96 22.43 -14.16  5.74 >
INFO: ngram_search_fwdtree.c(1550):      437 words recognized (6/fr)
INFO: ngram_search_fwdtree.c(1552):   136376 senones evaluated (1771/fr)
INFO: ngram_search_fwdtree.c(1556):   477102 channels searched (6196/fr), 27215 1st, 9054 last
INFO: ngram_search_fwdtree.c(1559):      902 words for which last channels evaluated (11/fr)
INFO: ngram_search_fwdtree.c(1561):    29178 candidate words for entering last phone (378/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.36 CPU 0.471 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.36 wall 0.471 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 13 words
INFO: ngram_search_fwdflat.c(948):      341 words recognized (4/fr)
INFO: ngram_search_fwdflat.c(950):    13931 senones evaluated (181/fr)
INFO: ngram_search_fwdflat.c(952):     9837 channels searched (127/fr)
INFO: ngram_search_fwdflat.c(954):      923 words searched (11/fr)
INFO: ngram_search_fwdflat.c(957):      395 word transitions (5/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.02 CPU 0.031 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.02 wall 0.031 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.73
INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
INFO: ngram_search.c(1381): Lattice has 48 nodes, 32 links
INFO: ps_lattice.c(1374): Bestpath score: -2486
INFO: ps_lattice.c(1378): Normalizer P(O) = alpha(</s>:73:75) = -148657
INFO: ps_lattice.c(1435): Joint P(O,S) = -155501 P(S|O) = -6844
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.000 xRT
INFO: pocketsphinx.c(1170): que (-2547)
word                 start end   pprob ascr       lscr       lback
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
<s>                  7     14    1.000 -306176    0          1  
que                  15    79    1.000 -1390592   -223       2  
</s>                 80    82    1.000 -1390592   -345       3  
----------------------------
| start |  end  |   word   |
----------------------------
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
| 0.07s | 0.14s |      <s> |
| 0.15s | 0.79s |      que |
|  0.8s | 0.82s |     </s> |
----------------------------
INFO: cmn_live.c(120): Update from < 51.42  6.44  5.63 41.21  5.63  4.06 -30.89 -11.26 11.47 -38.96 22.43 -14.16  5.74 >
INFO: cmn_live.c(138): Update to   < 48.82  8.28  9.59 15.19  0.65 10.14 -16.30 -7.60  2.97 -27.46 19.32 -3.37 -3.41 >
INFO: ngram_search_fwdtree.c(1550):     4085 words recognized (25/fr)
INFO: ngram_search_fwdtree.c(1552):   465826 senones evaluated (2806/fr)
INFO: ngram_search_fwdtree.c(1556):  1701760 channels searched (10251/fr), 73994 1st, 78396 last
INFO: ngram_search_fwdtree.c(1559):     6445 words for which last channels evaluated (38/fr)
INFO: ngram_search_fwdtree.c(1561):   107024 candidate words for entering last phone (644/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 1.25 CPU 0.755 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 1.25 wall 0.755 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 266 words
INFO: ngram_search_fwdflat.c(948):     1400 words recognized (8/fr)
INFO: ngram_search_fwdflat.c(950):   115214 senones evaluated (694/fr)
INFO: ngram_search_fwdflat.c(952):   114370 channels searched (688/fr)
INFO: ngram_search_fwdflat.c(954):    11749 words searched (70/fr)
INFO: ngram_search_fwdflat.c(957):    13504 word transitions (81/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.23 CPU 0.138 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.23 wall 0.138 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.162
INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
INFO: ngram_search.c(1381): Lattice has 185 nodes, 243 links
INFO: ps_lattice.c(1374): Bestpath score: -7587
INFO: ps_lattice.c(1378): Normalizer P(O) = alpha(</s>:162:164) = -572256
INFO: ps_lattice.c(1435): Joint P(O,S) = -645037 P(S|O) = -72781
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.000 xRT
INFO: pocketsphinx.c(1170): la voz del postulante no (-7682)
word                 start end   pprob ascr       lscr       lback
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
<s>                  84    86    1.000 -133120    0          1  
la                   87    99    0.096 -362496    -257       2  
voz                  100   119   0.091 -572416    -404       3  
del                  120   132   0.015 -706560    -185       3  
<sil>                133   137   0.781 -460800    -524288    3  
postulante           138   199   1.000 -2325504   -871       1  
no                   200   245   0.501 -878592    -353       1  
</s>                 246   248   1.000 -878592    -192       2  
INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 1.62 CPU 0.670 xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 1.62 wall 0.670 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.25 CPU 0.105 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.25 wall 0.105 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT
INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 2.07 CPU 0.847 xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 2.07 wall 0.847 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.18 CPU 0.074 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.18 wall 0.074 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT
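With verbose=True, the decoder prints a table for each hypothesis: word, start and end frames, posterior probability (pprob) and scores. You can compare these tables with what the Python code prints. The following is a minimal sketch that extracts those rows from a saved copy of the log. parse_bestpath_rows is a hypothetical helper, and its regular expression is an assumption based only on the rows shown above, not on a documented log format.

```python
import re

# Matches rows like "que   15   79   1.000 -1390592   -223   2"
# (word, start, end, pprob, ascr, lscr, lback) from the verbose log.
ROW = re.compile(r'^(\S+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(-?\d+)\s+(-?\d+)\s+(\d+)\s*$')


def parse_bestpath_rows(log_text, fps=100):
    """Extract (word, start_s, end_s, pprob) from pocketsphinx's verbose log."""
    rows = []
    for line in log_text.splitlines():
        m = ROW.match(line)
        if m:
            word = m.group(1)
            start, end = int(m.group(2)), int(m.group(3))
            rows.append((word, start / fps, end / fps, float(m.group(4))))
    return rows


sample = ("word                 start end   pprob ascr       lscr       lback\n"
          "que                  15    79    1.000 -1390592   -223       2  \n"
          "INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT")
print(parse_bestpath_rows(sample))
```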

Nov 08 '18 19:11 alemol
