pocketsphinx-python
incorrect time coordinates of each word
There is a bug getting utterances.
Related Stack Overflow question
I want to get the time coordinates of each word in my 'audio.wav' using Python pocketsphinx 0.1.15. I reproduced the official example code from the project page, https://pypi.org/project/pocketsphinx/, which works well for 'goforward.raw':
# ----------------------------
# | start | end | word |
# ----------------------------
# | 0.0s | 0.24s | <s> |
# | 0.25s | 0.45s | <sil> |
# | 0.46s | 0.63s | go |
# | 0.64s | 1.16s | forward |
# | 1.17s | 1.52s | ten |
# | 1.53s | 2.11s | meters |
# | 2.12s | 2.6s | </s> |
# ----------------------------
When I use my 'audio.wav', the output of ps.segments(detailed=True) is not so bad, but with the AudioFile class (as in the official example) the result is very inaccurate: neither the time coordinates (the audio is 2.52 s long) nor the number of segments is even close to correct.
What is wrong? What should I do to get correct time coordinates?
rate 16000 frames 40371 2.5231875
[('<s>', 1, 84, 91), ('que', 1, 92, 166),
('<sil>', -1443, 167, 169), ('la', -355, 170, 180),
('voz', -323, 181, 201), ('del', -3028, 202, 216),
('postulante', 0, 217, 279), ('en', -1, 280, 323),
('</s>', 0, 324, 327)]
----------------------------
| start | end | word |
----------------------------
| 0.07s | 0.14s | <s> |
| 0.15s | 0.79s | que |
| 0.8s | 0.82s | </s> |
----------------------------
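As a cross-check, the frame indices in the list above can be converted to seconds by dividing by the frame rate. This is only a minimal sketch: it assumes the tuples are laid out as (word, score, start_frame, end_frame), which is what the output suggests, and that the frame rate is 100 frames per second (the -frate value in the log). The names FRATE, DURATION, SEGMENTS and to_seconds are illustrative, not part of pocketsphinx.

```python
FRATE = 100               # frames per second (assumed; the -frate value in the log)
DURATION = 40371 / 16000  # audio length in seconds, from the wave header printed above

# Output of ps.segments(detailed=True), copied from above.
SEGMENTS = [
    ('<s>', 1, 84, 91), ('que', 1, 92, 166),
    ('<sil>', -1443, 167, 169), ('la', -355, 170, 180),
    ('voz', -323, 181, 201), ('del', -3028, 202, 216),
    ('postulante', 0, 217, 279), ('en', -1, 280, 323),
    ('</s>', 0, 324, 327),
]


def to_seconds(segments, frate=FRATE):
    """Turn (word, score, start_frame, end_frame) into (word, start_s, end_s)."""
    return [(word, start / frate, end / frate)
            for word, _score, start, end in segments]


for word, start, end in to_seconds(SEGMENTS):
    # Mark segments whose end time lies beyond the length of the audio file.
    note = '' if end <= DURATION else '  (past end of audio)'
    print('| %5.2fs | %5.2fs | %-10s |%s' % (start, end, word, note))
```

Printing the times next to the file duration makes it easy to see whether a segment ends later than the audio itself.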
Here is my Python code:
import os.path
# wave and contextlib are only used to read the audio properties
import wave
import contextlib
from pocketsphinx import (Pocketsphinx, AudioFile, LiveSpeech)
# my own ps model and other resources
from utils.utilities import (get_mexconf, get_data_path)

# get the file and print audio properties
wav = os.path.join(get_data_path(), 'audio.wav')
with contextlib.closing(wave.open(wav, 'r')) as f:
    rate = f.getframerate()
    frames = f.getnframes()
    duration = frames / float(rate)
    print('rate', rate, 'frames', frames, 'duration', duration)

# This part seems to work getting segments
# (get_segments is my own helper, not shown; it returns ps.segments(detailed=True))
segments = get_segments(wav)
print(segments)

# Frames per second
fps = 100

# set up my asr models and my audio
config = get_mexconf()
config['audio_file'] = wav
# frate has to be in the config before AudioFile is created;
# setting it afterwards has no effect on the decoder
config['frate'] = fps
audio = AudioFile(**config)

# This part is copy paste from official example #
for phrase in audio:
    print('-' * 28)
    print('| %5s | %3s | %4s |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)
This is the config:
config = {
    'hmm': os.path.join(model_path, 'LKE_T29.cd_cont_6000'),
    'lm': os.path.join(model_path, 'LKE_T29.lm.bin'),
    'dict': os.path.join(model_path, 'LKE_T29.dic'),
    'verbose': True,
    'backtrace': True
}
This produces the following output:
/home/amolina/repo/audiotranscriptor/data/audio.wav
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci yes yes
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no yes
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live batch
-cmninit 40,3,-1 40,3,-1
-compallsen no no
-dict /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1e-30 1.000000e-30
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.lm.bin
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 3.0 3.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: mdef.c(518): Reading model definition: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/mdef
INFO: bin_mdef.c(181): Allocating 79833 * 8 bytes (623 KiB) for CD tree
INFO: tmat.c(149): Reading HMM transition probability matrices: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: ptm_mgau.c(803): Number of codebooks exceeds 256: 6090
INFO: acmod.c(115): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: acmod.c(117): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 6090 senones: 1 features x 32 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(144): The value of topn: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 270112 * 32 bytes (8441 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.dic
INFO: dict.c(213): Dictionary size 266013, allocated 2217 KiB for strings, 4260 KiB for phones
INFO: dict.c(336): 266013 words read
INFO: dict.c(358): Reading filler dictionary: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/noisedict
INFO: dict.c(213): Dictionary size 266016, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 30^3 * 2 bytes (52 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21840 bytes (21 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21840 bytes (21 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 675 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 580308
INFO: ngram_search_fwdtree.c(333): Created 675 root, 580180 non-root channels, 75 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn_live.c(120): Update from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_live.c(138): Update to < 48.64 8.26 9.46 15.53 0.73 10.14 -16.38 -7.88 2.97 -27.40 19.44 -3.48 -3.37 >
INFO: ngram_search_fwdtree.c(1550): 2069 words recognized (8/fr)
INFO: ngram_search_fwdtree.c(1552): 660566 senones evaluated (2696/fr)
INFO: ngram_search_fwdtree.c(1556): 3277888 channels searched (13379/fr), 106535 1st, 54408 last
INFO: ngram_search_fwdtree.c(1559): 4296 words for which last channels evaluated (17/fr)
INFO: ngram_search_fwdtree.c(1561): 254434 candidate words for entering last phone (1038/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 2.07 CPU 0.844 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 2.07 wall 0.844 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 118 words
INFO: ngram_search_fwdflat.c(948): 1097 words recognized (4/fr)
INFO: ngram_search_fwdflat.c(950): 95729 senones evaluated (391/fr)
INFO: ngram_search_fwdflat.c(952): 72888 channels searched (297/fr)
INFO: ngram_search_fwdflat.c(954): 6586 words searched (26/fr)
INFO: ngram_search_fwdflat.c(957): 6611 word transitions (26/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.18 CPU 0.074 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.18 wall 0.074 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.240
INFO: ngram_search.c(1276): Eliminated 0 nodes before end node
INFO: ngram_search.c(1381): Lattice has 228 nodes, 280 links
INFO: ps_lattice.c(1374): Bestpath score: -8058
INFO: ps_lattice.c(1378): Normalizer P(O) = alpha(</s>:240:243) = -661594
INFO: ps_lattice.c(1435): Joint P(O,S) = -678760 P(S|O) = -17166
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.000 xRT
INFO: pocketsphinx.c(1170): que la voz del postulante en (-8196)
word start end pprob ascr lscr lback
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
<s> 84 91 1.000 -306176 0 1
que 92 166 1.000 -1845248 -223 2
<sil> 167 169 0.866 -306176 -524288 2
la 170 180 0.965 -253952 -320 1
voz 181 201 0.968 -272384 -406 2
del 202 216 0.739 -475136 -185 3
postulante 217 279 1.000 -1587200 -168 3
en 280 323 1.000 -807936 -170 2
</s> 324 327 1.000 -807936 -189 2
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
[('<s>', 1, 84, 91), ('que', 1, 92, 166), ('<sil>', -1443, 167, 169), ('la', -355, 170, 180), ('voz', -323, 181, 201), ('del', -3028, 202, 216), ('postulante', 0, 217, 279), ('en', -1, 280, 323), ('</s>', 0, 324, 327)]
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci yes yes
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no yes
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live batch
-cmninit 40,3,-1 40,3,-1
-compallsen no no
-dict /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1e-30 1.000000e-30
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.lm.bin
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 3.0 3.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: mdef.c(518): Reading model definition: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/mdef
INFO: bin_mdef.c(181): Allocating 79833 * 8 bytes (623 KiB) for CD tree
INFO: tmat.c(149): Reading HMM transition probability matrices: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: ptm_mgau.c(803): Number of codebooks exceeds 256: 6090
INFO: acmod.c(115): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: acmod.c(117): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/means
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/variances
INFO: ms_gauden.c(242): 6090 codebook, 1 feature, size:
INFO: ms_gauden.c(244): 32x39
INFO: ms_gauden.c(304): 79 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 6090 senones: 1 features x 32 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(144): The value of topn: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 270112 * 32 bytes (8441 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.dic
INFO: dict.c(213): Dictionary size 266013, allocated 2217 KiB for strings, 4260 KiB for phones
INFO: dict.c(336): 266013 words read
INFO: dict.c(358): Reading filler dictionary: /home/amolina/repo/audiotranscriptor/modelSphinx/LKE_T29.cd_cont_6000/noisedict
INFO: dict.c(213): Dictionary size 266016, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 30^3 * 2 bytes (52 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21840 bytes (21 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21840 bytes (21 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 675 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 580308
INFO: ngram_search_fwdtree.c(333): Created 675 root, 580180 non-root channels, 75 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn_live.c(120): Update from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_live.c(138): Update to < 51.42 6.44 5.63 41.21 5.63 4.06 -30.89 -11.26 11.47 -38.96 22.43 -14.16 5.74 >
INFO: ngram_search_fwdtree.c(1550): 437 words recognized (6/fr)
INFO: ngram_search_fwdtree.c(1552): 136376 senones evaluated (1771/fr)
INFO: ngram_search_fwdtree.c(1556): 477102 channels searched (6196/fr), 27215 1st, 9054 last
INFO: ngram_search_fwdtree.c(1559): 902 words for which last channels evaluated (11/fr)
INFO: ngram_search_fwdtree.c(1561): 29178 candidate words for entering last phone (378/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.36 CPU 0.471 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.36 wall 0.471 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 13 words
INFO: ngram_search_fwdflat.c(948): 341 words recognized (4/fr)
INFO: ngram_search_fwdflat.c(950): 13931 senones evaluated (181/fr)
INFO: ngram_search_fwdflat.c(952): 9837 channels searched (127/fr)
INFO: ngram_search_fwdflat.c(954): 923 words searched (11/fr)
INFO: ngram_search_fwdflat.c(957): 395 word transitions (5/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.02 CPU 0.031 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.02 wall 0.031 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.73
INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
INFO: ngram_search.c(1381): Lattice has 48 nodes, 32 links
INFO: ps_lattice.c(1374): Bestpath score: -2486
INFO: ps_lattice.c(1378): Normalizer P(O) = alpha(</s>:73:75) = -148657
INFO: ps_lattice.c(1435): Joint P(O,S) = -155501 P(S|O) = -6844
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.000 xRT
INFO: pocketsphinx.c(1170): que (-2547)
word start end pprob ascr lscr lback
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
<s> 7 14 1.000 -306176 0 1
que 15 79 1.000 -1390592 -223 2
</s> 80 82 1.000 -1390592 -345 3
----------------------------
| start | end | word |
----------------------------
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
| 0.07s | 0.14s | <s> |
| 0.15s | 0.79s | que |
| 0.8s | 0.82s | </s> |
----------------------------
INFO: cmn_live.c(120): Update from < 51.42 6.44 5.63 41.21 5.63 4.06 -30.89 -11.26 11.47 -38.96 22.43 -14.16 5.74 >
INFO: cmn_live.c(138): Update to < 48.82 8.28 9.59 15.19 0.65 10.14 -16.30 -7.60 2.97 -27.46 19.32 -3.37 -3.41 >
INFO: ngram_search_fwdtree.c(1550): 4085 words recognized (25/fr)
INFO: ngram_search_fwdtree.c(1552): 465826 senones evaluated (2806/fr)
INFO: ngram_search_fwdtree.c(1556): 1701760 channels searched (10251/fr), 73994 1st, 78396 last
INFO: ngram_search_fwdtree.c(1559): 6445 words for which last channels evaluated (38/fr)
INFO: ngram_search_fwdtree.c(1561): 107024 candidate words for entering last phone (644/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 1.25 CPU 0.755 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 1.25 wall 0.755 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 266 words
INFO: ngram_search_fwdflat.c(948): 1400 words recognized (8/fr)
INFO: ngram_search_fwdflat.c(950): 115214 senones evaluated (694/fr)
INFO: ngram_search_fwdflat.c(952): 114370 channels searched (688/fr)
INFO: ngram_search_fwdflat.c(954): 11749 words searched (70/fr)
INFO: ngram_search_fwdflat.c(957): 13504 word transitions (81/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.23 CPU 0.138 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.23 wall 0.138 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.162
INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
INFO: ngram_search.c(1381): Lattice has 185 nodes, 243 links
INFO: ps_lattice.c(1374): Bestpath score: -7587
INFO: ps_lattice.c(1378): Normalizer P(O) = alpha(</s>:162:164) = -572256
INFO: ps_lattice.c(1435): Joint P(O,S) = -645037 P(S|O) = -72781
INFO: ngram_search.c(872): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(875): bestpath 0.00 wall 0.000 xRT
INFO: pocketsphinx.c(1170): la voz del postulante no (-7682)
word start end pprob ascr lscr lback
INFO: ngram_search.c(1027): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(1030): bestpath 0.00 wall 0.000 xRT
<s> 84 86 1.000 -133120 0 1
la 87 99 0.096 -362496 -257 2
voz 100 119 0.091 -572416 -404 3
del 120 132 0.015 -706560 -185 3
<sil> 133 137 0.781 -460800 -524288 3
postulante 138 199 1.000 -2325504 -871 1
no 200 245 0.501 -878592 -353 1
</s> 246 248 1.000 -878592 -192 2
INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 1.62 CPU 0.670 xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 1.62 wall 0.670 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.25 CPU 0.105 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.25 wall 0.105 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT
INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 2.07 CPU 0.847 xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 2.07 wall 0.847 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.18 CPU 0.074 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.18 wall 0.074 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT
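To compare the two runs without reading the whole log, the backtrace rows (word, start, end, pprob, ascr, lscr, lback) can be pulled out of the text and converted to seconds. The following is a rough sketch only: the regular expression is my guess at the row layout based on the log above, the 100 frames per second is taken from the -frate entry, and backtrace_rows is a made-up helper name, not a pocketsphinx function.

```python
import re

# One backtrace row: word, start frame, end frame, pprob, ascr, lscr, lback.
# The pattern is inferred from the log above and may need adjusting.
ROW = re.compile(
    r'^(\S+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(-?\d+)\s+(-?\d+)\s+(\d+)$'
)


def backtrace_rows(log_text, frate=100):
    """Return (word, start_s, end_s) for every backtrace row in the log text."""
    rows = []
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            word = match.group(1)
            start = int(match.group(2))
            end = int(match.group(3))
            rows.append((word, start / frate, end / frate))
    return rows


# A few lines copied from the AudioFile part of the log above.
sample = """\
INFO: pocketsphinx.c(1170): que (-2547)
word start end pprob ascr lscr lback
<s> 7 14 1.000 -306176 0 1
que 15 79 1.000 -1390592 -223 2
</s> 80 82 1.000 -1390592 -345 3
"""

for word, start, end in backtrace_rows(sample):
    print('| %5.2fs | %5.2fs | %-10s |' % (start, end, word))
```

Running this over the full log lists the rows of each backtrace in order, so the segments found by the first decoder and by AudioFile can be read side by side.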