wchmm_add_word: CDSET phoneme exist in monophone?
I'm running my own acoustic model and dictionary with the provided ENVR-v5.3 language models. Everything was fine until Julius reached the step of building the HMM lexicon tree, where I got this command-line error: "wchmm_add_word: CDSET phoneme exist in monophone?"
Full output:
STAT: include config: xxx.jconf
Stat: para: parsing HTK Config file: xxx_config
Warning: para: "SOURCEFORMAT" ignored (not supported, or irrelevant)
Warning: para: TARGETKIND skipped (will be determined by AM header)
Stat: para: SOURCERATE=1250
Stat: para: TARGETRATE=200000.0
Warning: para: "SAVECOMPRESSED" ignored (not supported, or irrelevant)
Warning: para: "SAVEWITHCRC" ignored (not supported, or irrelevant)
Stat: para: WINDOWSIZE=250000.0
Stat: para: USEHAMMING=T
Stat: para: PREEMCOEF=0.97
Stat: para: NUMCHANS=9
Stat: para: CEPLIFTER=22
Warning: para: NUMCEPS skipped (will be determined by AM header)
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: rdhmmdef: ascii format HMM definition
Stat: rdhmmdef: limit check passed
Stat: check_hmm_restriction: an HMM with several arcs from initial state found: "sp"
Stat: rdhmmdef: this HMM requires multipath handling at decoding
Stat: rdhmmdef: no <SID> embedded
Stat: rdhmmdef: assign SID by the order of appearance
Stat: init_phmm: defined HMMs: 39425
Stat: init_phmm: loading ascii hmmlist
Stat: init_phmm: logical names: 95243 in HMMList
Stat: init_phmm: base phones: 47 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: m_fusion: force multipath HMM handling by user request
STAT: making pseudo bi/mono-phone for IW-triphone
Stat: hmm_lookup: 23 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
Stat: init_voca: read 76549 words
Stat: init_ngram: reading in ARPA forward n-gram from ENVR-v5.3.bg
Stat: ngram_read_arpa: this is 2-gram file
Stat: ngram_read_arpa: reading 1-gram part...
Stat: ngram_read_arpa: read 262145 1-gram entries
Stat: ngram_read_arpa: reading 2-gram part...
Stat: ngram_read_arpa: 2-gram read 0 (0%)
Stat: ngram_read_arpa: 2-gram read 100000 (0%)
Stat: ngram_read_arpa: 2-gram read 200000 (1%)
Stat: ngram_read_arpa: 2-gram read 300000 (1%)
Stat: ngram_read_arpa: 2-gram read 400000 (2%)
Stat: ngram_read_arpa: 2-gram read 500000 (3%)
Stat: ngram_read_arpa: 2-gram read 600000 (3%)
...
...
Stat: ngram_read_arpa: 2-gram read 16200000 (98%)
Stat: ngram_read_arpa: 2-gram read 16300000 (99%)
Stat: ngram_read_arpa: 2-gram read 16380163 end
Stat: init_ngram: found unknown word entry "
Any suggestions? Many thanks!
00readme-DNN.txt: "To prepare a model for DNN-HMM, note that the orders are important. The order of the output nodes in the DNN should be the order of HMM state definition id. If not, Julius won't work properly."
I think this is your problem.
@Estalhun Thank you so much! After hours of debugging, I now have the same suspicion. Although I'm using GMM rather than DNN-HMM, I think the cause is the same: the custom acoustic model I'm using doesn't have <SID> included. Sad.
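For anyone who wants to confirm the same thing on their own model, here is a minimal sketch of a check. It assumes the acoustic model is an ASCII HTK-style definition file (as the "rdhmmdef: ascii format HMM definition" line in the log indicates), that each HMM starts with a `~h` macro line, and that an embedded state ID would appear as the literal token `<SID>`, which is a guess based on the wording of the "no <SID> embedded" log message. The names `count_tags` and `scan_hmmdefs` and the file path are hypothetical and are not part of Julius or HTK.

```python
def count_tags(lines):
    """Count HMM macro lines (~h) and literal <SID> tokens in an iterable of lines.

    Assumption: "<SID>" appears verbatim in the ASCII definition when state IDs
    are embedded; this is inferred from the Julius log message, not from a spec.
    """
    hmm_count = 0
    sid_count = 0
    for line in lines:
        if line.lstrip().startswith("~h"):
            hmm_count += 1
        sid_count += line.upper().count("<SID>")
    return hmm_count, sid_count


def scan_hmmdefs(path):
    """Open an ASCII HMM definition file and return (hmm_count, sid_count)."""
    # errors="replace" keeps the scan going if the file has stray non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_tags(f)
```

Calling `scan_hmmdefs("hmmdefs")` (with the path replaced by your own model file) returns the two counts. A `<SID>` count of zero would be consistent with the "no <SID> embedded" line in the log above; it does not by itself prove that this is what triggers the `wchmm_add_word` error.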