
ASR decoding output text

PO-CHEN-LAI opened this issue 3 years ago · 5 comments

Here are my decoding settings:

batchsize: 1
beam-size: 20
penalty: 0.0
maxlenratio: 0.5
minlenratio: 0.1
ctc-weight: 0.3
lm-weight: 0.5
ngram-weight: 0.0
nj=2
ngpu=0


decode log

asr_recog.py --config conf/decode.yaml --ngpu 0 --backend pytorch --batchsize 1 --recog-json dump/test/deltafalse/split2utt/data.1.json --result-label exp/train_sp_pytorch_train/decode_test_decode_lm_4/data.1.json --model exp/train_sp_pytorch_train/results/model.last10.avg.best --rnnlm exp/train_rnnlm_pytorch_lm/rnnlm.model.best --ngram-model exp/train_ngram/4gram.bin --api v2
Started at Thu Sep 8 09:45:47 CST 2022

2022-09-08 09:45:48,674 (asr_recog:353) INFO: python path = /home/barry/espnet/tools/s3prl:
2022-09-08 09:45:48,674 (asr_recog:358) INFO: set random seed = 1
2022-09-08 09:45:48,674 (asr_recog:369) INFO: backend = pytorch
2022-09-08 09:46:31,019 (recog:33) WARNING: experimental API for custom LMs is selected by --api v2
2022-09-08 09:46:31,106 (deterministic_utils:26) INFO: torch type check is disabled
2022-09-08 09:46:31,106 (asr_utils:693) INFO: reading a config file from exp/train_sp_pytorch_train/results/model.json
2022-09-08 09:46:31,126 (asr_init:215) INFO: Reading model parameters from exp/train_sp_pytorch_train/results/model.last10.avg.best
2022-09-08 09:46:32,005 (encoder:177) INFO: encoder self-attention layer type = self-attention
2022-09-08 09:46:32,193 (decoder:124) INFO: decoder self-attention layer type = self-attention
2022-09-08 09:46:32,425 (encoder:174) INFO: encoder self-attention layer type = relative self-attention
2022-09-08 09:46:43,728 (asr_utils:693) INFO: reading a config file from exp/train_rnnlm_pytorch_lm/model.json
2022-09-08 09:46:43,898 (default:377) INFO: Tie weights set to False
2022-09-08 09:46:43,899 (default:378) INFO: Dropout set to 0.5
2022-09-08 09:46:43,899 (default:379) INFO: Emb Dropout set to 0.0
2022-09-08 09:47:03,987 (recog:139) INFO: BatchBeamSearch implementation is selected.
2022-09-08 09:47:04,094 (recog:153) INFO: Decoding device=cpu, dtype=torch.float32
2022-09-08 09:47:04,607 (recog:163) INFO: (1/3588) decoding BAC009S0764W0121
2022-09-08 09:47:23,615 (beam_search:356) INFO: decoder input length: 103
2022-09-08 09:47:23,991 (beam_search:357) INFO: max output length: 51
2022-09-08 09:47:23,991 (beam_search:358) INFO: min output length: 10
2022-09-08 09:47:44,332 (batch_beam_search:317) INFO: adding <eos> in the last position in the loop
2022-09-08 09:47:45,756 (beam_search:373) INFO: no hypothesis. Finish decoding.
2022-09-08 09:47:46,115 (beam_search:395) INFO: -336.05 * 0.7 = -235.23 for decoder
2022-09-08 09:47:46,115 (beam_search:395) INFO: -375.55 * 0.3 = -112.67 for ctc
2022-09-08 09:47:46,115 (beam_search:395) INFO: -78.21 * 0.5 = -39.10 for lm
2022-09-08 09:47:46,116 (beam_search:397) INFO: total log probability: -387.00
2022-09-08 09:47:46,147 (beam_search:398) INFO: normalized log probability: -7.30
2022-09-08 09:47:46,147 (beam_search:399) INFO: total number of ended hypotheses: 20
2022-09-08 09:47:46,147 (beam_search:404) INFO: best hypo: 一线城市的房地产市场仍然有一定的影响力的一线城市的一线城市土地市场成交量的一线城市的一线城市的一线城市
2022-09-08 09:47:46,147 (asr_utils:889) INFO: groundtruth: 甚至出现交易几乎停滞的情况
2022-09-08 09:47:46,147 (asr_utils:890) INFO: prediction : 一线城市的房地产市场仍然有一定的影响力的一线城市的一线城市土地市场成交量的一线城市的一线城市的一线城市
2022-09-08 09:47:46,148 (recog:163) INFO: (2/3588) decoding BAC009S0764W0122
2022-09-08 09:47:48,948 (beam_search:356) INFO: decoder input length: 101
2022-09-08 09:47:48,948 (beam_search:357) INFO: max output length: 50
2022-09-08 09:47:48,948 (beam_search:358) INFO: min output length: 10
2022-09-08 09:47:58,292 (batch_beam_search:317) INFO: adding <eos> in the last position in the loop
2022-09-08 09:47:58,293 (beam_search:373) INFO: no hypothesis. Finish decoding.
2022-09-08 09:47:58,293 (beam_search:395) INFO: -329.73 * 0.7 = -230.81 for decoder
2022-09-08 09:47:58,293 (beam_search:395) INFO: -369.17 * 0.3 = -110.75 for ctc
2022-09-08 09:47:58,293 (beam_search:395) INFO: -75.84 * 0.5 = -37.92 for lm
2022-09-08 09:47:58,293 (beam_search:397) INFO: total log probability: -379.48
2022-09-08 09:47:58,293 (beam_search:398) INFO: normalized log probability: -7.30
2022-09-08 09:47:58,293 (beam_search:399) INFO: total number of ended hypotheses: 20
2022-09-08 09:47:58,293 (beam_search:404) INFO: best hypo: 一线城市的房地产市场仍然有一定的影响力的一线城市的一线城市土地市场成交量的一线城市一线城市的一线城市
2022-09-08 09:47:58,294 (asr_utils:889) INFO: groundtruth: 一二线城市虽然也处于调整中
2022-09-08 09:47:58,294 (asr_utils:890) INFO: prediction : 一线城市的房地产市场仍然有一定的影响力的一线城市的一线城市土地市场成交量的一线城市一线城市的一线城市
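For reference, the per-module scores in the log combine exactly as the decode config suggests: with ctc-weight 0.3 the attention decoder gets weight 0.7, and lm-weight 0.5 is added on top. Below is a minimal sketch in plain Python (not espnet code) that reproduces the numbers for the first utterance; treating the normalization as the total divided by the hypothesis length including <sos>/<eos> (51 + 2 = 53) is my assumption.

# Minimal sketch of how the beam-search log combines per-module scores.
# Numbers are copied from utterance BAC009S0764W0121 above; nothing here calls espnet.

weights = {"decoder": 0.7, "ctc": 0.3, "lm": 0.5}          # from the decode config
logps   = {"decoder": -336.05, "ctc": -375.55, "lm": -78.21}

total = sum(weights[k] * logps[k] for k in logps)
print(f"total log probability: {total:.2f}")               # about -387.00

# Assumed normalization: divide by hypothesis length incl. <sos>/<eos> (51 + 2 = 53).
print(f"normalized log probability: {total / 53:.2f}")     # about -7.30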


All of the predicted texts are the same, e.g. prediction: 一线城市的房地产市场仍然有一定的影响力的一线城市的一线城市土地市场成交量的一线城市一线城市的一线城市. I have checked that CTC works, and I have tried adjusting the decoding coefficients, but it didn't help. I am now rerunning the recipe from stage 0. Does anyone know how to fix this?

PO-CHEN-LAI avatar Sep 08 '22 15:09 PO-CHEN-LAI

Will you please help me understand ASR decoding?

I mean, how different are the encoder+CTC (argmax) output and the ASR decoding output?
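Roughly speaking, the encoder+CTC (argmax) output is a frame-wise greedy decode: take the most likely label per encoder frame, collapse repeats, and drop blanks, with no attention decoder, LM, or beam search involved. Full ASR decoding instead scores whole hypotheses with the weighted decoder + CTC (+ LM) sum shown in the log above. The following is a hypothetical sketch of the greedy CTC side only, in illustrative Python; it is not the espnet API.

# Hypothetical illustration of CTC greedy (argmax) decoding -- not espnet code.
def ctc_greedy(frame_scores, blank=0):
    """frame_scores: one list of label scores per encoder frame."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_scores]  # argmax per frame
    out, prev = [], None
    for label in best:
        if label != prev and label != blank:   # collapse repeats, drop blanks
            out.append(label)
        prev = label
    return out

# Toy check: 6 frames over vocab {0: <blank>, 1: A, 2: B}
frames = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05],
          [0.1, 0.1, 0.8], [0.1, 0.1, 0.8], [0.9, 0.05, 0.05]]
print(ctc_greedy(frames))  # [1, 2] -> "AB"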

neso613 avatar Sep 15 '22 08:09 neso613

All of the predicted texts are the same, e.g. prediction: 一线城市的房地产市场仍然有一定的影响力的一线城市的一线城市土地市场成交量的一线城市一线城市的一线城市. I have checked that CTC works, and I have tried adjusting the decoding coefficients, but it didn't help. I am now rerunning the recipe from stage 0. Does anyone know how to fix this?

Can you paste the learning curve and inference config?

sw005320 avatar Sep 15 '22 11:09 sw005320

# network architecture
# encoder related
elayers: 12
eunits: 2048
# decoder related
dlayers: 6
dunits: 2048
# attention related
adim: 256
aheads: 4

# hybrid CTC/attention
mtlalpha: 0.3

# label smoothing
lsm-weight: 0.1

# minibatch related
batch-size: 24
maxlen-in: 512  # if input length > maxlen-in, batchsize is automatically reduced
maxlen-out: 150 # if output length > maxlen-out, batchsize is automatically reduced

# optimization related
sortagrad: 0 # Feed samples from shortest to longest; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
opt: noam
accum-grad: 2
grad-clip: 5
patience: 0
epochs: 6
dropout-rate: 0.1

# transformer specific setting
backend: pytorch
model-module: "espnet.nets.pytorch_backend.e2e_asr_conformer:E2E"
transformer-input-layer: conv2d # encoder architecture type
transformer-lr: 1.0
transformer-warmup-steps: 25000
transformer-attn-dropout-rate: 0.0
transformer-length-normalized-loss: false
transformer-init: pytorch

# conformer specific setting
transformer-encoder-pos-enc-layer-type: rel_pos
transformer-encoder-selfattn-layer-type: rel_selfattn
transformer-encoder-activation-type: swish
rel-pos-type: latest
macaron-style: true
use-cnn-module: true
cnn-module-kernel: 15

[learning curve plots attached: acc, loss]
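As a side note on the config above, mtlalpha: 0.3 is the CTC share of the hybrid training objective, which the decoding ctc-weight: 0.3 mirrors. A rough sketch, assuming the usual convention loss = alpha * ctc + (1 - alpha) * attention, with made-up loss values:

# Rough sketch of the hybrid CTC/attention objective; loss values are made up.
mtlalpha = 0.3                     # from the training config above
loss_ctc, loss_att = 45.0, 30.0    # hypothetical per-batch losses

loss = mtlalpha * loss_ctc + (1 - mtlalpha) * loss_att
print(loss)                        # 0.3 * 45.0 + 0.7 * 30.0 = 34.5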

PO-CHEN-LAI avatar Sep 16 '22 00:09 PO-CHEN-LAI

Did you use your own data? I guess your training/test split may have some issues.

sw005320 avatar Sep 16 '22 00:09 sw005320

I didn't use my own data.

PO-CHEN-LAI avatar Sep 16 '22 02:09 PO-CHEN-LAI