
Decoding during the training of AM.

Open gopesh97 opened this issue 5 years ago • 1 comment

I am training my acoustic model. Here is my configuration file.

--datadir=/home/english_data/ --runname=english_train --rundir=/home/training/ --tokensdir=/home/am/ --listdata=true --train=lists/train.lst --valid=lists/dev.lst --input=wav --arch=network.arch --archdir=/home/ --lexicon=/home/am/librispeech-train+dev-unigram-10000-nbest10.lexicon --tokens=librispeech-train-all-unigram-10000.tokens --criterion=seq2seq --lr=0.05 --lrcrit=0.05 --momentum=0.0 --stepsize=40 --gamma=0.5 --maxgradnorm=15 --mfsc=true --use_saug=true --dataorder=output_spiral --inputbinsize=25 --filterbanks=80 --attention=keyvalue --encoderdim=512 --attnWindow=softPretrain --softwstd=4 --trainWithWindow=true --pretrainWindow=3 --maxdecoderoutputlen=120 --usewordpiece=true --wordseparator=_ --sampletarget=0.01 --target=ltr --batchsize=4 --labelsmooth=0.05 --nthread=4 --memstepsize=4194304 --eostoken=true --pcttraineval=1 --pctteacherforcing=99 --iter=200 --enable_distributed=true

Currently, I am getting this result:

epoch: 58 | lr: 0.025000 | lrcriterion: 0.025000 | runtime: 06:44:50 | bch(ms): 237.77 | smp(ms): 1.02 | fwd(ms): 14.86 | crit-fwd(ms): 1.07 | bwd(ms): 213.77 | optim(ms): 7.74 | loss: 32.41068 | train-LER: 31.00 | train-WER: 47.11 | lists/dev.lst-loss: 16.45153 | lists/dev.lst-LER: 22.34 | lists/dev.lst-WER: 35.14 | avg-isz: 1003 | avg-tsz: 018 | max-tsz: 130 | hrs: 4556.04 | thrpt(sec/sec): 675.23

I wanted to know how you are internally decoding dev.lst during this training: are you using the greedy path or the beam-search decoder? Also, what parameters are you using for this, and which of those parameters are chosen randomly?

gopesh97, Jun 18 '20 07:06

During training, the Viterbi WER (greedy path) is reported in the logs. Decoding with an LM is done separately: the practice is to pick the snapshot with the best Viterbi WER and then decode it with some LM. For decoding, we just randomly sample hyper-parameters (such as LM weight and word score) and choose the parameter values that give the best dev-set WER.

tlikhomanenko, Jun 18 '20 17:06
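
For illustration, here is a minimal sketch (not from the thread) of the random hyper-parameter search described in the answer above. It assumes a hypothetical `decode_and_score` callable that runs the beam-search decoder on the dev set with a given LM weight and word score and returns the resulting WER; the sampling ranges are assumptions, not values from the thread.

```python
import random
from typing import Callable, Tuple


def random_search_decode_params(
    decode_and_score: Callable[[float, float], float],
    num_trials: int = 50,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Randomly sample (lm_weight, word_score) pairs and keep the pair
    that gives the lowest dev-set WER.

    `decode_and_score(lm_weight, word_score)` is a user-supplied function
    that runs the beam-search decoder on the dev set and returns its WER.
    """
    rng = random.Random(seed)
    best_wer, best_lm, best_ws = float("inf"), 0.0, 0.0
    for _ in range(num_trials):
        # Sampling ranges below are illustrative assumptions only.
        lm_weight = rng.uniform(0.0, 4.0)
        word_score = rng.uniform(-3.0, 3.0)
        wer = decode_and_score(lm_weight, word_score)
        if wer < best_wer:
            best_wer, best_lm, best_ws = wer, lm_weight, word_score
    return best_wer, best_lm, best_ws
```

Each trial only re-runs the decoder on the dev set; the acoustic model snapshot (picked by best Viterbi WER) stays fixed, which is what makes this random search cheap compared to retraining.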