Tatiana Likhomanenko
During training, the Viterbi WER (greedy path) is reported in the logs. Decoding with the LM is done separately. The practice is to pick the snapshot with the best Viterbi WER and then decode...
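For reference, the LM decoding step is run separately with the Decoder binary and its own flags file; a minimal sketch (paths are placeholders, and the weight values here are assumptions to tune on a dev set):

```
# decode.cfg (sketch): point --am at the snapshot with the best Viterbi WER
# lmweight/wordscore values below are placeholders, tune them on a dev set
--am=[path/to/best_snapshot.bin]
--lm=[path/to/lm.bin]
--lexicon=[path/to/lexicon.txt]
--lmweight=2.0
--wordscore=1.0
```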
Could you first try to run with a config without empty lines and attach the full log you see on the screen?
@AlexandderGorodetski We didn't try running training on 1k hours with 1 GPU. With 32 GPUs we fully trained a transformer model on 1k hours in 3 days and...
cc @vineelpratap could you navigate here with the latest wav2letter/flashlight? Otherwise @xiaosdawn you need to use the Docker image built at the time of the v0.2 branch, before we refactored the whole code...
Inside the Dockerfile the latest master of flashlight is used. You need to fix the Dockerfile to use the flashlight v0.2 branch too.
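Something along these lines (a sketch; the exact clone step and location depend on how the Dockerfile is structured):

```
# pin flashlight to the v0.2 branch instead of latest master
git clone --branch v0.2 https://github.com/facebookresearch/flashlight.git
```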
Maybe I misunderstood, but what I see now: 1) you are not using the released model, you retrained it; 2) you retrained the model with the version of flashlight where w2l...
@samin9796 Can you first try to run without specaug and with `--lrcosine=false`, training with a constant LR? Please also attach the training log where the params are printed, to check...
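A sketch of what I mean (the lr value here is just an example, keep whatever you currently use; if specaug in your setup is enabled via a SAUG layer in the arch file, remove that line there):

```
# train.cfg (sketch): disable the cosine schedule, train with a constant lr
--lrcosine=false
--lr=0.1
```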
Are you training a letter-based acoustic model? Could you show me the head of your tokens set and lexicon file?
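For comparison, for a letter-based setup I'd expect something like this (a sketch with made-up words; `|` is the word-boundary token, and each lexicon entry is a word followed by its space-separated spelling):

```
# head of the tokens file (one token per line)
|
'
a
b
c
# head of the lexicon file
hello h e l l o |
world w o r l d |
```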
Could you run with `--warmup=1 --reporiters=1 --surround=|`?
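i.e. just append these to your flags file and rerun (a sketch, assuming you launch with `--flagsfile`):

```
# no warmup, report every iteration, surround targets with the word-boundary token
--warmup=1
--reporiters=1
--surround=|
```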
Possibly you have problems with the data itself. You could try to filter it with `minisz`, `maxisz`, `mintsz`, `maxtsz`. Durations of 10-15s should be fine, as well as 2-5s. We trained...
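For example (a sketch; the `isz` bounds are input audio length in milliseconds, the `tsz` bounds are target length in tokens, and these particular values are assumptions to adapt to your data):

```
# filter out too short / too long samples
--minisz=200
--maxisz=15000
--mintsz=2
--maxtsz=400
```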