Vineel Pratap comments

Results: 41 comments of Vineel Pratap

unstable training

Hi, we have not seen this issue lately. As a sanity check, could you run an experiment where you filter out audio samples shorter than 1 sec and targets with length < 5?

unstable training

Also, could you let us know the input and target size distribution (min, max, avg, stddev)?
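
A minimal sketch of how the requested filtering and statistics could be computed, assuming the per-sample audio durations (in seconds) and target transcripts have already been collected into a Python list. The helper names `summarize` and `filter_short` are hypothetical and not part of wav2letter:

```python
import statistics

def summarize(values):
    """Return min, max, mean and population standard deviation of a list."""
    return {
        "min": min(values),
        "max": max(values),
        "avg": statistics.fmean(values),
        "stddev": statistics.pstdev(values),
    }

def filter_short(samples, min_audio_sec=1.0, min_target_len=5):
    """Drop samples with audio shorter than min_audio_sec
    or a target shorter than min_target_len tokens."""
    return [
        (dur, tgt)
        for dur, tgt in samples
        if dur >= min_audio_sec and len(tgt) >= min_target_len
    ]

# Toy data: (duration in seconds, target as a string of character tokens).
samples = [(0.4, "hi"), (2.0, "hello world"), (3.0, "good morning"), (1.0, "abc")]
kept = filter_short(samples)

print("input  :", summarize([dur for dur, _ in kept]))
print("target :", summarize([len(tgt) for _, tgt in kept]))
```

Here the target length is counted in characters; for a word-piece or phoneme token set, the count would need to be taken after tokenization.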

Forward long audio

Hi, can you also post your network architecture here so that we can verify the padding?

Inference failed with long audio

Hi, I think the problem could be that for very long audio we need to re-normalize the computed `alphas` (forward probabilities). I'm looking into the best way to...
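
To illustrate why re-normalizing forward probabilities matters on long inputs, here is a generic sketch of a scaled forward recursion for a small HMM-like model. It is not wav2letter's implementation; the function name and the `init`, `trans`, `emit` arguments are assumptions made for this example. The `alphas` are rescaled to sum to 1 at every frame, and the logs of the scale factors are accumulated so the sequence log-likelihood is still recovered:

```python
import math

def forward_log_likelihood(init, trans, emit):
    """Scaled forward algorithm.

    init[i]     : initial probability of state i
    trans[i][j] : probability of moving from state i to state j
    emit[t][j]  : probability of the observation at frame t in state j
    Returns the log-likelihood of the whole observation sequence.
    """
    n = len(init)
    alphas = [init[j] * emit[0][j] for j in range(n)]
    log_likelihood = 0.0
    for t in range(len(emit)):
        if t > 0:
            # Standard forward update from the (already normalized) previous alphas.
            alphas = [
                sum(alphas[i] * trans[i][j] for i in range(n)) * emit[t][j]
                for j in range(n)
            ]
        # Re-normalize so the values never underflow, and keep the scale in log space.
        scale = sum(alphas)
        alphas = [a / scale for a in alphas]
        log_likelihood += math.log(scale)
    return log_likelihood

# 5000 frames with emission probability 0.1: the unscaled product 0.1**5000
# underflows to 0.0 in double precision, while the scaled version stays finite.
frames = 5000
ll = forward_log_likelihood(
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.2, 0.8]],
    emit=[[0.1, 0.1]] * frames,
)
print(ll, 0.1 ** frames)
```

Working directly in log space with a log-sum-exp update is an equally common alternative to per-frame rescaling.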

Inference failed with long audio

I cannot reproduce the issue. I took a LibriSpeech audio file, replicated it 100 times to create a ~30 minute audio, and ran it through `simple_streaming_asr_example`, which transcribed everything correctly... This...
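
One way to build such a replicated test file is sketched below with Python's standard `wave` module. It assumes the source audio has already been converted to uncompressed WAV (LibriSpeech ships FLAC); `replicate_wav` and the file names are hypothetical:

```python
import wave

def replicate_wav(src_path, dst_path, times):
    """Write dst_path containing the audio of src_path repeated `times` times."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        # Copy channels, sample width and sample rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        for _ in range(times):
            dst.writeframes(frames)

# Example (paths are placeholders):
# replicate_wav("utterance.wav", "utterance_x100.wav", 100)
```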

Inference results are different on different computers

Can you check whether this PR is included in your commit? https://github.com/fairinternal/wav2letter/commit/e4f6d1d236e653257c0377794f251c6810b4b2e6

Decoding now yielding duplicate words

As a temporary fix, you can manually revert the changes in https://github.com/facebookresearch/flashlight/commit/ce02babd2f413643bb4ba7064827f4404ed2758e and rebuild. That should solve the issue.

Which WER can you get when training streaming_convnets using only the 1k hours of LibriSpeech data?

Hi, the train-TER in the post seems a bit high. You might want to try tuning `--momentum` and `--dropout`, and also halving the learning rate after every `n` (say 100) epochs...
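
The halving schedule suggested above can be written down as a simple step decay. This is a generic sketch, not wav2letter's own scheduler; `halved_lr` and its parameters are hypothetical names:

```python
def halved_lr(base_lr, epoch, every=100):
    """Learning rate after halving base_lr once per `every` completed epochs."""
    return base_lr * 0.5 ** (epoch // every)

# Print the rate at a few epochs for a starting rate of 0.4.
for epoch in (0, 99, 100, 250):
    print(epoch, halved_lr(0.4, epoch))
```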

MLS Docker inference examples

Hi, to run inference, please follow the commands here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/mls#decoding using the latest docker from the flashlight repo. We provide pre-trained models only for offline ASR and not for...

MLS Docker inference examples

That's true. `fl_asr_test` is for Viterbi decoding, while `fl_asr_decode` is for beam search decoding with a language model. If you just care about getting the best WER, please use...