
MLS Docker inference examples

Open loretoparisi opened this issue 4 years ago • 8 comments

Question

To provide examples of inference with the MLS pretrained tokens, lexicon, acoustic model, and language model, showing:

  • Which Docker image should be used
  • How to pass the models and lexicon to the run command
  • How to run on both CPU and GPU

Additional Context

Currently, an example command to run wav2letter inference with the latest Docker image is the following:

sudo docker run --rm -v ~:/root/host/ -it --ipc=host --name w2l \
    -a stdin -a stdout -a stderr wav2letter/wav2letter:inference-latest \
    sh -c "cat /root/host/audio/LibriSpeech/dev-clean/777/126732/777-126732-0070.flac.wav | /root/wav2letter/build/inference/inference/examples/simple_streaming_asr_example --input_files_base_path /root/host/model"
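
For GPU, I assume something along these lines would be needed: a CUDA-enabled image plus Docker's --gpus flag, which requires the NVIDIA Container Toolkit on the host. A sketch only; the image tag and decode command below are placeholders, not confirmed details:

sudo docker run --rm --gpus all -v ~:/root/host/ -it --ipc=host --name w2l \
    <cuda-enabled-image> sh -c "<decode command>"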

I have recently built a simpler Docker image to run wav2vec inference here. It would be cool to have a simple pipeline for MLS/wav2letter as well!

loretoparisi avatar Jan 27 '21 07:01 loretoparisi

cc @vineelpratap @xuqiantong

tlikhomanenko avatar Jan 27 '21 19:01 tlikhomanenko

Hi, to run inference please follow the commands here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/mls#decoding, using the latest Docker image from the flashlight repo. We provide pretrained models only for offline ASR, not for streaming ASR (https://github.com/facebookresearch/wav2letter/issues/920), so simple_streaming_asr_example cannot be used.

vineelpratap avatar Jan 27 '21 19:01 vineelpratap

@vineelpratap thanks! So once I enter the Docker container, I run the commands for decoding, but I see two different syntaxes here. For beam search, we have:

/flashlight/build/bin/asr/fl_asr_decode --flagsfile=decode/[lang].cfg

While for Viterbi decoding we have:

fl_asr_test --am=[...]/am.bin --lexicon=[...]/train_lexicon.txt --datadir=[...] \
    --test=test.lst --tokens=[...]/tokens.txt --emission_dir='' --nouselexicon --show

Why?

loretoparisi avatar Jan 27 '21 19:01 loretoparisi

That's true.

fl_asr_test is for Viterbi decoding, while fl_asr_decode is for beam-search decoding with a language model. If you just care about getting the best WER, please use the latter.
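
For context, --flagsfile points to a gflags flag file that bundles the decoder options in one place. A rough sketch of what such a file contains; the paths and tuning values below are placeholders, and the actual tuned flags for each language should be taken from the decode/[lang].cfg files shipped with the MLS recipe:

# decode/[lang].cfg (illustrative values only)
--am=/path/to/am.bin
--tokens=/path/to/tokens.txt
--lexicon=/path/to/lexicon.txt
--lm=/path/to/lm.bin
--lmtype=kenlm
--decodertype=wrd
--datadir=/path/to/lists
--test=test.lst
--beamsize=500
--lmweight=2.0
--wordscore=0.0

The test.lst manifest referenced by --test uses the flashlight list format: one utterance per line, as "<id> <audio path> <duration in ms> <transcript>".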

vineelpratap avatar Jan 27 '21 19:01 vineelpratap

Hello,

I don't know if my following question is the kind of question that is proper to ask on GitHub, but since I have been fighting with this for the last few days, I decided to ask. I am trying to learn how to train a speech recognition system in Spanish using Python, and I found out about wav2letter through the following link, https://ai.facebook.com/blog/a-new-open-data-set-for-multilingual-speech-research/, which led me here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/mls. I downloaded the proper files and tried to follow the usage steps in wav2letter/recipes/mls/README.md:

  • I had no problem with the dataset preparation step.
  • I could not complete the training or decoding steps, even after looking at the flashlight project. When the README.md says there are dependencies on flashlight, I don't know what that means. I tried downloading the flashlight project next to the wav2letter project, but I don't think that is the solution. I noticed that the flashlight project is written in C++. Does that mean I need C++ to run this MLS part of wav2letter locally?

schipoco avatar Feb 02 '21 17:02 schipoco

Hi, yes. Once you build flashlight (https://github.com/facebookresearch/flashlight#building-and-installing), it will build the binaries for decoding. You can then use the commands mentioned in the MLS recipe to run decoding.
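
To answer the C++ question: flashlight is a C++ library, so building it requires a C++ toolchain and CMake, but no C++ programming afterwards; once built, the decoding binaries are invoked from the shell. The usual out-of-source CMake flow looks roughly like this (the exact options depend on your flashlight version and backend, so treat the -D flags as assumptions and follow the build instructions linked above):

git clone https://github.com/facebookresearch/flashlight.git
cd flashlight && mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DFL_BUILD_APP_ASR=ON
make -j$(nproc)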

vineelpratap avatar Feb 09 '21 05:02 vineelpratap

@vineelpratap is it possible to build using the provided Dockerfile here and then use the MLS recipe to run the decoder in the same way?

loretoparisi avatar Feb 09 '21 09:02 loretoparisi

Yes, that is also possible.
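
For example, something along these lines, assuming the image built from flashlight's Dockerfile is tagged flashlight:latest locally and the MLS models plus a decode config live under ~/mls on the host (all of these names are illustrative):

sudo docker run --rm --gpus all -v ~/mls:/data -it flashlight:latest \
    sh -c "/flashlight/build/bin/asr/fl_asr_decode --flagsfile=/data/decode/[lang].cfg"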

vineelpratap avatar Feb 09 '21 10:02 vineelpratap