Simple inference process for wav2vec2
🚀 Feature Request
Provide a simple inference process/pipe for the wav2vec 2.0 model.
Motivation
The current inference script, examples/speech_recognition/infer.py, handles a lot of cases and is therefore extremely complex.
Pitch
@sooftware and I found that issue #2651 dealt with this request, but two years have passed, and that code now raises many errors because some libraries/dependencies have been removed or changed. Has @sooftware or anyone else updated that code, or is there another simple solution for inference?
Alternatives
.
Thanks, and looking forward to your reply.
Check [link].
@sooftware Thanks so much for your prompt reply. Furthermore, do you have a latest recognize.py that fits the current fairseq library/code? For example, wav2letter has been replaced by flashlight, and I get "cannot import name 'base_architecture' from 'fairseq.models.wav2vec.wav2vec2_asr'"....
I dont want to use hf transformers, just want to use fairseq to infer
@elisonlau Please consider checking the official torchaudio backend for wav2vec2-based models. IIRC it supports checkpoints from fairseq well, as you can see in this tutorial: https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html
I suspect it is quite unlikely that the existing pipelines in this repo will see any substantial changes.
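For reference, the tutorial linked above essentially loads a bundled model, obtains frame-level emissions, and greedy-decodes them. The decode step can be sketched without downloading the model; this minimal greedy CTC decoder is my own illustration (the function name, label string, and synthetic emission are not fairseq/torchaudio API):

```python
import torch

def greedy_ctc_decode(emission: torch.Tensor, labels: str, blank: int = 0) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    indices = torch.argmax(emission, dim=-1)      # best label index per frame
    indices = torch.unique_consecutive(indices)   # collapse repeated frames
    return "".join(labels[i] for i in indices.tolist() if i != blank)

# Synthetic emission over the toy vocabulary "-hi" (index 0 is the CTC blank):
# per-frame argmax is [1, 1, 0, 2] -> collapse -> [1, 0, 2] -> drop blank -> "hi"
emission = torch.tensor([
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
])
print(greedy_ctc_decode(emission, "-hi"))  # -> hi
```

If I read the tutorial correctly, the real `emission` comes from running a pipeline model, e.g. `bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H; model = bundle.get_model()`, on a waveform resampled to `bundle.sample_rate`.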
@uralik Thanks for the reference. I tried it, but it didn't work. The error is shown below:

with torch.inference_mode():
    features, _ = model.extract_features(waveform)
TypeError: Wav2VecCtc.forward() takes 1 positional argument but 2 were given

I am not sure whether my checkpoint file is a fine-tuned model, or whether there is some difference between TORCHAUDIO.PIPELINES and the original fairseq....
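If I read that traceback right, it is an API mismatch rather than a checkpoint problem: fairseq's `Wav2VecCtc.forward` accepts keyword arguments only (IIRC something like `source=...`), and `extract_features` on a fairseq model forwards its arguments to `forward`, so the tutorial's positional call fails. A toy stand-in class (my own illustration, not the real fairseq code) reproduces the failure mode:

```python
class Wav2VecCtcLike:
    """Toy stand-in: like fairseq's Wav2VecCtc (IIRC), forward is keyword-only."""

    def forward(self, **kwargs):
        return kwargs

    # fairseq's BaseFairseqModel.extract_features delegates to forward(...)
    def extract_features(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

model = Wav2VecCtcLike()
waveform = [0.0] * 16000  # stand-in for an audio tensor

try:
    model.extract_features(waveform)  # positional call, as in the tutorial
except TypeError as e:
    print(e)  # raises the same "takes 1 positional argument" TypeError

features = model.extract_features(source=waveform)  # keyword call succeeds
```

If I recall torchaudio's API correctly, `torchaudio.models.wav2vec2.utils.import_fairseq_model` can convert a loaded fairseq model into a torchaudio `Wav2Vec2Model`, whose `extract_features(waveform)` matches the tutorial's calling convention; checking whether the checkpoint is a fine-tuned (CTC) model rather than a pretrain-only one is worth doing as well.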