
wav2vec2 inference simple process

Open elisonlau opened this issue 1 year ago • 4 comments

🚀 Feature Request

Provide a simple inference process/pipe for the wav2vec 2.0 model.

Motivation

The current inference script, examples/speech_recognition/infer.py, handles many different cases and is therefore extremely complex.

Pitch

@sooftware and I found that issue #2651 dealt with this request, but two years have passed and that code now fails with many errors, because some of its libraries/dependencies have been removed or changed. Has @sooftware (or anyone else) updated that code, or is there another simple way to run inference?

Alternatives

.

Thanks, and looking forward to your reply.

elisonlau avatar Dec 26 '23 09:12 elisonlau

Check [link].

sooftware avatar Dec 26 '23 09:12 sooftware

@sooftware Thanks so much for your quick reply. One further question: do you have an updated recognize.py that works with the latest fairseq code? For example, wav2letter has been replaced by flashlight, and I get "cannot import name 'base_architecture' from 'fairseq.models.wav2vec.wav2vec2_asr'".

elisonlau avatar Dec 26 '23 13:12 elisonlau

I don't want to use HF Transformers; I just want to run inference with fairseq.

elisonlau avatar Dec 26 '23 14:12 elisonlau

@elisonlau please consider checking the official torchaudio backend for wav2vec2-based models. IIRC it supports fairseq checkpoints well, as you can see in this tutorial: https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

I suspect it's quite unlikely that the existing pipelines in this repo will see any substantial changes.

uralik avatar Dec 26 '23 18:12 uralik

@uralik thanks for the reference. I tried it, but it didn't work. The error is shown below:

     24 with torch.inference_mode():
---> 25     features, _ = model.extract_features(waveform)

TypeError: Wav2VecCtc.forward() takes 1 positional argument but 2 were given

I am not sure whether the problem is that my checkpoint is a fine-tuned model, or whether there is some difference between TORCHAUDIO.PIPELINES models and the original fairseq ones.

elisonlau avatar Dec 27 '23 03:12 elisonlau