Vineel Pratap
Hi, this is not a bug. You can pass multiple audio files in the command.
Hi, can you share the entire log? I just tested the code again and it works fine on my end.
@audiolion We expect a 3-letter language code. See the 'Supported languages' section in the README file for each model. For example, use 'eng' for English.
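A quick way to catch a wrongly shaped code before running inference is a format check like the one below. This is only a sketch: `looks_like_lang_code` is a hypothetical helper, not part of the MMS code, and it checks only the shape of the code (three lowercase letters, like 'eng'), not whether a given model supports that language.

```python
import re

# Three lowercase ASCII letters, e.g. "eng".
_LANG_CODE = re.compile(r"[a-z]{3}")


def looks_like_lang_code(code: str) -> bool:
    """Return True if `code` has the same shape as 'eng' (hypothetical helper)."""
    return _LANG_CODE.fullmatch(code) is not None


for candidate in ["eng", "en", "english"]:
    print(candidate, looks_like_lang_code(candidate))
```

The list of codes a model really accepts is still the 'Supported languages' section of the README.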
@shsagnik `No module named 'editdistance'` means the `editdistance` package is not installed in your environment. You should install the missing module.
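To see which modules are missing before launching a long run, something like this works with the standard library alone. It is a minimal sketch: `missing_modules` is a hypothetical helper, and the list passed to it is just the module from the error above, not a complete list of requirements.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Prints an empty list once the package is installed.
print(missing_modules(["editdistance"]))
```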
Hi, can you change `torch.cat(emissions_arr, dim=1)` to `torch.cat(emissions_arr, dim=-1)` in the `align_and_segment.py` file? I'll send a PR to fix the code soon.
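For context on why the `dim` argument matters: `dim=1` and `dim=-1` pick the same axis only for 2-D tensors. The sketch below uses made-up shapes purely for illustration; they are not the shapes used in `align_and_segment.py`.

```python
import torch

# Hypothetical 3-D chunks; shapes chosen only to show the difference.
a = torch.zeros(1, 4, 3)
b = torch.ones(1, 4, 3)

along_dim1 = torch.cat([a, b], dim=1)   # joins along the second axis
along_last = torch.cat([a, b], dim=-1)  # joins along the last axis

print(tuple(along_dim1.shape))  # (1, 8, 3)
print(tuple(along_last.shape))  # (1, 4, 6)

# For 2-D tensors the second axis is also the last one, so both calls agree.
x = torch.zeros(4, 3)
y = torch.ones(4, 3)
same_for_2d = torch.equal(torch.cat([x, y], dim=1), torch.cat([x, y], dim=-1))
print(same_for_2d)  # True
```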
Hi, I just landed the fix in https://github.com/facebookresearch/fairseq/pull/5133. Please use the updated code.
Hi, MMS uses a Transformer layer with an additional adapter module, which is not used in the original wav2vec 2.0. See https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/wav2vec/wav2vec2.py#L978 and https://github.com/facebookresearch/fairseq/blob/main/examples/speech_recognition/new/infer.py#L108. You would have to make appropriate changes in your...
Hi, I would recommend running the decoding with a language model to get better accuracy.
Looks like HuggingFace has a 50GB limit on models. I'll upload the model to S3 and share the link here soon.
Please see the instructions here on how to download the models and run them - https://huggingface.co/facebook/mms-cclms