sravyapopuri388 comments

Results: 8 comments by sravyapopuri388

Loading Wav2Vec2-2-mBart Checkpoints for S2UT

Hi, thanks for reaching out. Seems like the issue is with missing pretrained models. A quick fix is to update the paths in the checkpoint. You can update the w2v_path...

Loading Wav2Vec2-2-mBart Checkpoints for S2UT

Hi Sanchit, thanks for your patience. Could you try updating the w2v_path in the w2v2_mbart_LND_w_ASR.pt state with the original w2v model args instead? More like ``` import torch from omegaconf import...

Loading Wav2Vec2-2-mBart Checkpoints for S2UT

Hi @sanchit-gandhi, yes, the output feature dimension size looks correct to me. Just overriding the w2v_args argument with the args from the original w2v2 model and setting the load_pretrained_decoder_from path to...
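
The override described in the comments above can be illustrated with a minimal sketch. It is not the code from the original thread, which is truncated here. It assumes the checkpoint loads as a nested mapping with the layout `state["cfg"]["model"]`; that layout, the helper name `override_model_cfg`, and the toy paths are all hypothetical and used only for illustration.

```python
import copy


def override_model_cfg(state, overrides):
    """Return a copy of a checkpoint-like dict with cfg['model'] entries replaced.

    Assumption: the config lives at state['cfg']['model']; a real checkpoint
    may be laid out differently.
    """
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    model_cfg = new_state["cfg"]["model"]
    for key, value in overrides.items():
        model_cfg[key] = value
    return new_state


# Toy stand-in for a loaded checkpoint (hypothetical paths and values).
toy_state = {
    "cfg": {"model": {"w2v_path": "/old/path/w2v.pt", "w2v_args": None}},
    "model": {},
}
patched = override_model_cfg(toy_state, {"w2v_path": "/new/path/w2v.pt"})
```

With a real checkpoint file, the state would typically be read with `torch.load(path, map_location="cpu")` and the patched copy written back with `torch.save(patched, new_path)`; which keys need overriding depends on the checkpoint itself.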

[Wav2Vec2] Wav2Vec2Conformer Fine-Tuned seems to give Gibberish on Librispeech example

Thanks for the ping @patrickvonplaten. I will look into this and get back to you.

[Wav2Vec2] Wav2Vec2Conformer Fine-Tuned seems to give Gibberish on Librispeech example

Hi, I tried decoding the [model](https://dl.fbaipublicfiles.com/fairseq/conformer/wav2vec2/librilight/LL_relpos_PT_960h_FT.pt) using the following command from the wiki and the results are good. Could you please recheck your setup? Thanks! ``` $subset=dev_other python3 examples/speech_recognition/infer.py $DATA_DIR...

[Wav2Vec2] Wav2Vec2Conformer Fine-Tuned seems to give Gibberish on Librispeech example

Hi [@patrickvonplaten](https://github.com/patrickvonplaten), I updated the above command to not use a language model, and it still works correctly. I used the dictionary open sourced in the wav2vec README [here](https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt). To run with a...

Can not make asr

Could you share some pointers to the commands you are running and the input you are using so I can reproduce the error on my end? Thanks!

S2TT wrong results after SPEECH_TO_TEXT finetuning

@woqiang0515 Is it intentional to finetune the model on an English ASR dataset and evaluate it on English-to-Arabic translation? I would suggest finetuning on an English-to-Arabic S2T dataset if...