fairseq
Generating with MBART50 not working
🐛 Bug
Hello, I have downloaded the many-to-many mBART50 model and I want to test it on en-fr with data from WMT. It did not work: I keep getting the same word generated over and over instead of a proper translation. Do you know why? Is the model not pretrained? Maybe I have not understood it correctly.
Here is a file showing what I get:
To Reproduce
What I did:
- First downloaded the WMT en-fr dataset with sacrebleu
- Ran the binarization script: python /path/to/fairseq/examples/multilingual/data_scripts/binarize.py
- Set the environment variables:

export path_2_data=$work_dir/databin
export model=$work_dir/model.pt
export langs="ar_AR,....,sl_SI"
export source_lang="en_XX"
export target_lang="fr_XX"
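For reference, this is roughly how I fetched the test data with sacrebleu before binarizing (the exact test-set year and the output file names here are my own choice, not from the fairseq docs):

```shell
# Fetch the WMT en-fr test set with sacrebleu (assumes sacrebleu is installed;
# wmt14 is an assumed test-set name -- any WMT en-fr set should work the same way).
# --echo src / --echo ref print the raw source / reference sides to stdout.
sacrebleu --test-set wmt14 --language-pair en-fr --echo src > test.en_XX
sacrebleu --test-set wmt14 --language-pair en-fr --echo ref > test.fr_XX
```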
fairseq-generate $path_2_data \
  --path $model \
  --task translation_from_pretrained_bart \
  --gen-subset test \
  -s en_XX -t fr_XX \
  --sacrebleu --remove-bpe 'sentencepiece' \
  --batch-size 32 \
  --encoder-langtok "src" \
  --decoder-langtok \
  --langs $langs
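Side note: I am not sure --task translation_from_pretrained_bart is even the right task for the mBART50 many-to-many checkpoint. If I read the multilingual example README correctly, generation there uses the translation_multi_simple_epoch task instead, along these lines (the lang-dict file name and the flag combination are my reading of the README, so treat this as a sketch, not a confirmed command):

```shell
# Assumed alternative invocation, based on examples/multilingual/README.md:
# the mBART50 checkpoints seem to expect the translation_multi_simple_epoch task,
# with the language list file shipped alongside the model download.
fairseq-generate $path_2_data \
  --path $model \
  --task translation_multi_simple_epoch \
  --gen-subset test \
  -s en_XX -t fr_XX \
  --sacrebleu --remove-bpe 'sentencepiece' \
  --batch-size 32 \
  --encoder-langtok "src" --decoder-langtok \
  --lang-dict "$work_dir/ML50_langs.txt" \
  --lang-pairs "en_XX-fr_XX"
```

If that is the actual cause of the repeated-word output, I would appreciate confirmation.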
Environment
- fairseq Version: 1.0.0
- PyTorch Version: 1.11
- How you installed fairseq (pip, source): from zip + pip install --editable .
- Python version: 3.8
- CUDA/cuDNN version: 11.X
- GPU models and configuration: Tesla V100