
Generating with MBART50 not working

Open MathieuGrosso opened this issue 1 year ago • 0 comments

🐛 Bug

Hello, I have downloaded the many-to-many mBART50 model and want to test it on en-fr with data from WMT. It does not work: the same word keeps being generated instead of a proper translation. Do you know why? Is the model not pretrained? Maybe I have not understood it correctly.

Here is a file showing what I get:

image001

To Reproduce

What I did:

  1. Downloaded the WMT en-fr dataset with sacrebleu.

  2. python /path/to/fairseq/examples/multilingual/data_scripts/binarize.py

export path_2_data=$work_dir/databin
export model=$work_dir/model.pt
export langs="ar_AR,....,sl_SI"
export source_lang="en_XX"
export target_lang="fr_XX"

fairseq-generate $path_2_data \
  --path $model \
  --task translation_from_pretrained_bart \
  --gen-subset test \
  -s en_XX -t fr_XX \
  --sacrebleu --remove-bpe 'sentencepiece' \
  --batch-size 32 \
  --encoder-langtok "src" \
  --decoder-langtok \
  --langs $langs
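To put a number on the symptom (the same word generated over and over), the output of the command above could be scanned with a small script. This is only a sketch: it assumes the usual fairseq-generate output layout, where hypothesis lines look like `H-<id>`, a tab, a score, a tab, then the text. The helper names, the 0.8 threshold, and the sample lines are made up for illustration and are not taken from the actual run.

```python
from collections import Counter
from typing import List, Tuple


def parse_hypotheses(lines: List[str]) -> List[Tuple[int, str]]:
    """Extract (sentence id, hypothesis text) pairs from `H-` lines."""
    hyps = []
    for line in lines:
        if not line.startswith("H-"):
            continue  # skip S-, T-, D-, P- and log lines
        # Assumed layout: H-<id>\t<score>\t<hypothesis>
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue
        hyps.append((int(parts[0][2:]), parts[2]))
    return hyps


def is_degenerate(text: str, threshold: float = 0.8) -> bool:
    """True when one token makes up at least `threshold` of the hypothesis."""
    tokens = text.split()
    if len(tokens) < 2:
        return False
    top_count = Counter(tokens).most_common(1)[0][1]
    return top_count / len(tokens) >= threshold


# Invented sample lines, for illustration only.
sample = [
    "S-0\tHello world .",
    "H-0\t-0.31\tthe the the the the",
    "H-1\t-0.12\tBonjour le monde .",
]
flagged = [i for i, text in parse_hypotheses(sample) if is_degenerate(text)]
print(flagged)  # prints [0]
```

If nearly every sentence id ends up flagged, that points to a setup problem (task, dictionary, or language tokens) rather than to a few hard sentences.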

Environment

  • fairseq version: 1.0.0
  • PyTorch version: 1.11
  • How you installed fairseq (pip, source): from zip + pip install --editable .
  • Python version: 3.8
  • CUDA/cuDNN version: 11.X
  • GPU models and configuration: Tesla V100
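For anyone trying to reproduce this, the package versions listed above can be collected in one go. A minimal sketch, assuming the distributions are installed under the names `fairseq` and `torch`. The helper name `collect_versions` is hypothetical.

```python
import platform
from importlib import metadata  # in the standard library since Python 3.8


def collect_versions(packages=("fairseq", "torch")):
    """Return a mapping of name to installed version string."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            versions[name] = "not installed"
    return versions


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```

The CUDA version that PyTorch was built against is not covered here. With PyTorch installed, it can be read from `torch.version.cuda`.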

MathieuGrosso, Jul 05 '22 15:07