jiant Unable to reproduce XTREME numbers

Unable to Reproduce the XTREME numbers for xlm-roberta-large

We are unable to reproduce the XTREME benchmark numbers reported in the original paper for xlm-roberta-large. Below are examples for PAWSX and XNLI.

To Reproduce

Branch: mainline
Environment: 1 p4.8xlarge
Hyperparameters:

XNLI

--model_type $MODEL_TYPE \
--model_name_or_path $MODEL \
--train_language en \
--task_name xnli \
--do_train \
--do_eval \
--do_predict \
--gradient_accumulation_steps 4 \
--per_gpu_train_batch_size 64 \
--learning_rate 2e-5 \
--num_train_epochs 2 \
--max_seq_length 128 \
--output_dir $SAVE_DIR/ \
--save_steps 500 \
--logging_steps 500 \
--eval_all_checkpoints \
--log_file 'train' \
--predict_languages "ar,bg,de,el,en,es,fr,hi,ru,sw,th,tr,ur,vi,zh" \
--save_only_best_checkpoint \
--overwrite_output_dir
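
For comparison with the paper's setup, it may help to work out the effective batch size these flags imply. The sketch below assumes the training script multiplies the per-GPU batch size by the number of GPUs and the accumulation steps, as Hugging Face-style scripts typically do. The GPU counts are hypothetical, since the number of GPUs actually used is not stated here.

```python
def effective_batch_size(per_gpu_batch_size: int, grad_accum_steps: int, n_gpus: int) -> int:
    """Examples consumed per optimizer update under data-parallel training."""
    return per_gpu_batch_size * grad_accum_steps * n_gpus


# Values taken from the XNLI flags above.
PER_GPU_TRAIN_BATCH_SIZE = 64
GRADIENT_ACCUMULATION_STEPS = 4

# Hypothetical GPU counts; substitute the number of GPUs on the actual instance.
for n_gpus in (1, 4, 8):
    total = effective_batch_size(
        PER_GPU_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, n_gpus
    )
    print(f"{n_gpus} GPU(s): effective batch size = {total}")
```

If the resulting value is much larger than the batch size used in the reference run, the learning rate of 2e-5 may be too small for it.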

PAWSX

{
  "jiant_task_container_config_path": "/home/ec2-user/jiant/xtreme-exp/runconfigs/pawsx.json",
  "output_dir": "/home/ec2-user/jiant/xtreme-exp/runs/pawsx",
  "hf_pretrained_model_name_or_path": "xlm-roberta-large",
  "model_path": "/home/ec2-user/jiant/xtreme-exp/models/xlm-roberta-large/model/model.p",
  "model_config_path": "/home/ec2-user/jiant/xtreme-exp/models/xlm-roberta-large/model/config.json",
  "model_load_mode": "from_transformers",
  "do_train": true,
  "do_val": true,
  "do_save": true,
  "do_save_last": false,
  "do_save_best": false,
  "write_val_preds": false,
  "write_test_preds": true,
  "eval_every_steps": 1000,
  "save_every_steps": 0,
  "save_checkpoint_every_steps": 0,
  "no_improvements_for_n_evals": 5,
  "keep_checkpoint_when_done": false,
  "force_overwrite": true,
  "seed": 1146493838,
  "learning_rate": 3e-05,
  "adam_epsilon": 1e-08,
  "max_grad_norm": 1.0,
  "optimizer_type": "adam",
  "no_cuda": false,
  "fp16": false,
  "fp16_opt_level": "O1",
  "local_rank": -1,
  "server_ip": "",
  "server_port": ""
}

Results

"pawsx": {
"accuracy": {"de": 55.25, 
             "en": 54.65, 
             "es": 54.65, 
             "fr": 54.85, 
             "ja": 55.85, 
             "ko": 55.15,
             "zh": 55.300000000000004}, 
"avg_accuracy": 55.1, 
"avg_metric": 55.1},

This is too low: we were expecting an average accuracy of around 80%.
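
As a sanity check on the reported figures, the sketch below recomputes the average and measures how much the per-language accuracies differ. The helper name `summarize` is only illustrative. A very small spread that also includes the training language (en) would be more consistent with a model-level problem than with a failure of cross-lingual transfer, though this is only a heuristic.

```python
from statistics import mean

# Per-language PAWSX accuracies copied from the results above.
pawsx_accuracy = {
    "de": 55.25,
    "en": 54.65,
    "es": 54.65,
    "fr": 54.85,
    "ja": 55.85,
    "ko": 55.15,
    "zh": 55.300000000000004,
}


def summarize(accuracy_by_language):
    """Return the mean accuracy and the max-min spread across languages."""
    values = list(accuracy_by_language.values())
    return {
        "avg": round(mean(values), 2),
        "spread": round(max(values) - min(values), 2),
    }


summary = summarize(pawsx_accuracy)
print(summary)
```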

Similarly, for XNLI the numbers we are getting are far lower than those reported in the paper.

Is there something we are missing?

Jul 12 '21 06:07 dapurv5

@zphang, mind taking a look?

Jul 19 '21 14:07 sleepinyourhat

Hi,

I believe the issue may be that the XLM-R weights were not being loaded correctly because of a recent update. I've made a PR that should address this (https://github.com/nyu-mll/jiant/pull/1329). Could you retry and let me know if it works?

Jul 26 '21 19:07 zphang
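
One generic way to check whether pretrained weights were actually picked up is to compare the parameter names the model expects with the names present in the loaded checkpoint. The sketch below is not part of the PR or of jiant. The helper `compare_weight_keys` and the example key names are hypothetical. In practice the two lists would come from `model.state_dict().keys()` and from the keys of the loaded checkpoint dictionary.

```python
def compare_weight_keys(expected_keys, loaded_keys):
    """Report parameter names that are missing from, or unexpected in, a checkpoint."""
    expected, loaded = set(expected_keys), set(loaded_keys)
    return {
        "missing": sorted(expected - loaded),      # expected by the model, absent from the checkpoint
        "unexpected": sorted(loaded - expected),   # present in the checkpoint, unknown to the model
    }


# Hypothetical key names, chosen only to illustrate a prefix mismatch.
model_keys = [
    "encoder.embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.query.weight",
]
checkpoint_keys = [
    "roberta.embeddings.word_embeddings.weight",
    "roberta.layer.0.attention.query.weight",
]

report = compare_weight_keys(model_keys, checkpoint_keys)
for kind, names in report.items():
    print(f"{kind}: {len(names)} key(s)")
    for name in names:
        print(f"  {name}")
```

If every expected key shows up as missing, the encoder is most likely still at its random initialization, which would explain near-chance accuracy. PyTorch's `load_state_dict(strict=False)` returns the same kind of missing and unexpected key lists.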
