XLM-R training configuration
Hi, I was trying to train XLM-R base on the assembled training data, but it doesn't converge and gives essentially random output (24% accuracy on eng_Latn), while I get around 53% accuracy with mBERT.
I am using Hugging Face's multiple-choice training implementation (https://github.com/huggingface/transformers/blob/main/examples/pytorch/multiple-choice/run_swag.py) and have tried learning rates of 1e-5, 2e-5, and 5e-5.
Weirdly, if I use only about 1,500 examples, I get better results after 400 steps of training.
Would you mind sharing the training configuration you used for XLM-R? Or let me know if you have any idea what I am missing here.
python run_swag.py \
--model_name_or_path ${MODEL_PATH} \
--do_train \
--do_eval \
--train_file ${train_file} \
--prefix "train_combined" \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--per_device_eval_batch_size=8 \
--per_device_train_batch_size=8 \
--overwrite_output_dir \
--output_dir ${output_dir} \
--max_seq_length 512 \
--cache_dir ${CACHE_DIR} \
--overwrite_cache \
--save_total_limit 5 \
--save_steps 500 \
--eval_steps 500 \
--save_strategy="steps" \
--evaluation_strategy="steps" \
--load_best_model_at_end True
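For context, here is a minimal sketch of how I understand the multiple-choice setup being trained (my own reconstruction, not copied from run_swag.py; xlm-roberta-base stands in for ${MODEL_PATH}, and the passage/candidates are made up):
# Score one passage/question against four candidate answers with an
# XLM-R multiple-choice head, built the same way the SWAG-style example
# pairs the context with each ending.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_name = "xlm-roberta-base"  # what I pass via --model_name_or_path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)

# Hypothetical example; the real data comes from the assembled training file.
context = "Passage text ... Question: Where does the event take place?"
candidates = ["Option A", "Option B", "Option C", "Option D"]

# One (context, candidate) pair per choice, so the model sees 4 sequences per example.
enc = tokenizer(
    [context] * len(candidates),
    candidates,
    truncation=True,
    max_length=512,
    padding=True,
    return_tensors="pt",
)
# The multiple-choice head expects inputs of shape (batch_size, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 4), one score per candidate
print("predicted choice:", logits.argmax(dim=-1).item())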
Thanks
Hi, have you solved this training problem? If so, what training configuration did you end up using? Thanks!
+1