xlnet
Reproduction results for SQuAD 2.0
Hi, thanks for your contribution! I am reproducing the results for SQuAD 2.0 using 8 V100 GPUs (32G).
- I followed the provided hyper-parameters, but my result is about 0.5% lower than the reported one. Am I missing anything? I also tried setting train_steps to 12000 and the learning rate to 2e-5, but that gave little gain.
python run_squad.py \
  --use_tpu=False \
  --num_hosts=1 \
  --num_core_per_host=8 \
  --model_config_path=${INIT_CKPT_DIR}/xlnet_config.json \
  --spiece_model_file=${INIT_CKPT_DIR}/spiece.model \
  --output_dir=${PROC_DATA_DIR} \
  --init_checkpoint=${INIT_CKPT_DIR}/xlnet_model.ckpt \
  --model_dir=${MODEL_DIR} \
  --train_file=${SQUAD_DIR}/train-v2.0.json \
  --predict_file=${SQUAD_DIR}/dev-v2.0.json \
  --uncased=False \
  --max_seq_length=512 \
  --do_train=True \
  --train_batch_size=6 \
  --do_predict=True \
  --predict_batch_size=32 \
  --learning_rate=3e-5 \
  --adam_epsilon=1e-6 \
  --iterations=1000 \
  --save_steps=1000 \
  --train_steps=8000 \
  --warmup_steps=1000 \
  $@
HasAns_exact = 83.6707152497
HasAns_f1 = 89.5736849476
HasAns_total = 5928
NoAns_exact = 85.7695542473
NoAns_f1 = 85.7695542473
NoAns_total = 5945
best_exact = 85.6396866841
best_exact_thresh = -4.23123931885
best_f1 = 88.3920251054
best_f1_thresh = -3.96573591232
exact = 84.7216373284
f1 = 87.6688961821
total = 11873
- The paper indicates that joint training with NewsQA can push the EM from 86.12% to 86.35%. But when I added NewsQA to the SQuAD training set, the result dropped by about 0.5%. Could you give more details on the data augmentation? Were there any changes to the hyper-parameter settings?
Thanks!
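For anyone comparing numbers against this output, here is a small sanity check (a sketch, not part of the XLNet code) showing that the overall `exact` and `f1` values are the count-weighted averages of the HasAns and NoAns scores. The dictionary and function names are made up for illustration; the numbers are copied from the evaluation output above.

```python
# Values copied from the evaluation output in this post.
has_ans = {"exact": 83.6707152497, "f1": 89.5736849476, "total": 5928}
no_ans = {"exact": 85.7695542473, "f1": 85.7695542473, "total": 5945}


def weighted_average(metric):
    """Combine the HasAns and NoAns scores, weighting each by its question count."""
    total = has_ans["total"] + no_ans["total"]
    weighted_sum = (has_ans[metric] * has_ans["total"]
                    + no_ans[metric] * no_ans["total"])
    return weighted_sum / total


overall_exact = weighted_average("exact")
overall_f1 = weighted_average("f1")
print("exact = %.4f, f1 = %.4f" % (overall_exact, overall_f1))
```

Note that `best_exact` and `best_f1` are a different quantity: the evaluation script obtains them by sweeping the no-answer threshold on the dev set, which is why they are higher than `exact` and `f1`. When comparing with a reported number, it helps to check which of the two is being quoted.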
@cooelf did you get any updates on the NewsQA part?
@rakshanda22 Unfortunately not yet :(
Excuse me, I have successfully run sudo ./prepro_squad.sh and moved on to the next step of the SQuAD 2.0 training/testing, but I got an error. Here is the output: ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: 'Tensor("arg0:0", shape=(), dtype=float32, device=/device:CPU:0)'
Have you encountered this problem before? If so, would you mind telling me how to fix it? Thank you very much!
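One commonly reported cause of this particular ValueError in TensorFlow 1.x input pipelines is an empty list of input files: a dataset built from an empty Python list gets dtype float32, and the record reader then fails when it asks for a string filename. That has not been confirmed as the cause in this thread, so treat the following as a hedged diagnostic sketch. The helper name `require_input_files` and the path pattern in the comment are hypothetical and are not part of run_squad.py.

```python
import glob


def require_input_files(pattern):
    """Return the sorted files matching `pattern`, or raise if nothing matches.

    Hypothetical helper: run it on the preprocessed-data path before training
    so an empty match fails with a clear message instead of a dtype error.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(
            "No input files match %r; check that preprocessing wrote its "
            "output to the directory the training script reads from." % pattern)
    return paths


# Example call (the pattern is an assumption, adjust it to your setup):
# require_input_files("/path/to/proc_data/squad/*.tf_record")
```

If this raises, the usual things to check are that the preprocessing output directory matches the --output_dir passed to the training command and that the preprocessing settings (for example max_seq_length and uncased) are the same in both steps.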