DiffCSE
Command to replicate transfer results
CUDA_VISIBLE_DEVICES=6 python train.py \
    --model_name_or_path bert-base-uncased \
    --generator_name distilbert-base-uncased \
    --train_file data/nli_for_simcse.csv \
    --num_train_epochs 2 \
    --per_device_train_batch_size 64 \
    --learning_rate 2e-6 \
    --max_seq_length 32 \
    --evaluation_strategy steps \
    --metric_for_best_model stsb_spearman \
    --load_best_model_at_end \
    --eval_steps 125 \
    --pooler_type cls \
    --overwrite_output_dir \
    --logging_first_step \
    --logging_dir trained \
    --temp 0.05 \
    --do_train \
    --do_eval \
    --batchnorm \
    --lambda_weight 0.05 \
    --fp16 \
    --masking_ratio 0.15 \
    --output_dir trained_orig
I cannot replicate the transfer-task results after training the model with the parameters above. Note that the lambda, learning rate, and batch size values are taken from the appendix of the paper.
Results obtained:
***** Eval results *****
epoch = 2.0
eval_CR = 88.03
eval_MPQA = 88.01
eval_MR = 80.52
eval_MRPC = 73.97
eval_SST2 = 85.32
eval_SUBJ = 93.48
eval_TREC = 77.07
eval_avg_sts = 0.8235647840844718
eval_avg_transfer = 83.77142857142859
eval_sickr_spearman = 0.8026336149884777
eval_stsb_spearman = 0.8444959531804657
Could you please help with the right parameters to train the model from scratch in the supervised setting?
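For reference, the two aggregate values in the log are consistent with the per-task numbers. This is a minimal sketch, assuming eval_avg_transfer is the plain mean of the seven transfer accuracies and eval_avg_sts is the plain mean of the two Spearman correlations. The variable names are mine and are not taken from the repository:

```python
# Per-task numbers copied from the eval log above.
transfer = {
    "CR": 88.03,
    "MPQA": 88.01,
    "MR": 80.52,
    "MRPC": 73.97,
    "SST2": 85.32,
    "SUBJ": 93.48,
    "TREC": 77.07,
}
sts = {
    "sickr_spearman": 0.8026336149884777,
    "stsb_spearman": 0.8444959531804657,
}

# Assumption: both aggregates are unweighted means of the listed values.
avg_transfer = sum(transfer.values()) / len(transfer)
avg_sts = sum(sts.values()) / len(sts)

print(f"avg_transfer = {avg_transfer:.4f}")
print(f"avg_sts      = {avg_sts:.4f}")
```

Both values agree with the reported aggregates, so the gap to the paper seems to come from the individual task scores rather than from how the averages are computed.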