Baseline issue
When reproducing the experiment of pretraining on MAG240M and evaluating on arXiv, we found that the Contrastive baseline achieves performance comparable to Prodigy once the auxiliary loss is applied (`-attr 1000`) and the number of training steps is increased to 50,010 (`-ds_cap 50010`).
The complete training command we used is:

```bash
python run_single_experiment.py --dataset mag240m --root /datasets --original_features True --input_dim 768 --emb_dim 256 -ds_cap 50010 -val_cap 100 -test_cap 100 --epochs 1 -ckpt_step 1000 -layers S2,U,A -lr 5e-4 -way 30 -shot 3 -qry 4 -eval_step 500 -task cls_nm_sb -bs 1 -aug ND0.5,NZ0.5 -aug_test True -attr 1000 --device 0 --prefix MAG_Contrastive
```
The evaluation command is:

```bash
python run_single_experiment.py --dataset arxiv --root /datasets --emb_dim 256 --input_dim 768 -ds_cap 510 -val_cap 510 -test_cap 500 -eval_step 100 -epochs 1 --layers S2,U,A -way 3 -shot 3 -qry 3 -lr 1e-5 -bert roberta-base-nli-stsb-mean-tokens -pretrained state_dict_49000.ckpt --eval_only True --train_cap 10 --device 0
```
The confusing test-accuracy results on the arXiv dataset are:
| way | Contrastive (%) | Prodigy (%) |
|---|---|---|
| 3 | 74.92 | 73.09 |
| 5 | 63.81 | 61.52 |
| 10 | 49.77 | 46.74 |
| 20 | 37.62 | 34.41 |
| 40 | 27.85 | 25.13 |
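For reference, the numbers for the other `-way` settings come from re-running the evaluation with a different `-way` value; a minimal sketch of such a sweep is below (assuming only `-way` changes and every other flag stays as in the evaluation command above):

```bash
# Sweep the number of ways for arXiv evaluation.
# Assumption: only -way is varied; all other flags are kept as in the command above.
for WAY in 3 5 10 20 40; do
  python run_single_experiment.py --dataset arxiv --root /datasets --emb_dim 256 --input_dim 768 \
    -ds_cap 510 -val_cap 510 -test_cap 500 -eval_step 100 -epochs 1 --layers S2,U,A \
    -way "$WAY" -shot 3 -qry 3 -lr 1e-5 -bert roberta-base-nli-stsb-mean-tokens \
    -pretrained state_dict_49000.ckpt --eval_only True --train_cap 10 --device 0
done
```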
The checkpoint of the Contrastive model obtained from pretraining on MAG240M is attached. Could you please clarify whether anything is wrong with our experimental setup? Thank you!
It is quite weird. I encountered a similar problem here.
@q-hwang Hi, could you help clarify the results we got?