adapter-bert
Adapters on large-datasets in GLUE could not get the same results
Hi, I am trying adapters on BERT-base and evaluating on GLUE. On the smaller datasets (MRPC, RTE, CoLA) I see good results, but on the larger GLUE datasets (MNLI, QNLI, SST-2) I am struggling: my results fall well below the BERT-base baseline.
I have a deadline soon and need a fair comparison with your method, so I would very much appreciate your feedback. Do you have any suggestions that could help on the large-scale datasets?
Thanks!
What hyperparameters are you using? Did you follow the sweep in the paper?
"We sweep learning rates in {3 · 10−5, 3 · 10−4, 3 · 10−3}, and number of epochs in {3, 20}"