
BBQ: RoBERTa-Base Reproducibility Help

Open • gsgoncalves opened this issue 2 years ago • 1 comment

Hello,

Congratulations on this great work!

I am reaching out for pointers because I am unable to reproduce the paper's accuracy results with RoBERTa-Base.

I fine-tuned RoBERTa-Base on the RACE dataset using the LRQA codebase, then followed the LRQA instructions to evaluate on BBQ. However, I obtained an average accuracy across categories of 51.64%, nearly 10 percentage points below the 61.4% reported in the paper.
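The reported figure is an average accuracy across categories. A minimal sketch of how such a number can be computed, assuming it is an unweighted (macro) mean of per-category accuracies rather than accuracy pooled over all examples. The record format and the function names are hypothetical and are not taken from the BBQ or LRQA code. When categories differ in size the two averages can diverge, so it may be worth confirming which one the paper's 61.4% refers to.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

# Hypothetical record format for illustration: (category, predicted_label, gold_label).
Record = Tuple[str, int, int]


def accuracy_by_category(records: Sequence[Record]) -> Dict[str, float]:
    """Accuracy computed separately within each category."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, pred, gold in records:
        total[category] += 1
        correct[category] += int(pred == gold)
    return {cat: correct[cat] / total[cat] for cat in total}


def macro_average(per_category: Dict[str, float]) -> float:
    """Unweighted mean over categories: every category counts equally."""
    return sum(per_category.values()) / len(per_category)


def pooled_accuracy(records: Sequence[Record]) -> float:
    """Accuracy over all examples at once: larger categories weigh more."""
    return sum(int(pred == gold) for _, pred, gold in records) / len(records)


if __name__ == "__main__":
    # Toy predictions, not real BBQ outputs.
    records = [
        ("Age", 0, 0), ("Age", 1, 0), ("Age", 2, 2), ("Age", 1, 1),
        ("Nationality", 0, 1), ("Nationality", 2, 2),
    ]
    per_cat = accuracy_by_category(records)
    print(per_cat)                   # {'Age': 0.75, 'Nationality': 0.5}
    print(macro_average(per_cat))    # 0.625
    print(pooled_accuracy(records))  # 4 of 6 correct, about 0.667
```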

I used the same hyperparameters reported in the paper:

  • Total Batch Size: 16 (simulated with a batch size of 4 and 4 gradient accumulation steps)
  • Learning Rate: 1e-5
  • Number of Epochs: 3
  • Max Token Length: 512
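The batch-size setting relies on gradient accumulation: summing the gradients of 4 micro-batches of 4 examples, each scaled by 1/4, should give the same update as one batch of 16, provided each micro-batch loss is a mean. The sketch below checks this on a hypothetical one-parameter least-squares model in plain Python; the model and data are made up for illustration. In typical data-parallel setups the effective batch size also scales with the number of devices. If the LRQA training script is built on the Hugging Face `Trainer` (an assumption, not verified here), these settings would correspond to `per_device_train_batch_size=4`, `gradient_accumulation_steps=4`, `learning_rate=1e-5` and `num_train_epochs=3` in `TrainingArguments`.

```python
import math
from typing import Sequence


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """d/dw of mean((w * x - y) ** 2) over one batch (toy model)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float],
                     micro_batch: int, steps: int) -> float:
    """Gradient accumulation: sum micro-batch gradients, each scaled by 1/steps."""
    total = 0.0
    for s in range(steps):
        lo, hi = s * micro_batch, (s + 1) * micro_batch
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


def effective_batch_size(per_device: int, accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices


if __name__ == "__main__":
    xs = [float(i) for i in range(16)]
    ys = [3.0 * x + 1.0 for x in xs]  # toy data
    w = 0.5
    full = grad_mse(w, xs, ys)                                   # one batch of 16
    accum = accumulated_grad(w, xs, ys, micro_batch=4, steps=4)  # 4 micro-batches of 4
    print(effective_batch_size(4, 4))  # 16
    print(math.isclose(full, accum))   # True
```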

I am using the library versions pinned in the requirements.txt file:

  • transformers==4.5.2
  • tokenizers==0.10.1
  • datasets==1.1.2
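Because reproducibility depends on matching these pins, a small check like the following (Python 3.8+, standard library only) can confirm what is actually installed. The `PINNED` dictionary copies the three versions listed above; the helper names are hypothetical, and any other dependency could be added to the dictionary.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple

# Copied from the versions listed in the issue.
PINNED: Dict[str, str] = {
    "transformers": "4.5.2",
    "tokenizers": "0.10.1",
    "datasets": "1.1.2",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed distribution versions; None if a package is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(pinned: Dict[str, str],
               installed: Dict[str, Optional[str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for every package that differs."""
    return {
        name: (want, installed.get(name))
        for name, want in pinned.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    # An empty dict means the environment matches the pins.
    print(mismatches(PINNED, installed_versions(PINNED)))
```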

Do you have any idea why I cannot reproduce the reported accuracy when following the LRQA instructions? Any pointers would be much appreciated!

Thank you!
Gustavo

gsgoncalves • Jan 24 '23 19:01

Hi, let me take a look at this.

zphang • Feb 16 '23 20:02