TextAttack
Wrong label mapping for released model textattack/bert-base-uncased-MNLI
Describe the bug When I use the released fine-tuned checkpoint textattack/bert-base-uncased-MNLI to evaluate on the MNLI task with the Hugging Face transformers text-classification example script, I only get 7% accuracy, rather than the reported 84%. I investigated the error and found that it is caused by a significant label mapping mismatch. The correct model-output-to-ground-truth mapping should be: 0 (model) -> 2 (GT), 1 (model) -> 0 (GT), 2 (model) -> 1 (GT). With this mapping, I recover the 84% accuracy.
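For illustration, here is a minimal sketch (not part of my evaluation run) of applying the remapping above to the model's raw predictions; the example premise/hypothesis pair and the GLUE label ids (0 = entailment, 1 = neutral, 2 = contradiction) are my own assumptions:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "textattack/bert-base-uncased-MNLI"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Remapping reported above: model output index -> MNLI ground-truth index.
MODEL_TO_GT = {0: 2, 1: 0, 2: 1}

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
raw_pred = logits.argmax(dim=-1).item()
print(f"raw model index: {raw_pred} -> remapped GLUE id: {MODEL_TO_GT[raw_pred]}")
```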
To Reproduce
In the Hugging Face transformers text-classification example scripts, the run_glue.py commands are given in the README.md. Simply replace the original bert-base-cased with textattack/bert-base-uncased-MNLI and remove --do_train, then run the evaluation to see the results.
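Concretely, the command looks roughly like the following (flags taken from the transformers text-classification README; the output directory and eval batch size are placeholders of mine):

```bash
python run_glue.py \
  --model_name_or_path textattack/bert-base-uncased-MNLI \
  --task_name mnli \
  --do_eval \
  --max_seq_length 128 \
  --per_device_eval_batch_size 32 \
  --output_dir /tmp/mnli_eval/
```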
Expected behavior By running the above steps, once the evaluation finishes you will see an evaluation accuracy below 10%.
Screenshots or Traceback This is not a runtime bug, and due to some legal policies I cannot share the raw outputs from our clusters, but it should be fairly easy to replicate.
System Information (please complete the following information):
I run on an A100, so I use torch==1.9.0+cu111 with transformers 4.21, and I removed the check_min_version call from run_glue.py; this change should be unrelated to the bug.
Additional context
To help a bit more, I have fixed the label mapping and uploaded the corrected checkpoint here: https://huggingface.co/chromeNLP/textattack_bert_base_MNLI_fixed.
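As a quick sanity check (my own suggestion, assuming the fixed checkpoint carries a standard id2label mapping in its config), you can inspect it with:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("chromeNLP/textattack_bert_base_MNLI_fixed")
print(config.id2label)  # should now line up with the GLUE/MNLI label order
```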