Swapped labels in CommitmentBank
Describe the bug
The downloaded files for the CB task have swapped labels. This introduces a nasty silent bug: all metrics calculated on this data look correct, but the fine-tuned model is actually predicting nonsense. Relative to the official SuperGLUE data:
contradiction is mapped to entailment
entailment is mapped to neutral
neutral is mapped to contradiction
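Until the download is fixed, one possible workaround is to map the labels back after loading the data. This is a minimal sketch, assuming the swap is the same fixed permutation for every example (the label counts in the reproduction are consistent with that, but do not prove it). `JIANT_TO_OFFICIAL` and `fix_cb_label` are hypothetical names, not part of jiant.

```python
# Hypothetical helper, not part of jiant: undo the label permutation described above.
# Assumes every example is affected by the same fixed swap.
JIANT_TO_OFFICIAL = {
    "entailment": "contradiction",  # official "contradiction" shows up as "entailment"
    "neutral": "entailment",        # official "entailment" shows up as "neutral"
    "contradiction": "neutral",     # official "neutral" shows up as "contradiction"
}


def fix_cb_label(example: dict) -> dict:
    """Return a copy of a CB example with its label mapped back to the official one."""
    fixed = dict(example)
    fixed["label"] = JIANT_TO_OFFICIAL[example["label"]]
    return fixed
```

Returning a copy rather than mutating in place keeps the original rows available for comparison against the official file.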
To Reproduce
I used jiant 2.2.0, installed via pip3 install jiant.
import json
from collections import Counter
import jiant.scripts.download_data.runscript as downloader
# Download Jiant CB
downloader.download_data(["cb"], "tasks")
with open("tasks/data/cb/train.jsonl") as f:
    jiant_freqs = Counter([json.loads(line)["label"] for line in f.readlines()])
print(jiant_freqs.most_common())
>>> [('entailment', 119), ('neutral', 115), ('contradiction', 16)]
# Download official CB from SuperGLUE (the "!" lines are notebook shell commands)
!wget "https://dl.fbaipublicfiles.com/glue/superglue/data/v2/CB.zip"
!unzip CB.zip
with open("CB/train.jsonl") as f:
    official_freqs = Counter([json.loads(line)["label"] for line in f.readlines()])
print(official_freqs.most_common())
>>> [('contradiction', 119), ('entailment', 115), ('neutral', 16)]
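The counts above only show that the label distribution is permuted. To check the mapping example by example, the two files could be joined and the label pairs tallied. This is a sketch, assuming both files carry an `idx` field that identifies the same example in each (the official SuperGLUE file has one; I have not verified this for the jiant download). `read_jsonl` and `label_confusion` are hypothetical helpers.

```python
import json
from collections import Counter


def read_jsonl(path):
    """Load a .jsonl file into a list of dicts, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def label_confusion(official_rows, jiant_rows, key="idx"):
    """Count (official_label, jiant_label) pairs for examples sharing the same key."""
    official_by_key = {row[key]: row["label"] for row in official_rows}
    return Counter(
        (official_by_key[row[key]], row["label"])
        for row in jiant_rows
        if row[key] in official_by_key
    )


# Usage with the files from the reproduction above:
# pairs = label_confusion(read_jsonl("CB/train.jsonl"),
#                         read_jsonl("tasks/data/cb/train.jsonl"))
# print(pairs.most_common())
```

If the swap is a clean permutation, only three distinct pairs should appear, one per official label.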
Thanks! #1347