
Swapped labels in CommitmentBank

Open · davda54 opened this issue 3 years ago • 1 comment

Describe the bug

The downloaded files for the CB task have swapped labels. This introduces a nasty silent bug: all the metrics calculated on this data look correct, but the fine-tuned model is actually predicting nonsense with respect to the official label definitions.

contradiction is mapped to entailment
entailment is mapped to neutral
neutral is mapped to contradiction
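Until the downloaded files are fixed, the swap can be undone on the user's side. The following is only a minimal sketch, assuming the mapping is exactly the three-way rotation listed above and that each example is a dict with a "label" key; the names OFFICIAL_TO_JIANT, JIANT_TO_OFFICIAL and restore_label are hypothetical and not part of jiant.

```python
# Observed mapping from this issue: official label -> label found in the jiant download.
OFFICIAL_TO_JIANT = {
    "contradiction": "entailment",
    "entailment": "neutral",
    "neutral": "contradiction",
}

# Invert the rotation to recover the official label from a jiant label.
JIANT_TO_OFFICIAL = {jiant: official for official, jiant in OFFICIAL_TO_JIANT.items()}


def restore_label(example):
    """Return a copy of a jiant CB example with the official label restored.

    Hypothetical helper: assumes `example` has a "label" key holding one of
    the three class names. Unlabeled examples should not be passed in.
    """
    fixed = dict(example)
    fixed["label"] = JIANT_TO_OFFICIAL[example["label"]]
    return fixed
```

Applied to every line of the downloaded train.jsonl, this should reproduce the official label frequencies, assuming the rotation is the only difference between the two files.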

To Reproduce

I used Jiant 2.2.0, available via pip3 install jiant.

import json
from collections import Counter
import jiant.scripts.download_data.runscript as downloader

# Download Jiant CB
downloader.download_data(["cb"], "tasks")
with open("tasks/data/cb/train.jsonl") as f:
    jiant_freqs = Counter([json.loads(line)["label"] for line in f.readlines()])
print(jiant_freqs.most_common())

>>> [('entailment', 119), ('neutral', 115), ('contradiction', 16)]

# Download official CB from SuperGLUE
!wget "https://dl.fbaipublicfiles.com/glue/superglue/data/v2/CB.zip"
!unzip CB.zip
with open("CB/train.jsonl") as f:
    official_freqs = Counter([json.loads(line)["label"] for line in f.readlines()])
print(official_freqs.most_common())

>>> [('contradiction', 119), ('entailment', 115), ('neutral', 16)]
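The frequency counts only suggest the mapping. To confirm it example by example, the two files can be compared pairwise. This is a rough sketch that assumes both train.jsonl files list the same examples in the same order, which has not been verified here; read_labels and label_confusion are hypothetical helper names.

```python
import json
from collections import Counter


def read_labels(path):
    """Read the "label" field from every non-empty line of a JSONL file."""
    with open(path) as f:
        return [json.loads(line)["label"] for line in f if line.strip()]


def label_confusion(official_labels, jiant_labels):
    """Count (official, jiant) label pairs for position-aligned examples.

    Assumption: index i in both lists refers to the same example.
    """
    if len(official_labels) != len(jiant_labels):
        raise ValueError("label lists differ in length; files are not aligned")
    return Counter(zip(official_labels, jiant_labels))
```

With the paths from the snippet above, this would be called as label_confusion(read_labels("CB/train.jsonl"), read_labels("tasks/data/cb/train.jsonl")). If the swap is a clean rotation, only three distinct pairs should appear in the result.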

davda54 · Feb 03 '22 15:02

Thanks! #1347

zphang · Feb 03 '22 21:02