JGLUE
The "label" column in the JSTS dataset has a string dtype
Hi, thanks for publishing JGLUE.
In the JSTS dataset, the label column is stored as strings rather than numbers: https://github.com/yahoojapan/JGLUE/blob/53e5ecd9dfa7bbe6d84f818d599bfb4393dd639d/datasets/jsts-v1.0/valid-v1.0.json#L1 Why is that?
I think run_glue.py decides whether a task is regression or classification from the dtype of the label column, so a label column with a string dtype is treated as a classification task.
https://github.com/huggingface/transformers/blob/v4.9.2/examples/pytorch/text-classification/run_glue.py
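The heuristic described above can be approximated in plain Python. This is a simplified sketch, not the actual run_glue.py code: in v4.9.2 the script checks whether the Arrow dtype of the label column is `float32` or `float64`, and as far as I can tell it only does so when training on your own files rather than a named GLUE task. The helper `is_regression_task` and the sample records are hypothetical, and the sketch inspects the parsed JSON values instead of the Arrow dtype.

```python
import json

def is_regression_task(jsonl_lines):
    # Hypothetical helper approximating the dtype check: the label column
    # counts as regression only if every value is a JSON float.
    labels = [json.loads(line)["label"] for line in jsonl_lines]
    return all(isinstance(label, float) for label in labels)

# Made-up records: the same score stored as a string and as a number.
string_labels = ['{"sentence1": "a", "sentence2": "b", "label": "3.8"}']
float_labels = ['{"sentence1": "a", "sentence2": "b", "label": 3.8}']

print(is_regression_task(string_labels))  # False -> classification
print(is_regression_task(float_labels))   # True  -> regression
```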
In fact, fine-tuning BERT on JSTS produced a classification model with 26 labels. (I have patched run_glue.py.)
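Once the labels are treated as categories, each distinct label string becomes its own class id, which is how a similarity score ends up as a multi-class target. The sketch below is a toy illustration with made-up labels (not JSTS data), and `build_label_map` is a hypothetical helper; it mirrors the general idea of collecting the unique label values and sorting them before assigning ids.

```python
def build_label_map(labels):
    # Hypothetical sketch: treat every distinct label string as a separate
    # class and assign ids in sorted order for determinism.
    label_list = sorted(set(labels))
    return {label: i for i, label in enumerate(label_list)}

labels = ["3.8", "0.0", "5.0", "3.8", "2.6"]
label_to_id = build_label_map(labels)
print(len(label_to_id))  # 4 classes
print(label_to_id)       # {'0.0': 0, '2.6': 1, '3.8': 2, '5.0': 3}
```

With this encoding the model only ever predicts one of the observed score strings as an unordered category, instead of a continuous similarity value.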
Thank you for reporting this issue. Yes, this is a bug. We will change the JSTS label type from string to float and run several experiments. Please wait for the next release.
Thank you in advance.
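Until a fixed release is available, one possible local workaround is to cast the labels to float before passing the files to run_glue.py. This is a minimal sketch, assuming the JSTS files are JSON Lines (one JSON object per line) with a `label` field holding a numeric string; `cast_labels_to_float`, `convert_file`, and any paths you pass in are hypothetical.

```python
import json

def cast_labels_to_float(lines):
    # Hypothetical workaround: parse each JSON Lines record and cast its
    # "label" field from a string such as "3.8" to a float.
    fixed = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        record["label"] = float(record["label"])
        # ensure_ascii=False keeps Japanese text readable in the output.
        fixed.append(json.dumps(record, ensure_ascii=False))
    return fixed

def convert_file(src_path, dst_path):
    # Rewrite a JSON Lines file with float labels (paths are placeholders).
    with open(src_path, encoding="utf-8") as src:
        fixed = cast_labels_to_float(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(fixed) + "\n")

print(cast_labels_to_float(['{"sentence1": "猫", "label": "3.8"}']))
# ['{"sentence1": "猫", "label": 3.8}']
```

With float labels, the dtype check described earlier in the thread should see a floating-point column and set up a single-output regression head.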