
The "label" column in the JSTS dataset is a string dtype

Open Katsumata420 opened this issue 2 years ago • 1 comments

Hi, thanks for publishing JGLUE.

The label column in the JSTS dataset is stored as a string rather than a number. https://github.com/yahoojapan/JGLUE/blob/53e5ecd9dfa7bbe6d84f818d599bfb4393dd639d/datasets/jsts-v1.0/valid-v1.0.json#L1 Is there a reason for this?

I think that run_glue.py determines if a task is a regression task or not by the dtype of the label column, so if it is a string dtype, it is treated as a classification task. https://github.com/huggingface/transformers/blob/v4.9.2/examples/pytorch/text-classification/run_glue.py
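To illustrate the behavior described above, here is a minimal sketch of that decision in plain Python. It assumes the script treats only float label dtypes as regression and otherwise counts the distinct label values as classes; `plan_task` and the sample labels are hypothetical names and data, not code from run_glue.py.

```python
def plan_task(label_dtype, labels):
    """Decide between regression and classification from the label dtype.

    Assumption: only float dtypes are treated as regression, which is how
    the check in run_glue.py appears to work.
    """
    is_regression = label_dtype in ("float32", "float64")
    if is_regression:
        # Regression uses a single output value.
        return {"is_regression": True, "num_labels": 1}
    # Otherwise every distinct label value becomes its own class.
    label_list = sorted(set(labels))
    return {"is_regression": False, "num_labels": len(label_list)}


# Illustrative similarity scores stored as strings, as in the JSTS file.
string_labels = ["0.0", "2.4", "5.0", "2.4"]

string_plan = plan_task("string", string_labels)
float_plan = plan_task("float32", [float(x) for x in string_labels])

print(string_plan)
print(float_plan)
```

With string labels, each distinct score is counted as a separate class, so a score column that should drive a single regression output ends up defining a multi-class head instead.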

In fact, fine-tuning BERT on JSTS produced a 26-class classification model. (I have patched run_glue.py.)
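A data-side alternative would be to cast the labels to float before training. The sketch below is not necessarily the patch mentioned above; it assumes each line of the JSTS file is a standalone JSON object with a `label` key, and `cast_label_to_float` and the sample record are hypothetical.

```python
import json


def cast_label_to_float(json_line, label_key="label"):
    """Parse one JSON record and rewrite its label as a float."""
    record = json.loads(json_line)
    record[label_key] = float(record[label_key])
    # ensure_ascii=False keeps the Japanese text readable in the output.
    return json.dumps(record, ensure_ascii=False)


# Illustrative record; the field names are assumed, the values are made up.
sample = '{"sentence1": "猫が寝ている。", "sentence2": "猫が眠っている。", "label": "4.0"}'

converted = json.loads(cast_label_to_float(sample))
print(type(converted["label"]).__name__, converted["label"])
```

Applying this to every line of the train and validation files would give a float label column, which a dtype-based check could then recognize as a regression target.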

Katsumata420, Jun 17 '22 20:06

Thank you for reporting this issue. Yes, this is a bug. We are going to fix the JSTS label type (string -> float) and run several experiments. Please wait for the next release.

Thank you in advance.

tomohideshibata, Jun 18 '22 06:06