mteb TwitterSemEval2015 combines train, dev, and test

Open • ahoho opened this issue 2 years ago • 1 comment

It looks like the TwitterSemEval2015 test data combines the train, dev, and test data from the original task. Was this intentional? My assumption is that some of the models would have been trained on that data.

Mar 06 '23 23:03 ahoho

Hey, thanks for bringing up this issue. I checked, and TwitterSemEval2015 does indeed combine the train, dev, and test data from the original task. This was not intentional, but we have decided to leave it as is for now. It's true that some models may have been trained on parts of the data. However, the SemEval test set is relatively small, with just 972 pairs at an average length of 36 characters, so using the full dataset gives a better score estimate, which is in the spirit of our massive benchmark. We'll keep this trade-off for now. Hope that clears things up.

Mar 09 '23 15:03 loicmagne
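
For readers who want to quantify the overlap discussed above, here is a minimal sketch. It assumes the benchmark's test pairs and the original SemEval splits are already loaded as lists of sentence pairs. The helper names (`normalize`, `overlap_report`) and the toy data are hypothetical and not part of mteb.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def normalize(pair: Pair) -> Pair:
    """Lowercase, collapse whitespace, and sort the two sentences
    so that (a, b) and (b, a) count as the same pair."""
    a, b = (" ".join(s.lower().split()) for s in pair)
    return (a, b) if a <= b else (b, a)


def overlap_report(
    benchmark: Iterable[Pair],
    original_splits: Dict[str, Iterable[Pair]],
) -> Dict[str, float]:
    """Return, for each original split, the fraction of unique
    benchmark pairs that also appear in that split."""
    bench = {normalize(p) for p in benchmark}
    report = {}
    for name, pairs in original_splits.items():
        split = {normalize(p) for p in pairs}
        report[name] = len(bench & split) / len(bench) if bench else 0.0
    return report


# Toy data (hypothetical), standing in for the original task's splits.
splits = {
    "train": [("a cat sat", "the cat sat"), ("hello world", "hi world")],
    "dev": [("good morning", "morning all")],
    "test": [("see you soon", "see ya soon")],
}
# A benchmark test set that merges all three splits, as described in this issue.
benchmark_test = [p for pairs in splits.values() for p in pairs]

report = overlap_report(benchmark_test, splits)
print(report)  # {'train': 0.5, 'dev': 0.25, 'test': 0.25}
```

A large train or dev fraction would suggest that models fine-tuned on the original task data have seen part of the evaluation set. Exact matching after light normalization is a deliberately simple choice and will miss near-duplicates.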
