mteb
TwitterSemEval2015 combines train, dev, and test
It looks like the TwitterSemEval2015 test data combines the train, dev, and test data from the original task. Was this intentional? My assumption is that some of the models would have trained on that data.
Hey, thanks for bringing up this issue. I checked, and TwitterSemEval2015 does indeed combine the train, dev, and test data from the original task. This was not intentional, but we have decided to leave it as is for now. It's true that some models may have trained on parts of the data. However, the SemEval test set is relatively small, with just 972 pairs with an average length of 36 characters, so using the full dataset provides a better score estimate, which is in the spirit of our massive benchmark. We'll keep this trade-off for now. Hope that clears things up.
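For anyone who wants to check how much the overlap affects a particular model, here is a minimal sketch of filtering out evaluation pairs that also appear in that model's training data. It is not part of mteb: the function names (`normalize`, `pair_key`, `drop_seen_pairs`) and the toy sentences are hypothetical, and it assumes you have both sets available locally as plain tuples.

```python
# Hypothetical helper, not an mteb API: drop evaluation pairs that a model
# may already have seen during training.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def pair_key(sent1: str, sent2: str) -> frozenset:
    """Order-independent key for a sentence pair."""
    return frozenset((normalize(sent1), normalize(sent2)))


def drop_seen_pairs(eval_pairs, train_pairs):
    """Keep only (sent1, sent2, label) eval triples whose pair is not in train_pairs."""
    seen = {pair_key(a, b) for a, b in train_pairs}
    return [(a, b, label) for a, b, label in eval_pairs if pair_key(a, b) not in seen]


# Toy data, made up for illustration only.
train_pairs = [("The cat sat.", "A cat was sitting.")]
eval_pairs = [
    ("A cat was sitting.", "the  cat sat.", 1),  # same pair, different order and casing
    ("Hello world", "Hi world", 0),
]

kept = drop_seen_pairs(eval_pairs, train_pairs)
print(f"kept {len(kept)} of {len(eval_pairs)} evaluation pairs")
```

Scoring on the filtered pairs alongside the full set would show whether the combined data changes a given model's result, at the cost of a smaller and noisier sample.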