mteb

Making ScalaClassification multilingual

Open dokato opened this issue 10 months ago • 5 comments

Hey @KennethEnevoldsen, it seems like you added ScalaDaClassification, right? I was wondering if there's a reason why it's listed as multilingual (i.e., support for more languages is planned), or is that a glitch?

Happy to offer a quick fix for that if needed.

dokato avatar Apr 26 '24 09:04 dokato

Hello, the file contains more than one task, each using a different version of this dataset for a different language. A better formulation would be to make a single ScalaClassification task that is a MultilingualTask, and to create one dataset repository whose config_names contain the languages, as was done in PR https://github.com/embeddings-benchmark/mteb/pull/575
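A rough sketch of what that could look like, in plain Python rather than the real mteb classes. Everything here is an assumption for illustration: the `MultilingualScalaSketch` class, the `example-org/scala-multilingual` repository name, the config names, the language codes, and the `fake_loader` helper are all hypothetical stand-ins, not the layout used in the linked PR.

```python
from dataclasses import dataclass, field

# Hypothetical mapping: dataset config name -> language codes.
# Config names and codes are assumptions, not the real repository layout.
SCALA_LANGS = {
    "Danish": ["dan-Latn"],
    "Norwegian_b": ["nob-Latn"],
    "Norwegian_n": ["nno-Latn"],
    "Swedish": ["swe-Latn"],
}


@dataclass
class MultilingualScalaSketch:
    """Stand-in for a single multilingual task; not the real mteb class."""

    dataset_path: str = "example-org/scala-multilingual"  # hypothetical repo
    eval_langs: dict = field(default_factory=lambda: dict(SCALA_LANGS))

    def load_data(self, loader):
        """Load one dataset per language config from the same repository."""
        return {name: loader(self.dataset_path, name) for name in self.eval_langs}


def fake_loader(path, config_name):
    """Placeholder for a real dataset loader; returns what it was asked for."""
    return {"path": path, "config": config_name}


task = MultilingualScalaSketch()
data = task.load_data(fake_loader)
```

The point is that one task and one repository replace the per-language classes, and adding a language becomes a new entry in the mapping rather than a new task.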

imenelydiaker avatar Apr 26 '24 11:04 imenelydiaker

Ah, sorry, I missed that because my editor pulls up a single class/function without the entire module. But yeah, you are right that the approach in the PR makes it more explicit. I'll change the title to something more appropriate and leave the issue open for someone to pick up, but I guess it's not a priority atm.

dokato avatar Apr 26 '24 12:04 dokato

@dokato you can open a PR if you feel like it, and also ask to join the Hugging Face organization so that you can create a repository under mteb and upload the data.

imenelydiaker avatar Apr 26 '24 20:04 imenelydiaker

Ok, I’ll give it a go

dokato avatar Apr 26 '24 22:04 dokato

Yeah, no reason why they are monolingual. Would love to see a PR on this.

KennethEnevoldsen avatar Apr 29 '24 08:04 KennethEnevoldsen