Add SWIM-IR
Google released a new crosslingual retrieval dataset: https://huggingface.co/datasets/nthakur/swim-ir-cross-lingual
We could turn a subset of this into a retrieval and reranking benchmark.
If no one picks this up, I can take a look at it over the weekend.
Amazing. Feel free to open a PR :)
That'd be great indeed cc @thakur-nandan
Thanks @Muennighoff. Adding SWIM-IR would be great, but note that it contains training splits only, since it is intended for training. If that is still desirable, we can go ahead and add it to MTEB.
Let me know if you need help @rasdani.
Thanks, Nandan
Oh, does it still make sense to use it for evaluation, or not at all? I'm not sure adding a training dataset makes sense. cc @KennethEnevoldsen
I wouldn't add a dataset intended for training unless we expect it to evaluate an aspect which we are currently not evaluating.