Evaluation of SONAR
Hello hi,
I am quite curious to see how the SONAR model performs as a multilingual embedding model on the XSTS task. I haven't found any results for it on MTEB.
https://ai.meta.com/research/publications/sonar-sentence-level-multimodal-and-language-agnostic-representations/ https://github.com/facebookresearch/SONAR
Hey sorry for the late reply! Feel free to evaluate SONAR - I am happy to add it to the leaderboard. :)
Hi, I tried, but I don't have enough RAM: the evaluation needed more than 64 GB, so I can't run it.
I am curious, which of the tasks needs so much memory? Is it possible to optimize it?
The SONAR text encoder itself occupies only about 3 GB on disk. With a small batch size, and with long texts truncated to 1024 tokens (it doesn't support more anyway), a Google Colab-sized machine should be enough to embed any texts with SONAR.
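To make the batching and truncation idea concrete, here is a minimal sketch. It is not the SONAR API: `embed_batch` is a hypothetical callable standing in for whatever encoder call you use, `toy_embed_batch` is a dummy encoder for illustration only, and truncation by whitespace-separated words is an assumption that only approximates the model's real tokenizer.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def truncate_tokens(text: str, max_tokens: int = 1024) -> str:
    """Keep at most `max_tokens` whitespace-separated words.

    Assumption: whitespace splitting is a stand-in for the model's own
    tokenizer, which would be used in a real setup.
    """
    return " ".join(text.split()[:max_tokens])


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],  # hypothetical encoder call
    batch_size: int = 8,
    max_tokens: int = 1024,
) -> List[Vector]:
    """Embed `texts` a few at a time so only one small batch is encoded at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = [truncate_tokens(t, max_tokens) for t in texts[start:start + batch_size]]
        embeddings.extend(embed_batch(batch))
    return embeddings


def toy_embed_batch(batch: List[str]) -> List[Vector]:
    """Dummy encoder for illustration: one-dimensional 'embedding' = word count."""
    return [[float(len(text.split()))] for text in batch]
```

Calling `embed_in_batches(texts, toy_embed_batch, batch_size=2)` processes the inputs two at a time; swapping in a real encoder only requires a function that maps a list of strings to a list of vectors.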
The clustering and STS tasks in the benchmark. The problem is not VRAM, but RAM.
I did evaluate SONAR encoders (with a few assumptions about language codes).
However, even after adding the output of scripts/mteb_meta.py to the model's readme, I cannot see SONAR on most of the leaderboard's tabs after hitting the refresh button (an exception is the Eng-X bitext mining tab, where SONAR is the new SOTA). And I cannot understand why.
My current suspicion is that scripts/mteb_meta.py misses some of the tasks. For example, I have results for BornholmBitextMining, but it is missing from mteb_metadata.md.
To check this, I attach all the json results: sonar_results.zip. They cover all tasks except MSMARCOv2, which takes too long to embed, and the Polish retrieval tasks mentioned at https://github.com/embeddings-benchmark/mteb/issues/219.
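For anyone who wants to reproduce this kind of check, a rough sketch follows. It assumes (this is not confirmed in the thread) that each result file is named after its task, e.g. `BornholmBitextMining.json`, and that the generated metadata mentions each task by that same name. The helper names are hypothetical and not part of mteb.

```python
import re
from pathlib import Path
from typing import Iterable, List


def task_names_from_results(results_dir: str) -> List[str]:
    """Collect task names from result files, assuming one `<TaskName>.json` per task."""
    return sorted(p.stem for p in Path(results_dir).glob("*.json"))


def find_missing_tasks(task_names: Iterable[str], metadata_text: str) -> List[str]:
    """Return task names that never appear as a whole word in the metadata text."""
    missing = []
    for name in task_names:
        # Whole-word match, so that e.g. "STS1" is not satisfied by "STS12".
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        if re.search(pattern, metadata_text) is None:
            missing.append(name)
    return missing
```

Running `find_missing_tasks(task_names_from_results("results"), Path("mteb_metadata.md").read_text())` would then list the tasks that have results but no metadata entry; the `results` directory name is only an example.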
@Muennighoff could you please help me sort this out?
Congrats!! I've fixed it here: https://github.com/embeddings-benchmark/mteb/pull/223 - Can you approve if it works for you?
Yes, thanks, it works!
Will close this issue as it seems to have become stale, though the Scandinavian embedding benchmark does evaluate the SONAR models.
Edit: if someone wishes to implement SONAR, the new model implementation makes this quite easy.
I think SONAR is also on the leaderboard with its scores now.
Ahh yes good catch - state-of-the-art for bitext. Good to have it there