mteb: Evaluation of SONAR

Hello,

I am curious how the SONAR model performs as a multilingual embedding model on the XSTS task. I haven't found any SONAR results on MTEB.

https://ai.meta.com/research/publications/sonar-sentence-level-multimodal-and-language-agnostic-representations/ https://github.com/facebookresearch/SONAR

Aug 25 '23 09:08 gulldan

Hey, sorry for the late reply! Feel free to evaluate SONAR; I am happy to add it to the leaderboard. :)

Nov 16 '23 16:11 Muennighoff

Hi, I tried, but I don't have enough RAM: the tests needed more than 64 GB, so I can't run them.

Nov 17 '23 06:11 gulldan

I am curious which of the tasks needs so much memory. Is it possible to optimize it?

The SONAR text encoder itself occupies only about 3 GB on disk. With a small batch size and long texts truncated to 1024 tokens (it doesn't support more anyway), a Google Colab-sized machine should be enough to embed any texts with SONAR.
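The two memory-saving measures mentioned above (small batches, truncation to 1024 tokens) can be sketched as follows. This is a minimal, hypothetical illustration, not SONAR's or MTEB's actual code: `encode_batch` stands in for a call to the SONAR text encoder, and whitespace splitting is an assumed stand-in for SONAR's real tokenizer.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical stand-in for the real encoder: takes a batch of texts and
# returns one embedding (a list of floats) per text.
EncodeBatch = Callable[[List[str]], List[List[float]]]

MAX_TOKENS = 1024  # SONAR's maximum input length, per the comment above


def truncate(text: str, max_tokens: int = MAX_TOKENS) -> str:
    """Keep at most max_tokens whitespace-separated tokens.

    Assumption: whitespace splitting approximates the real tokenizer.
    Texts that are already short enough are returned unchanged.
    """
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[:max_tokens])


def batches(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def embed_all(
    texts: Sequence[str],
    encode_batch: EncodeBatch,
    batch_size: int = 8,
    max_tokens: int = MAX_TOKENS,
) -> List[List[float]]:
    """Embed texts one small, truncated batch at a time."""
    embeddings: List[List[float]] = []
    for batch in batches(texts, batch_size):
        # Only batch_size truncated texts are passed to the encoder at once.
        embeddings.extend(encode_batch([truncate(t, max_tokens) for t in batch]))
    return embeddings
```

Batching bounds how much input the encoder processes per call, while truncation caps the length of each input; the list of output embeddings still grows with the number of texts.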

Jan 24 '24 20:01 avidale

The Clustering and STS tasks in the benchmark. The problem is not VRAM but RAM.

Jan 25 '24 16:01 gulldan

I did evaluate SONAR encoders (with a few assumptions about language codes).

However, even after adding the output of scripts/mteb_meta.py to the model's README, I cannot see SONAR on most of the leaderboard's tabs after hitting the refresh button (the exception is the Eng-X bitext mining tab, where SONAR is the new SOTA), and I cannot understand why.

My current suspicion is that scripts/mteb_meta.py misses some of the tasks. For example, I have results for BornholmBitextMining, but it is missing from mteb_metadata.md.

To check this, I attach all the JSON results: sonar_results.zip. They cover all tasks except MSMARCOv2, which takes too long to embed, and the Polish retrieval tasks mentioned at https://github.com/embeddings-benchmark/mteb/issues/219.

@Muennighoff could you please help me sort this out?

Feb 05 '24 14:02 avidale

Congrats!! I've fixed it here: https://github.com/embeddings-benchmark/mteb/pull/223. Can you approve it if it works for you?

Feb 05 '24 16:02 Muennighoff

Yes, thanks, it works!

Feb 06 '24 10:02 avidale

I will close this issue as it seems to have become stale, though the Scandinavian Embedding Benchmark does evaluate the SONAR models.

Edit: if someone wishes to implement SONAR, the new model implementation makes this quite easy.
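The newer mteb model implementation referred to above is not reproduced here. As a rough, hypothetical sketch of what any SONAR wrapper has to handle, the code below assumes only that MTEB can evaluate an object exposing `encode(sentences, **kwargs)`. SONAR's text encoder takes a source-language code such as `eng_Latn`; the mapping from MTEB-style `eng-Latn` codes, the `language` keyword, and the injected `encoder` callable are all assumptions for illustration, not the actual mteb or SONAR API.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical encoder signature: (texts, sonar_lang_code) -> one vector per text.
# A real wrapper would call SONAR's text-to-embedding pipeline here.
LangEncoder = Callable[[List[str], str], List[List[float]]]


def to_sonar_lang(mteb_lang: str) -> str:
    """Map a '<lang>-<Script>' code like 'eng-Latn' to 'eng_Latn'.

    Assumption: SONAR expects NLLB/FLORES-style '<lang>_<Script>' codes.
    """
    lang, _, script = mteb_lang.partition("-")
    if not script:
        raise ValueError(f"expected '<lang>-<Script>', got {mteb_lang!r}")
    return f"{lang}_{script}"


class SonarLikeWrapper:
    """Minimal, hypothetical object exposing encode(sentences, **kwargs)."""

    def __init__(self, encoder: LangEncoder, default_lang: str = "eng-Latn"):
        self.encoder = encoder
        self.default_lang = default_lang

    def encode(
        self,
        sentences: Sequence[str],
        language: Optional[str] = None,  # hypothetical kwarg name
        **kwargs,
    ) -> List[List[float]]:
        # Fall back to the default language when the caller gives none.
        lang = to_sonar_lang(language or self.default_lang)
        return self.encoder(list(sentences), lang)
```

Injecting the encoder as a callable keeps the language-code handling testable without downloading the model; a real wrapper would also need to return whatever array type the mteb version in use expects.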

Sep 09 '24 15:09 KennethEnevoldsen

I think SONAR is also on the leaderboard with its scores now.

Sep 09 '24 17:09 Muennighoff

Ahh yes, good catch: state-of-the-art for bitext mining. Good to have it there.

Sep 09 '24 19:09 KennethEnevoldsen
