mteb

Split Matryoshka model results where applicable

Open raffaeler opened this issue 1 year ago • 4 comments

With regard to the Matryoshka embedding models, I would love to see a separate line for each vector length of the same Matryoshka model. It would also be valuable to tag the Matryoshka models in a separate column listing all the available lengths.

raffaeler avatar Apr 03 '24 16:04 raffaeler

Thanks @raffaeler, can you outline how you imagine the table might look?

KennethEnevoldsen avatar Apr 03 '24 17:04 KennethEnevoldsen

Thanks for the prompt answer @KennethEnevoldsen.

Given that MTEB is a leaderboard, I believe that each length should be on a separate line, as if it were a different model. However, since a single model has several lengths, the Embedding Dimensions column should list all the vector lengths, with the one being measured in bold.

This is just an idea, but I don't believe it is possible to aggregate the results for all the lengths of the same model in a single line; otherwise the other columns would have to contain multiple values, which is confusing.

raffaeler avatar Apr 03 '24 17:04 raffaeler

@KennethEnevoldsen I would also add a column telling whether the model is multimodal or not.

This is not related to Matryoshka, please let me know if you want me to open a separate issue.

raffaeler avatar Apr 09 '24 15:04 raffaeler

We already have embedding size on the benchmark, and people could add the same model twice (e.g. as MyModel (emb_size=512)).

Please open a new issue for the multimodal column. In that issue, also specify why it is important.

Generally, we should probably create more detailed model metadata. While the dashboard can't accommodate everything, it should be easy to compare models on relevant tasks. This is already discussed in #314.

KennethEnevoldsen avatar Apr 09 '24 16:04 KennethEnevoldsen
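
To make the "add the same model twice" suggestion concrete, here is a minimal sketch of how one entry per embedding size could be prepared. It assumes the usual Matryoshka property that a prefix of the full vector is itself a usable embedding, and that renormalizing after truncation is wanted. The helper names `truncate_embedding` and `entry_names` are hypothetical and not part of the MTEB API; only the `MyModel (emb_size=512)` naming pattern comes from the comment above.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-renormalize them.

    Assumption: the model was trained Matryoshka-style, so the prefix
    is meaningful on its own.
    """
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero prefix untouched to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else head


def entry_names(model_name, dims):
    """Build one leaderboard-style name per embedding size (hypothetical helper)."""
    return [f"{model_name} (emb_size={d})" for d in dims]


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0]  # toy vector, not a real model output
    for name, dim in zip(entry_names("MyModel", [3, 2]), [3, 2]):
        print(name, truncate_embedding(full, dim))
```

Each truncated variant would then be evaluated and submitted as its own entry, so the existing embedding size column already distinguishes the lines.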

Closing this based on the response above.

isaac-chung avatar Aug 31 '24 07:08 isaac-chung