About Evaluation Scripts
Hi, I'm having difficulty submitting the results to the leaderboard, possibly due to the bug reported at https://github.com/embeddings-benchmark/mteb/issues/774.
So, I tried using https://github.com/embeddings-benchmark/mteb/blob/main/scripts/merge_cqadupstack.py to merge the 12 CQADupStack results, and then https://github.com/embeddings-benchmark/mtebscripts/blob/main/results_to_csv.py to get the average score for each task. Does this produce exactly the same scores as listed on the leaderboard? The number of datasets matches what is reported in the paper (56 datasets in total).
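For reference, this is roughly what I understand the merge step to boil down to: averaging the numeric test metrics across the 12 sub-forum result files. This is only a minimal sketch; the file naming and the flat `"test"` score layout are assumptions on my part, and the canonical logic is in `merge_cqadupstack.py`:

```python
import glob
import json
from collections import defaultdict

def merge_cqadupstack(results_dir: str, output_path: str) -> None:
    """Average the 12 CQADupStack sub-forum results into one merged result file."""
    files = sorted(glob.glob(f"{results_dir}/CQADupstack*Retrieval.json"))
    assert len(files) == 12, f"expected 12 CQADupStack files, found {len(files)}"

    sums = defaultdict(float)
    for path in files:
        with open(path) as f:
            test_scores = json.load(f)["test"]  # assumed result layout
        for metric, value in test_scores.items():
            if metric == "evaluation_time":
                continue  # skip timing info if present
            if isinstance(value, (int, float)):
                sums[metric] += value

    merged = {"test": {metric: total / len(files) for metric, total in sums.items()}}
    with open(output_path, "w") as f:
        json.dump(merged, f, indent=2)

# hypothetical results folder for my model
merge_cqadupstack("results/my-model", "results/my-model/CQADupstackRetrieval.json")
```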
I think it would be nice to have some code or instructions for computing the final scores locally, if it's just a matter of averaging the scores stored in a results folder. Something like the sketch below would already cover my use case.
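This is only a sketch of what I have in mind: it assumes each result file exposes its main test metric under a `"main_score"` key inside `"test"`, which may not hold for every task type (that's what `results_to_csv.py` presumably handles per task):

```python
import json
from pathlib import Path

def local_average(results_dir: str) -> float:
    """Average the main test score over every task result file in a folder."""
    scores = []
    for path in sorted(Path(results_dir).glob("*.json")):
        with open(path) as f:
            result = json.load(f)
        main = result.get("test", {}).get("main_score")  # assumed key and nesting
        if main is not None:
            scores.append(main)
    if not scores:
        raise ValueError(f"no main scores found in {results_dir}")
    return sum(scores) / len(scores)

print(local_average("results/my-model"))  # hypothetical results folder
```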
That script should work and should correspond to the leaderboard; I've added a simpler script here: https://github.com/embeddings-benchmark/mteb/pull/858 - would be great if you could take a look, and we can merge it if you think it's helpful :)
https://github.com/embeddings-benchmark/mteb/issues/774 does not prevent submitting to the LB; it only breaks the automatic refresh, and we can always restart the space to include your scores.
I assume this issue is resolved - if not, feel free to re-open