
Question about MLDR Evaluation Metrics in ModernBERT Paper

Open · WoutDeRijck opened this issue 10 months ago · 1 comment

Hi, I'm working with the MLDR dataset and trying to reproduce the results from the ModernBERT paper. In Table 3, they report an MLDR-EN score of 44.0 for their model, but I'm getting very different numbers (for MLDR_OOD):

- MRR@10: 0.746
- NDCG@10: 0.781
- Accuracy@1: 0.670
- MAP@10: 0.746

This is after training on MS MARCO and evaluating on the MLDR-EN dev set. I'm using the InformationRetrievalEvaluator from sentence-transformers.
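For context, here's a minimal sketch of the evaluation setup I'm using. The dataset config and field names follow the public Shitao/MLDR release on Hugging Face, and the model path is a placeholder for my MS MARCO-trained checkpoint, so adjust as needed:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder: checkpoint fine-tuned on MS MARCO
model = SentenceTransformer("path/to/msmarco-finetuned-model")

# MLDR-EN dev queries with their annotated positive passages
dev = load_dataset("Shitao/MLDR", "en", split="dev")
# The full MLDR-EN document collection: retrieval should run over the
# whole corpus rather than only the pooled positives, since scoring
# against a small candidate pool makes the task easier and inflates
# the reported metrics.
corpus_ds = load_dataset("Shitao/MLDR", "corpus-en", split="corpus")

queries = {str(r["query_id"]): r["query"] for r in dev}
relevant_docs = {
    str(r["query_id"]): {str(p["docid"]) for p in r["positive_passages"]}
    for r in dev
}
corpus = {str(r["docid"]): r["text"] for r in corpus_ds}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    ndcg_at_k=[10],
    mrr_at_k=[10],
    map_at_k=[10],
    accuracy_at_k=[1],
    name="mldr-en-dev",
)
print(evaluator(model))  # prints the evaluator's metric scores
```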

Could someone clarify:

1. Which metric was used for the 44.0 score in the paper?
2. Is there a specific evaluation setup I should be using for MLDR?

Thanks in advance!

WoutDeRijck · Jan 31 '25

Hey, I'm so sorry this took us ages to actually get to, tons of plates spinning!

The metric we used is NDCG@10. I'm going to double-check our scripts; however, I'm wondering if there might be something wrong with your eval script. As a sanity check, I looked at commonly reported MLDR results, such as the ones from BGE-M3:

[Image: table of MLDR evaluation results reported by BGE-M3]

A score of ~44 with fairly moderate training is in line with what we'd expect, while an NDCG@10 of 0.781 would be nearly state-of-the-art, better than all of the dense, specifically trained embedding models!

bclavie · May 28 '25