
Results on `BRIGHT` not matching the paper

Samoed opened this issue 3 months ago

I ran `ReasonIR/ReasonIR-8B` on the BRIGHT benchmark using the following code:

```python
import torch

import mteb

# Task-specific query prompt, keyed by task name.
prompts_dict = {
    "BrightRetrieval": "Given a Post, retrieve relevant passages that help answer the post",
}

tasks = mteb.get_tasks(tasks=["BrightRetrieval"])
evaluation = mteb.MTEB(tasks=tasks)

# Load ReasonIR-8B in bfloat16 and attach the prompt above.
model = mteb.get_model(
    "ReasonIR/ReasonIR-8B",
    model_kwargs={"torch_dtype": torch.bfloat16},
    prompts_dict=prompts_dict,
)

evaluation.run(
    model,
    save_predictions=True,
    output_folder="results",
    encode_kwargs={"batch_size": 1},
)
```

The results are as follows:

| Model    | Bio.  | Earth. | Econ. | Psy.  | Rob.  | Stack. | Sus.  | Leet. | Pony | AoPS | TheoQ. | TheoT. | Avg.  |
|----------|-------|--------|-------|-------|-------|--------|-------|-------|------|------|--------|--------|-------|
| ReasonIR | 24.31 | 30.83  | 24.27 | 28.95 | 18.40 | 21.68  | 20.57 | 18.14 | 9.49 | 4.84 | 18.21  | 26.42  | 20.51 |

In the paper: [image: BRIGHT results table from the ReasonIR paper]

Originally posted by @whybe-choi in https://github.com/embeddings-benchmark/mteb/issues/3221#issuecomment-3355490399

A possible solution would be to create a separate task per subset.

Samoed, Oct 06 '25 17:10

Hello, @Samoed! I'd like to help resolve this issue. Is it okay if I start working on it? Any advice or guidance would be greatly appreciated.

whybe-choi, Oct 07 '25 05:10

Hi! Yes, that would be great! For this, you should create a separate task for each subset.

Samoed, Oct 07 '25 08:10
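For reference, splitting per subset would mean each new task loads only its own slice of BRIGHT rather than all twelve domains at once. A minimal sketch of the per-subset loading, assuming the `xlangai/BRIGHT` layout on the Hugging Face Hub (an `examples` config with one split per domain); the helper name is hypothetical, not mteb's actual task definition:

```python
# Sketch of per-subset loading, assuming the xlangai/BRIGHT layout
# ("examples" config, one split per domain). For illustration only;
# this is not mteb's actual task definition.
from datasets import load_dataset

SUBSETS = [
    "biology", "earth_science", "economics", "psychology",
    "robotics", "stackoverflow", "sustainable_living",
    "leetcode", "pony", "aops",
    "theoremqa_questions", "theoremqa_theorems",
]

def load_subset_queries(subset: str):
    # Each proposed task (e.g. a BrightBiologyRetrieval) would load
    # only its own subset instead of the whole benchmark.
    return load_dataset("xlangai/BRIGHT", "examples", split=subset)

queries = load_subset_queries("biology")
print(len(queries))
```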

In addition to splitting the task by subset, it would be a good idea to move the long-document variants into a separate file called `BrightLongRetrieval.py`. What do you think of this approach?

whybe-choi, Oct 07 '25 08:10

Yes, I think this is a good approach.

Samoed, Oct 07 '25 09:10
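To make the proposal concrete, here is a hypothetical skeleton of what `BrightLongRetrieval.py` could contain; the class names are illustrative, and the `TaskMetadata` each task would need is elided:

```python
# Hypothetical skeleton for the proposed BrightLongRetrieval.py.
# Only the structure is sketched; the required TaskMetadata for each
# task is omitted.
from mteb.abstasks.AbsTaskRetrieval import AbsTaskRetrieval


class BrightBiologyLongRetrieval(AbsTaskRetrieval):
    # Long-document variant: would retrieve over BRIGHT's
    # "long_documents" corpus rather than the chunked "documents" one.
    ...


class BrightEarthScienceLongRetrieval(AbsTaskRetrieval):
    ...


# ...and so on, one class per subset that has a long-document corpus.
```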