FlagEmbedding
When evaluating the BGE-Code-v1 model using the CoIR dataset, why is the result in the Apps section so poor?
When evaluating the BGE-Code-v1 model on the CoIR benchmark, why is the result on the Apps (AppsRetrieval) task so poor, only around 20? The main configuration is shown below. The results are the same whether I use the official CoIR library or the evalscope library. Is there anything wrong with this setup?
from evalscope.run import run_task

one_stage_task_cfg = {
    "work_dir": "outputs",
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "MTEB",
        "model": [
            {
                "model_name_or_path": "bge-code-v1",
                "pooling_mode": "lasttoken",
                "max_seq_length": 512,
                "prompt": "<instruct>Given a code contest problem description, retrieve relevant code that can help solve the problem.\n<query>",
                "model_kwargs": {"torch_dtype": "auto"},
                "encode_kwargs": {
                    "batch_size": 128,
                },
            }
        ],
        "eval": {
            "tasks": [
                "AppsRetrieval",
            ],
            "verbosity": 2,
            "overwrite_results": True,
            "top_k": 10,
        },
    },
}

run_task(task_cfg=one_stage_task_cfg)
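
For reference, my understanding of what these settings do when encoding a single query is roughly the following (just an illustrative sketch, not the evaluation code; the full model id BAAI/bge-code-v1 and the example query are placeholders I chose myself):

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "BAAI/bge-code-v1"  # assumed full id; the config only says "bge-code-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, torch_dtype="auto").eval()

# Same instruction prefix as the "prompt" field in the config above.
instruction = ("<instruct>Given a code contest problem description, "
               "retrieve relevant code that can help solve the problem.\n<query>")
query = "Read two integers from standard input and print their sum."  # made-up example

# Tokenize with the same 512-token cap as max_seq_length; longer inputs are truncated here.
inputs = tokenizer(instruction + query, return_tensors="pt",
                   truncation=True, max_length=512)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, dim)

# "lasttoken" pooling: use the hidden state of the final non-padded token as the embedding.
last_pos = inputs["attention_mask"].sum(dim=1) - 1
embedding = hidden[torch.arange(hidden.size(0)), last_pos]
embedding = torch.nn.functional.normalize(embedding, dim=-1)
print(embedding.shape)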
We haven't used evalscope for evaluation; our results were obtained with the official code from the CoIR GitHub repository. For details, please refer to: https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_Coder#coir
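
For a quick sanity check of the model loading and instruction format outside any evaluation harness, something like the following can be used (a rough sketch; the example texts are made up, and the keyword arguments follow the usual FlagEmbedding FlagLLMModel pattern, so double-check them against the model card):

from FlagEmbedding import FlagLLMModel

# The instruction text matches the prompt used in the config above.
model = FlagLLMModel(
    "BAAI/bge-code-v1",
    query_instruction_for_retrieval=(
        "Given a code contest problem description, retrieve relevant code "
        "that can help solve the problem."
    ),
    query_instruction_format="<instruct>{}\n<query>{}",
    use_fp16=True,
)

# Toy query/passage pair, only to check that the similarity score looks sensible.
queries = ["Read two integers from input and print their sum."]
passages = ["a, b = map(int, input().split())\nprint(a + b)"]

q_emb = model.encode_queries(queries)
p_emb = model.encode_corpus(passages)
print(q_emb @ p_emb.T)  # higher score = more relevant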