MiniCPM
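Discrepancy between the reproduced GSM8K result for minicpm-2b and the result reported in the paper
Hello, I ran into a reproduction mismatch when evaluating GSM8K with this framework. With the minicpm-2b-sft-bf16 model, the GSM8K score I measured is only 38.13: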
"overall_result": {
    "accuracy": 0.3813495072024261
}
The configuration parameters I used are as follows:
{
    "task_name": "gsm8k_gsm8k_gen",
    "path": "datasets/gsm8k/data/gsm8k.jsonl",
    "description": "",
    "transform": "datasets/gsm8k/transform_gen_v0.py",
    "fewshot": 8,
    "batch_size": 1,
    "generate": {
        "method": "generate",
        "params": "models/model_params/vllm_sample_v1.json",
        "args": {
            "temperature": 0.1,
            "top_p": 0.95,
            "max_tokens": 300,
            "sampling_num": 1
        }
    },
    "model_postprocess": "general_torch",
    "task_postprocess": "gsm8k_post",
    "metric": {
        "accuracy": {
            "evaluation": {
                "type": "exact_match"
            }
        }
    },
    "log_dir": "logs/2024-06-14_11-07-57"
}
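The MiniCPM paper reports a GSM8K score of 53.83, and that evaluation was also done with the UltraEval framework. Why is the gap so large? Could the decoding hyperparameters have been set differently? Thank you for your reply.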
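One way to narrow down a gap like this is to re-score the saved generations outside the framework and check whether answers are being lost in post-processing or cut off by `max_tokens`. The following is only a minimal sketch of GSM8K-style exact-match scoring, not UltraEval's `gsm8k_post` or `exact_match` implementation. The helper names (`extract_final_number`, `gold_answer`, `exact_match_accuracy`) are hypothetical, and the sketch assumes that references follow the GSM8K `#### <answer>` convention and that the model's final answer is the last number in its output.

```python
import re

# Matches integers and decimals, optionally negative, with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number in `text` as a float, or None if there is none.

    Assumption: the final answer is the last number the model wrote.
    """
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def gold_answer(reference):
    """Parse a GSM8K-style reference whose answer follows '####'."""
    return extract_final_number(reference.split("####")[-1])


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose extracted number equals the gold number."""
    correct = 0
    for prediction, reference in zip(predictions, references):
        predicted = extract_final_number(prediction)
        if predicted is not None and predicted == gold_answer(reference):
            correct += 1
    return correct / len(references)


if __name__ == "__main__":
    # Toy data for illustration only; these are not real model outputs.
    predictions = [
        "She sells 9 eggs at $2 each, so 9 * 2 = 18. The answer is 18.",
        "The total is 1,250 dollars",
        "I am not sure",
    ]
    references = ["9 * 2 = 18 #### 18", "#### 1250", "#### 3"]
    print(exact_match_accuracy(predictions, references))
```

If a script like this gives a noticeably different score from the framework on the same generations, the difference is more likely in answer extraction than in decoding. Generations that end without any final number are also worth inspecting, since they may have been truncated at `max_tokens`.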