opencompass [Feature] Improve evaluation scripts for mbpp datasets

Describe the feature

When I evaluated the vicuna-7b-v1.5 model using the mbpp_gen script, the score was 0 and most answers were marked as failed. Perhaps the evaluation script did not properly format the answers. [Screenshots: WeChat screenshot 20240228213314, WeChat image 20240228213255]
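One way to test the formatting hypothesis is to pull the code out of a raw model response yourself and run the MBPP asserts against it. If that passes while the built-in evaluator reports a failure, the answer extraction is the likely culprit. The sketch below is only an illustration under stated assumptions: `extract_code` and `passes_tests` are hypothetical helpers, not OpenCompass APIs, and the `[BEGIN]`/`[DONE]` markers are an assumed prompt convention. Because `exec()` runs model output, use it only inside a sandbox.

```python
import re


def extract_code(response: str) -> str:
    """Pull a Python snippet out of a free-form model response.

    Hypothetical helper, not part of OpenCompass. It prefers the first
    Markdown fenced block and falls back to the text between [BEGIN] and
    [DONE] markers (an assumed prompt convention).
    """
    fence = re.search(r"```(?:python)?\s*\n(.*?)```", response, re.DOTALL)
    if fence:
        return fence.group(1).strip()
    marked = re.search(r"\[BEGIN\](.*?)(?:\[DONE\]|$)", response, re.DOTALL)
    if marked:
        return marked.group(1).strip()
    # No recognizable wrapper: treat the whole response as code.
    return response.strip()


def passes_tests(code: str, tests: list[str]) -> bool:
    """Execute the extracted code, then each MBPP-style assert, in one namespace.

    WARNING: exec() runs untrusted model output; only use this in a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors and failed asserts all count as a fail.
        return False
    return True


if __name__ == "__main__":
    raw = "Sure! Here is the function:\n```python\ndef add(a, b):\n    return a + b\n```"
    print(passes_tests(extract_code(raw), ["assert add(1, 2) == 3"]))
```

A chat model such as vicuna-7b-v1.5 often wraps its answer in prose and Markdown fences, so checking a few failed samples this way shows quickly whether the code itself is wrong or just not being extracted.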

Will you implement it?

[ ] I would like to implement this feature and create a PR!

Feb 28 '24 13:02 yuhui1038

I ran into the same error. Here's the result I got on the MBPP datasets when evaluating gemma-7b-it. By the way, where can I find the prediction results like the ones you pasted? Thank you for your report.
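As far as I know, OpenCompass writes per-sample outputs under the run's work directory, by default something like `outputs/default/<timestamp>/predictions/<model_abbr>/<dataset_abbr>.json`. The snippet below is a minimal sketch that lists those files for the newest run; the default path and the timestamp naming are assumptions, and `latest_prediction_files` is a hypothetical helper, so adjust `work_dir` if you configured a different work directory.

```python
from pathlib import Path


def latest_prediction_files(work_dir: str = "outputs/default") -> list[Path]:
    """List prediction JSON files from the most recent run under work_dir.

    Assumed layout: <work_dir>/<timestamp>/predictions/<model_abbr>/<dataset_abbr>.json,
    where timestamp directory names (e.g. 20240307_184609) sort chronologically.
    """
    root = Path(work_dir)
    if not root.is_dir():
        return []
    runs = sorted(p for p in root.iterdir() if p.is_dir())
    if not runs:
        return []
    pred_dir = runs[-1] / "predictions"
    if not pred_dir.is_dir():
        return []
    # One sub-directory per model, one JSON file per dataset inside it.
    return sorted(pred_dir.glob("*/*.json"))


if __name__ == "__main__":
    for path in latest_prediction_files():
        print(path)
```

Sorting the run directories by name relies on the timestamp format sorting lexicographically; if your runs are named differently, pick the directory explicitly instead.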

Mar 07 '24 10:03 YFCYFC

I found the prediction file, and every prediction is empty. [Screenshot 2024-03-07 at 6:46:09 PM] I don't know what happened here, because I used the default configs for both the model and the dataset. Looking forward to any helpful findings.
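To see how widespread the problem is, it can help to count the empty outputs per prediction file. If every entry is empty, the issue most likely lies in generation rather than in the MBPP evaluator, since there is nothing for the evaluator to score. This is a minimal sketch, assuming each prediction file is a JSON object mapping sample ids to records that hold the raw output under a `"prediction"` key; `count_empty` and `report` are hypothetical helpers.

```python
import json
from pathlib import Path


def count_empty(records: dict) -> tuple[int, int]:
    """Return (empty, total) for a mapping of sample id -> record.

    Assumes each record is a dict holding the raw model output under a
    "prediction" key; missing, None, or whitespace-only outputs count as empty.
    """
    empty = sum(
        1 for rec in records.values()
        if not str(rec.get("prediction") or "").strip()
    )
    return empty, len(records)


def report(pred_file: str) -> str:
    """Load one prediction JSON file and summarise how many outputs are empty."""
    records = json.loads(Path(pred_file).read_text(encoding="utf-8"))
    empty, total = count_empty(records)
    return f"{pred_file}: {empty}/{total} empty predictions"
```

Usage: `print(report("outputs/default/<timestamp>/predictions/<model_abbr>/mbpp.json"))`, with the placeholders filled in from your own run.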

Mar 07 '24 10:03 YFCYFC

Same problem when testing the bbh_gen task: the predictions are empty. Have you fixed it? @YFCYFC

Mar 26 '24 10:03 iFe1er

Sorry, I did not find the exact cause. I fine-tuned my model once again, and the predictions were no longer empty, but they were still not the desired result, so I didn't post an update here. I suggest fine-tuning your model for more than one epoch, which may help.

Mar 26 '24 10:03 YFCYFC
