How to reproduce the results in the paper.
Hello, I downloaded the released model and followed the inference command you provided.
However, the strict accuracy I get does not match the number reported in the paper.
My inference command is:
python src/inference.py JingyaoLi/MoTCoder-15B-v1.0/ apps/test.jsonl ./output/generation.jsonl FORMAT_PROMPT
After evaluation, the accuracy I get on the competition-level split is:
Could you please help me run inference correctly?
Hi, our reported pass@1 is the average/normalized pass@1, not strict accuracy. You can refer to the benchmark paper for the detailed metric definition.
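For concreteness, here is a minimal sketch of the difference between the two conventions as commonly used for APPS-style benchmarks: strict accuracy counts a problem as solved only if the generated solution passes every test case, while the average/normalized variant gives partial credit per test case. The function names and data layout below are illustrative assumptions, not the repo's actual evaluation code:

```python
from typing import List

def strict_accuracy(results: List[List[bool]]) -> float:
    """A problem counts only if its solution passes ALL test cases."""
    return sum(all(tests) for tests in results) / len(results)

def average_accuracy(results: List[List[bool]]) -> float:
    """Partial credit: mean fraction of test cases passed per problem."""
    return sum(sum(t) / len(t) for t in results) / len(results)

# Hypothetical per-test-case outcomes for three problems:
results = [
    [True, True, True],   # fully solved -> counts for both metrics
    [True, False, True],  # partially solved -> only raises the average
    [False, False],       # unsolved
]
print(strict_accuracy(results))   # 1/3 ~= 0.33
print(average_accuracy(results))  # (1.0 + 0.67 + 0.0) / 3 ~= 0.56
```

The same generations can therefore yield very different numbers under the two conventions, which would explain the gap you are seeing.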
Thanks for your reply!
I noticed that the pass@1 and pass@5 numbers for GPT-Neo (Tab. 4 in your paper) are strict accuracies.
I believe it would be better to report all numbers in Tab. 4 using a consistent metric.
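For reference, when multiple samples are drawn per problem, average pass@k is usually computed with the unbiased estimator from Chen et al. (2021): pass@k = E[1 - C(n-c, k) / C(n, k)], where n is the number of samples per problem and c the number that pass all tests. A minimal sketch of that standard estimator (not necessarily the exact script behind the paper's numbers):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that
    at least one of k samples, drawn without replacement from n
    generations of which c are correct, passes all test cases."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical example: 5 samples per problem, 2 pass all tests.
print(pass_at_k(n=5, c=2, k=1))  # 0.4
print(pass_at_k(n=5, c=2, k=5))  # 1.0
```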
Thank you for bringing this to our attention. We have verified that you are correct: due to a reporting error carried over from previous work, the performance metrics of the competing methods in the paper are inaccurate. We will rectify the mistake as soon as possible.