How to reproduce the results in the paper.
Hello, I downloaded the released model and followed the inference command you provided.
However, the strict accuracy I get does not match the number reported in the paper.
My inference command is:
python src/inference.py JingyaoLi/MoTCoder-15B-v1.0/ apps/test.jsonl ./output/generation.jsonl FORMAT_PROMPT
After evaluation, the accuracy I get on the competition-level split is:
Could you please help me run inference correctly?
Hi, our reported pass@1 is the average/normalized pass@1, not strict accuracy. You can refer to the benchmark paper for the detailed metric definition.
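For concreteness, here is a minimal sketch of the difference between the two conventions as commonly used for APPS-style benchmarks: strict accuracy counts a problem as solved only if the generated solution passes every test case, while the average/normalized variant gives partial credit per test case. The function names and data layout below are illustrative assumptions, not the repo's actual evaluation code:

```python
from typing import List

def strict_accuracy(results: List[List[bool]]) -> float:
    """A problem counts only if its solution passes ALL test cases."""
    return sum(all(tests) for tests in results) / len(results)

def average_accuracy(results: List[List[bool]]) -> float:
    """Partial credit: mean fraction of test cases passed per problem."""
    return sum(sum(t) / len(t) for t in results) / len(results)

# Hypothetical per-test-case outcomes for three problems:
results = [
    [True, True, True],   # fully solved -> counts for both metrics
    [True, False, True],  # partially solved -> only raises the average
    [False, False],       # unsolved
]
print(strict_accuracy(results))   # 1/3 ~= 0.33
print(average_accuracy(results))  # (1.0 + 0.67 + 0.0) / 3 ~= 0.56
```

The same generations can therefore yield very different numbers under the two conventions, which would explain the gap you are seeing.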
Thanks for your reply!
I noticed that the pass@1 and pass@5 numbers for GPT-Neo (Tab. 4 in your paper) are strict accuracies.
I believe it would be better to report all numbers in Tab. 4 using a consistent metric.
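For reference, when multiple samples are drawn per problem, average pass@k is usually computed with the unbiased estimator from Chen et al. (2021): pass@k = E[1 - C(n-c, k) / C(n, k)], where n is the number of samples per problem and c the number that pass all tests. A minimal sketch of that standard estimator (not necessarily the exact script behind the paper's numbers):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that
    at least one of k samples, drawn without replacement from n
    generations of which c are correct, passes all test cases."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical example: 5 samples per problem, 2 pass all tests.
print(pass_at_k(n=5, c=2, k=1))  # 0.4
print(pass_at_k(n=5, c=2, k=5))  # 1.0
```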
Thank you for bringing this to our attention. We have verified that you are correct: due to a reporting error carried over from previous work, the performance metrics of the competing methods in the paper are inaccurate. We will rectify the mistake as soon as possible.