HEBO MBPP Dataset Preprocessing for Code-Optimize

Thank you for your promising work on Code-Optimize. We greatly appreciate the effort you've put into it.

However, we’ve encountered some difficulty in reproducing the second step of annotation. Specifically, the MBPP dataset provided in the code contains the following keys: ['prompt', 'test', 'entry_point'].

On the other hand, the MBPP datasets we have found online typically contain keys such as: ['task_id', 'text', 'code', 'test_list', 'test_setup_code'].

It seems there might be a missing or unclear preprocessing step that is causing this discrepancy. Could you kindly clarify this step for us, or point us in the right direction?
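
To make the discrepancy concrete, here is a minimal sketch of the kind of conversion we would guess at. It is only our assumption, not the preprocessing actually used for Code-Optimize: the helper names `guess_entry_point` and `convert_record` are hypothetical, the field mapping (`text` to `prompt`, `test_setup_code` plus the joined `test_list` to `test`) is a guess, and the sample record is made up for illustration rather than taken from MBPP.

```python
import ast


def guess_entry_point(code, test_list):
    """Heuristically pick the function under test (hypothetical helper)."""
    # Collect the names of all top-level functions in the reference solution.
    tree = ast.parse(code)
    names = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
    tests = "\n".join(test_list)
    # Prefer a function that is actually called in the test assertions.
    for name in names:
        if name + "(" in tests:
            return name
    # Fall back to the last defined function, or None if there is none.
    return names[-1] if names else None


def convert_record(record):
    """Map a public-MBPP-style record to {prompt, test, entry_point} (assumed mapping)."""
    setup = record.get("test_setup_code", "")
    # Assumption: 'test' is the setup code followed by the assert statements.
    test_lines = ([setup] if setup else []) + list(record["test_list"])
    return {
        "prompt": record["text"],  # assumption: the task description is the prompt
        "test": "\n".join(test_lines),
        "entry_point": guess_entry_point(record["code"], record["test_list"]),
    }


# Made-up record in the public MBPP schema, for illustration only.
example = {
    "task_id": 0,
    "text": "Write a function to add two numbers.",
    "code": "def add(a, b):\n    return a + b",
    "test_list": ["assert add(1, 2) == 3", "assert add(0, 0) == 0"],
    "test_setup_code": "",
}

converted = convert_record(example)
print(converted)
```

If the expected format differs (for example, if `prompt` should hold a function signature rather than the task description), this mapping would need to change accordingly.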

Looking forward to your response, and thank you once again for your valuable contributions.

Best regards,

Feb 11 '25 14:02 JieWu02

Hello, and thank you for taking an interest in our work! 👍 Sorry about this, I forgot to upload our slightly reformatted MBPP version. It will be uploaded to the 'datasets' folder very soon. I hope this helps.

By the way, you can also use the finished datasets, which are in the 'datasets' folder, and skip straight to the optimization step 🥇 Have a nice day!

Feb 11 '25 17:02 milangritta

I see the updated MBPP data, great! Have a nice day too!

Feb 11 '25 18:02 JieWu02