Inquiry about Paper Details of Magicoder

Alex-HaochenLi opened this issue

I was very excited to read your work on Magicoder. I strongly believe that OSS-Instruct will push the boundaries of instruction tuning for code LLMs.

I have a question about Magicoder. It seems that you do not verify the correctness of the solutions generated from the seed code snippets, and I am curious why a code-validity check is not necessary. Here are some hypotheses I have about this:

1. Most of the generated solutions are already correct on manual inspection, and LLMs are robust to a small amount of incorrect code during fine-tuning.
2. OSS-Instruct creates new data that is closer to a recombination of the seed code snippets, and the LLMs used to generate solutions (GPT-3.5/GPT-4) can handle such combinations easily because they see the correct seed code snippets.

What is your opinion on this? I look forward to your reply, and thanks for your help!

Jul 22 '24 02:07 Alex-HaochenLi