
Performance Results on HumanEval

Open · htcml opened this issue 2 years ago · 1 comment

I am reading your CodeRL paper. It uses the APPS benchmark to compare performance with Codex. Do you have any comparison results on the HumanEval dataset?

htcml · Feb 17 '23

@htcml thanks for reading the paper.

In our case, the HumanEval dataset would not be the best evaluation benchmark. The reason is that HumanEval is framed as a docstring-to-code task: the function signature and its docstring (in a code comment block) are given, and the model only has to complete the function body. It is ideal for zero-shot evaluation of larger LMs such as CodeGen and Codex.
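For illustration, a HumanEval-style prompt looks roughly like the sketch below (an invented task in the same format, not an actual HumanEval problem): the model sees the signature and docstring and generates only the body.

```python
# Hypothetical prompt in the HumanEval format (an invented task, not an
# actual HumanEval problem). The model is given everything up to and
# including the docstring and must generate the function body.

def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in `text`,
    case-insensitively.

    >>> count_vowels("Code")
    2
    """
    # A model completion would go here; a reference body might be:
    return sum(1 for ch in text.lower() if ch in "aeiou")
```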

In our paper, we focus more on generating a program from scratch given a natural language description of the problem.

One workaround is to reformulate HumanEval problems as text-to-code tasks, but the comparison with the current baselines might not be fair.
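As a rough sketch of what that reformulation could look like (a hypothetical prompt format, not something from the paper), the same kind of problem would be posed as a standalone natural-language statement, and the model would have to produce the entire program, similar to the APPS setup:

```python
# Hypothetical text-to-code reformulation of the same task. Instead of a
# signature and docstring, the model would see only a natural-language
# statement such as:
#
#   "Write a program that reads text from standard input and prints the
#    number of vowels (a, e, i, o, u) it contains."
#
# and would have to generate the whole program from scratch.
import sys

def main() -> None:
    text = sys.stdin.read()
    print(sum(1 for ch in text.lower() if ch in "aeiou"))

if __name__ == "__main__":
    main()
```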

henryhungle · Feb 22 '23