ethanc8
Here are the reported MBPP and MBPP+ scores:

They have not reported HumanEval+ scores, but they have reported HumanEvalSynthesize and MultiPL-E scores at https://arxiv.org/pdf/2405.04324#page=10.
They did not specify which version of MBPP+ they're using here, so they might be using v0.1.0.
They reported HumanEval, MBPP, and what looks to be the arithmetic mean of the HumanEval, HumanEval+, MBPP, and MBPP+ scores in their blog post.
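To check that guess, one could recompute the mean from the four individual scores and compare it with the number in the blog post. A minimal sketch, assuming the blog's figure is a plain unweighted average; the function name and the numbers below are hypothetical placeholders, not reported results:

```python
def average_score(scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean of the per-benchmark scores."""
    return sum(scores.values()) / len(scores)


# Placeholder values for illustration only; substitute the reported scores.
example = {
    "HumanEval": 60.0,
    "HumanEval+": 50.0,
    "MBPP": 70.0,
    "MBPP+": 60.0,
}

print(average_score(example))  # (60 + 50 + 70 + 60) / 4 = 60.0
```

If the recomputed value matches the blog's number up to rounding, that would support the arithmetic-mean reading; a mismatch could mean a different MBPP+ version or a different averaging scheme.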