human-eval pass@k on filtered samples

Open henryhungle opened this issue 3 years ago • 0 comments

Hi,

Thank you for the great work!

I have 2 questions about the computation of the pass@k metric after applying filtering on the APPS benchmark.

1. In the code snippet linked below, will the `total` array contain the number of filtered samples per problem, i.e. the samples that passed the example test cases from the problem statement, so that each entry is <= N_original_samples (= 1000)? https://github.com/openai/human-eval/blob/312c5e5532f0e0470bf47f77a6243e02a61da530/human_eval/evaluation.py#L85
2. When the number of filtered samples for a problem is less than k (k = [1, 5]), how do you compute pass@k? For example, when N_filtered_samples = 1 and k = 5, can we treat the execution results as 4 failures plus 1 pass or failure (depending on the final unit test result of that single filtered sample)?
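To make the second question concrete, here is a minimal sketch of the padding I have in mind. It uses the closed form 1 - C(n - c, k) / C(n, k) for pass@k, with n samples of which c are correct. The helper `padded_pass_at_k` is a hypothetical name for illustration, not something from the repository, and treating the missing samples as failures is only my assumption about how this case could be handled.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k as 1 - C(n - c, k) / C(n, k), for n samples with c correct."""
    if n < k:
        # C(n, k) would be zero here, so the closed form is undefined.
        raise ValueError("need at least k samples")
    # math.comb returns 0 when k > n - c, which gives pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def padded_pass_at_k(n_filtered: int, c: int, k: int) -> float:
    """Hypothetical helper: if fewer than k samples survive filtering,
    count the missing ones as failures (an assumption, not confirmed)."""
    n = max(n_filtered, k)  # pad the sample count up to k
    return pass_at_k(n, c, k)


# N_filtered_samples = 1, k = 5: the result depends only on the one sample.
print(padded_pass_at_k(1, 1, 5))  # the single filtered sample passed
print(padded_pass_at_k(1, 0, 5))  # the single filtered sample failed
```

Under this assumption, a problem with one filtered sample scores 1.0 if that sample passes the final unit tests and 0.0 otherwise, for any k.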

Feb 17 '22 07:02 henryhungle