human-eval: Why is pass@k = 1.0 when running "evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl"?

Open Smithol opened this issue 2 years ago • 3 comments

$ evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl
Reading samples...
6it [00:00, 7047.28it/s]
Running test suites...
100%|████████████████████████████████████████| 6/6 [00:00<00:00, 98.99it/s]
Writing results to data/example_samples.jsonl_results.jsonl...
100%|████████████████████████████████████████| 6/6 [00:00<00:00, 21826.39it/s]
{'pass@1': 1.0}

Nov 03 '22 13:11 Smithol

It looks like you left the exec line commented out, @Smithol.
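To illustrate why that matters, here is a minimal sketch of the failure mode. It is not the actual code in execution.py; `check_sample` and the two test programs are hypothetical stand-ins. The assumption is that a sample is marked as passed unless running it raises an exception, so if the exec call never runs, nothing can fail.

```python
def check_sample(program: str, run_exec: bool) -> str:
    """Hypothetical stand-in for a checker: 'passed' unless execution raises."""
    try:
        if run_exec:
            # Runs the completion together with its test assertions.
            exec(program, {})
        # If exec is skipped, control always reaches this line.
        result = "passed"
    except BaseException:
        result = "failed"
    return result


# Two illustrative samples: one wrong completion, one correct completion.
wrong = "def return1():\n    return 0\nassert return1() == 1\n"
right = "def return1():\n    return 1\nassert return1() == 1\n"

print(check_sample(wrong, run_exec=False))  # passed (the test never ran)
print(check_sample(wrong, run_exec=True))   # failed
print(check_sample(right, run_exec=True))   # passed
```

With execution skipped, even the wrong sample is reported as passed, which would explain a pass@1 of exactly 1.0.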

Apr 11 '23 15:04 SeungyounShin

Do you know what is wrong with it?

Oct 28 '23 03:10 laoniandisko

$ evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl
Reading samples...
6it [00:00, 7047.28it/s]
Running test suites...
100%|████████████████████████████████████████| 6/6 [00:00<00:00, 98.99it/s]
Writing results to data/example_samples.jsonl_results.jsonl...
100%|████████████████████████████████████████| 6/6 [00:00<00:00, 21826.39it/s]
{'pass@1': 1.0}

After uncommenting the exec() call at line 58 of execution.py, I get 0.5.
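For reference, the arithmetic behind the two numbers can be sketched with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number that pass. The function below is my own minimal version, not the library's implementation, and the 3-of-6 split is an assumption that is merely consistent with the 0.5 result (the log only shows that 6 samples were evaluated).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, of which c are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# exec commented out: all 6 samples are counted as passing.
print(pass_at_k(n=6, c=6, k=1))  # 1.0

# exec enabled, assuming 3 of the 6 samples actually pass.
print(pass_at_k(n=6, c=3, k=1))  # 0.5
```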

Jan 29 '24 06:01 tusiqi1
