apps
NaN test case average
Hello, I am trying to evaluate my model's generated code using the scripts in eval. For one particular problem, results[index] turns out to be an empty array, so computing the mean in print_results() gives nan. How should I handle this case?
Curious what your results array looks like or at least the relevant portion.
Here are the example results that the code gives us: https://github.com/hendrycks/apps/blob/main/eval/test_one_solution.py#L19
This is my results variable for a portion of the problems. It has an empty array for 4534:
{4530: [[False, False, True]], 4531: [[False, True, False]], 4532: [[False, True, False]], 4533: [[-2]], 4534: [[]], 4535: [[-2]], 4536: [[-2]], 4537: [[-1, -1, -1]]}
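To spot entries like this before averaging, a small check along these lines might help. This is only a sketch: find_empty_results is a hypothetical helper, not part of the eval scripts, and it assumes results maps each problem index to a list of per-solution result lists, as in the snippet above.

```python
def find_empty_results(results):
    """Return the problem indices whose results contain no test outcomes.

    Hypothetical helper, not part of the eval scripts. Assumes `results`
    maps a problem index to a list of per-solution result lists.
    """
    return [
        index
        for index, solutions in results.items()
        # Flag a problem with no solutions at all, or with an empty solution list.
        if len(solutions) == 0 or any(len(sol) == 0 for sol in solutions)
    ]


results = {
    4530: [[False, False, True]],
    4531: [[False, True, False]],
    4532: [[False, True, False]],
    4533: [[-2]],
    4534: [[]],
    4535: [[-2]],
    4536: [[-2]],
    4537: [[-1, -1, -1]],
}

empty = find_empty_results(results)
print(empty)  # [4534]
```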
Since I'm not sure how that was generated, the easiest thing would be to post-process your results and just convert any [[]] to [[-2]].
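A minimal sketch of that post-processing step, assuming results is a dict from problem index to a list of per-solution result lists. The helper name fill_empty_results and the constant COMPILE_ERROR are made up for illustration and are not part of the eval scripts. Note that substituting -2 means the empty entry is counted the same way as the other -2 results.

```python
# Placeholder value suggested in this thread for empty result lists.
COMPILE_ERROR = -2


def fill_empty_results(results, placeholder=COMPILE_ERROR):
    """Return a copy of `results` with empty result lists replaced.

    Hypothetical helper: every empty per-solution list becomes
    [placeholder], so averaging it later no longer produces nan.
    """
    fixed = {}
    for index, solutions in results.items():
        if len(solutions) == 0:
            # No solutions recorded at all for this problem.
            fixed[index] = [[placeholder]]
        else:
            fixed[index] = [
                list(sol) if len(sol) > 0 else [placeholder]
                for sol in solutions
            ]
    return fixed


results = {4533: [[-2]], 4534: [[]], 4537: [[-1, -1, -1]]}
fixed = fill_empty_results(results)
print(fixed[4534])  # [[-2]]
```

The helper builds a new dict rather than editing in place, so the original results stay available for debugging how the empty entry was produced.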
Any updates on this issue? Otherwise I'll close it soon.
I just did what you suggested, no more updates.