Steven Basart
The evaluations didn't need any GPUs and could run on CPUs, so we optimized by running them in parallel across many CPUs on many nodes with our data. That...
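A minimal sketch of what CPU-only parallel evaluation could look like, not the code that was actually used. Each test case runs in its own child process, and each node takes a slice of the cases. The names `shard`, `run_case`, and `evaluate`, the `(source, stdin_text, expected)` case layout, and the 10-second timeout are all assumptions made for illustration.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def shard(items, node_rank, num_nodes):
    """Give each node every num_nodes-th item, starting at its own rank."""
    return items[node_rank::num_nodes]


def run_case(case):
    """Run one candidate program in its own process and compare its stdout."""
    source, stdin_text, expected = case  # hypothetical case layout
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=10,  # assumed per-case time limit
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0 and proc.stdout.strip() == expected.strip()


def evaluate(cases, node_rank=0, num_nodes=1, workers=None):
    """Evaluate this node's share of the cases, one child process per case."""
    mine = shard(cases, node_rank, num_nodes)
    # The threads only wait on child processes, so the actual work
    # still spreads across the available CPU cores.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(run_case, mine))
```

On a cluster, each node would call `evaluate` with its own `node_rank` and the shared `num_nodes`, and the per-node result lists would be merged afterwards.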
I'll see if I can look at the code it generated to figure out what's causing that. Honestly, I think it's probably best to do something more sophisticated, though, and...
@loubnabnl Do you know how I can run the code from https://huggingface.co/spaces/codeparrot/apps_metric/tree/main ? I'm not sure how to download it. I'm also not sure if the code from there has diverged at...
Just to be clear, are we interested in debugging the OOM issue or not? The following, which just evaluates the second example on the second test problem, will give me...
Thanks for the link! I'll rerun the test with a much smaller memory limit. Hopefully that'll fix the issue.
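One way to experiment with a smaller memory limit is to cap the address space of the child process that runs the candidate code. This is a rough sketch under the assumption of a Linux host, not the limit mechanism used in the repository. `run_with_memory_limit` is a hypothetical helper name.

```python
import resource
import subprocess
import sys


def run_with_memory_limit(source, limit_bytes, timeout=30):
    """Run Python source in a child whose address space is capped at limit_bytes."""

    def cap_memory():
        # Runs in the child just before exec; lowers only the soft limit
        # and leaves the existing hard limit untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

    return subprocess.run(
        [sys.executable, "-c", source],
        preexec_fn=cap_memory,  # POSIX only
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A child that tries to allocate past the cap should fail inside its own process with a nonzero return code instead of taking the whole evaluation down with an OOM kill. The call fails if `limit_bytes` exceeds an already configured hard limit.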
I don't know if you've been able to run it or fix it. Mine gets OOM killed on example 20 when running locally. I imagine the HF servers have more...
If you could open a PR after your testing, we'd be happy to accept it. :)
I'm curious what your results array looks like, or at least the relevant portion. Here are the example results that the code gives us: https://github.com/hendrycks/apps/blob/main/eval/test_one_solution.py#L19
Since I'm not sure how that was generated, the easiest thing would be to post-process your results and just convert any `[[]]` to `[[-2]]`.
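That post-processing step could be sketched roughly as follows, assuming the results are a plain list with one entry per problem. `patch_empty_results` is a hypothetical helper name, and the real results structure may differ.

```python
def patch_empty_results(results, placeholder=-2):
    """Return a copy of results with each `[[]]` entry replaced by `[[placeholder]]`."""
    # Assumes one entry per problem; entries other than `[[]]` pass through unchanged.
    return [[[placeholder]] if entry == [[]] else entry for entry in results]
```

For example, `patch_empty_results([[[True, False]], [[]]])` leaves the first entry as it is and turns the second into `[[-2]]`, without modifying the input list.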
Any updates on this issue? Otherwise I'll close it soon.