Steven Basart comments

Results: 29 comments of Steven Basart

evaluation on multiple solutions at once causes memory leak

The evaluations didn't need any GPUs and could run on CPUs, so we set them up to run in parallel across many CPUs on many nodes with our data. That...

evaluation on multiple solutions at once causes memory leak

I'll see if I can look at the code it generated to figure out what's causing that. Honestly I think it's probably best to do something more sophisticated though and...

evaluation on multiple solutions at once causes memory leak

@loubnabnl Do you know how I can run the code from https://huggingface.co/spaces/codeparrot/apps_metric/tree/main ? Not sure how to download it. I'm not sure if the code from there has diverged at...

evaluation on multiple solutions at once causes memory leak

Just to be clear, are we interested in debugging the OOM issue or not? The following, which just evaluates the second example on the second test problem, will give me...

evaluation on multiple solutions at once causes memory leak

Thanks for the link! I'll rerun the test again with a much smaller memory limit. Hopefully that'll fix the issue.
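
For anyone wanting to try the same thing, here is a minimal sketch of running a command under a capped address space. It is not the code from the linked harness: `run_with_memory_limit` is a hypothetical helper, and it assumes Linux, where `RLIMIT_AS` is enforced.

```python
import resource
import subprocess
import sys


def run_with_memory_limit(cmd, max_bytes, timeout=60):
    """Run cmd in a child process whose address space is capped at max_bytes.

    Hypothetical helper for illustration; assumes Linux (RLIMIT_AS enforced).
    """

    def _limit():
        # Runs in the child just before exec, so the parent stays unlimited.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    return subprocess.run(
        cmd, preexec_fn=_limit, capture_output=True, text=True, timeout=timeout
    )


if __name__ == "__main__":
    # A child that tries to allocate 2 GiB under a 1 GiB cap should fail
    # with a MemoryError instead of getting the whole machine OOM killed.
    hog = [sys.executable, "-c", "x = bytearray(2 * 1024**3)"]
    print(run_with_memory_limit(hog, 1024**3).returncode)
```

With a cap like this, a runaway solution fails inside its own process rather than taking down the whole evaluation run.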

evaluation on multiple solutions at once causes memory leak

I don't know if you've been able to run it or fix it. Mine gets OOM killed on example 20 when running locally. I imagine the HF servers have more...

evaluation on multiple solutions at once causes memory leak

If you could open a PR after your testing, we'd be happy to accept it. :)

Nan test case average

Curious what your results array looks like, or at least the relevant portion. Here are the example results that the code gives us: https://github.com/hendrycks/apps/blob/main/eval/test_one_solution.py#L19

Nan test case average

Since I'm not sure how that was generated, the easiest thing would be to post-process your results and just convert any of the `[[]]` to `[[-2]]`.
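
A rough sketch of that post-processing step, assuming the results are a dict mapping each problem index to a list of per-solution result lists (the shape of the linked example results). The name `fill_empty_results` is made up for illustration.

```python
def fill_empty_results(results, placeholder=-2):
    """Return a copy of results where every empty per-solution list becomes [placeholder].

    Assumes results looks like {problem_index: [[...], [...]]}; the helper
    name and the default placeholder are illustrative, not part of the repo.
    """
    fixed = {}
    for index, per_solution in results.items():
        fixed[index] = [
            list(outcome) if len(outcome) > 0 else [placeholder]
            for outcome in per_solution
        ]
    return fixed


if __name__ == "__main__":
    raw = {0: [[]], 1: [[True, False]], 2: [[], [-1]]}
    print(fill_empty_results(raw))
```

The input dict is left untouched, so you can compare the raw and patched results before recomputing the averages.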

Nan test case average

Any updates on this issue? Otherwise I'll close it soon.