Code Evals
Describe the feature or improvement you're requesting
Does anyone have a solid method for evaluating models on code benchmarks like APPS? Generated code comes back as a plain string, which can be very noisy and needs deliberate preprocessing before it can be executed and tested. I don't see any class inheriting from Evals that can run code tests.
Any pointers? :thinking:
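To make the request concrete, here is a rough sketch of the kind of preprocessing and execution I have in mind. It assumes stdin/stdout-style test cases and that the model may wrap its answer in a markdown fence. The helper names (`extract_code`, `run_case`, `passes`) are hypothetical and not part of this repo, and a plain subprocess is not a sandbox, so untrusted code would need real isolation.

```python
import re
import subprocess
import sys

# Build the fence marker programmatically so this snippet can itself
# live inside a markdown code block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str) -> str:
    """Return the first fenced block if there is one, else the whole completion."""
    match = FENCE_RE.search(completion)
    return (match.group(1) if match else completion).strip()


def run_case(code: str, stdin_text: str, timeout: float = 5.0):
    """Run code in a fresh interpreter; return stdout, or None on error or timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


def passes(completion: str, cases, timeout: float = 5.0) -> bool:
    """True if the extracted code reproduces every expected output.

    `cases` is a list of (stdin_text, expected_stdout) pairs. Outputs are
    compared token by token, so whitespace differences are ignored.
    """
    code = extract_code(completion)
    for stdin_text, expected in cases:
        out = run_case(code, stdin_text, timeout)
        if out is None or out.split() != expected.split():
            return False
    return True
```

An eval class could call something like `passes(sample, cases)` per sample and report the pass rate as its metric. Call-based problems (where a function is invoked instead of reading stdin) would need a different harness.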
Additional context
No response