lm-evaluation-harness
Constrained output support
I'd like to build a benchmark where I can define rules that constrain the responses the LLM produces, and then evaluate those responses with pre-determined code that takes the constraint rules into account.
I'm thinking of integrations with libraries like outlines or guidance.
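
For concreteness, here's a minimal sketch of the kind of workflow I have in mind, using outlines' choice-constrained generation (API as shown in the outlines 0.x README; signatures may differ across versions, and the model name, prompt, and scoring rule are placeholders, not an existing harness feature):

```python
# Sketch only: constrain the model's answer to a fixed choice set with
# outlines, then score it with pre-determined evaluation code that relies
# on the constrained output space.
import outlines

# Placeholder model; any transformers-compatible checkpoint would do.
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

# Constraint rule: the model may only answer "yes" or "no".
generator = outlines.generate.choice(model, ["yes", "no"])

prompt = "Is the following statement true? The sky is green.\nAnswer:"
answer = generator(prompt)

# Pre-determined evaluation code, written against the constrained outputs.
gold = "no"
score = 1.0 if answer.strip().lower() == gold else 0.0
print(f"answer={answer!r} score={score}")
```

The same idea would apply to regex- or JSON-schema-constrained outputs; the key point is that the eval code can assume the response already satisfies the rules.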