Attach expected SQL queries to evals
What is it?
Add an expected SQL query to each eval. This will be crucial for building tools that actually evaluate the model and keep track of its accuracy.
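To make the idea concrete, here is a minimal sketch of an eval that carries its expected SQL and is scored by comparing query results rather than query text. Everything in it is an assumption for illustration: the `EvalCase` fields, the `is_correct` helper, the `projects` table, and the use of an in-memory SQLite database are hypothetical and do not reflect the actual eval schema or warehouse.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical eval record: a question plus the SQL we expect for it."""
    question: str
    expected_sql: str


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory database (made-up data) to run queries against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (name TEXT, stars INTEGER)")
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?)",
        [("oso", 500), ("small-lib", 12), ("big-lib", 2500)],
    )
    return conn


def result_rows(conn: sqlite3.Connection, sql: str) -> Counter:
    """Run a query and return its rows as a multiset, so row order is ignored."""
    return Counter(conn.execute(sql).fetchall())


def is_correct(conn: sqlite3.Connection, case: EvalCase, generated_sql: str) -> bool:
    """A generated query passes if it returns the same rows as the expected one."""
    expected = result_rows(conn, case.expected_sql)
    try:
        actual = result_rows(conn, generated_sql)
    except sqlite3.Error:
        # SQL that fails to execute counts as a miss.
        return False
    return actual == expected


if __name__ == "__main__":
    conn = demo_connection()
    case = EvalCase(
        question="Which projects have more than 100 stars?",
        expected_sql="SELECT name FROM projects WHERE stars > 100",
    )
    # Different text, same result set.
    print(is_correct(conn, case, "SELECT p.name FROM projects AS p WHERE p.stars >= 101"))
```

Comparing result sets instead of SQL strings means that two differently written but equivalent queries both pass. The trade-off is that the check is only as strong as the data it runs against.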
I'm also thinking of creating a sandbox testing environment so that we can easily hook up any model to our evals and view the results. For example, I could run Gemini's new DS agent through it to evaluate it: https://github.com/opensource-observer/oso/issues/3673
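A rough sketch of what "hook up any model" could look like: the harness treats a model as any callable from a question to a SQL string, so swapping in a different agent means passing a different function. The names here (`Model`, `run_evals`, `accuracy`, the `metrics` table) are hypothetical, the models are stubs, and this is a standalone illustration rather than an integration with any existing eval framework.

```python
import sqlite3
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is anything that maps a question to a SQL string.
Model = Callable[[str], str]


def rows(conn: sqlite3.Connection, sql: str) -> Counter:
    """Return query results as a multiset so that row order does not matter."""
    return Counter(conn.execute(sql).fetchall())


def run_evals(
    model: Model,
    cases: List[Tuple[str, str]],
    conn: sqlite3.Connection,
) -> List[Dict[str, object]]:
    """Run each (question, expected_sql) case through the model and score it."""
    results = []
    for question, expected_sql in cases:
        expected = rows(conn, expected_sql)  # a broken expected query should raise
        generated = model(question)
        try:
            passed = rows(conn, generated) == expected
        except sqlite3.Error:
            passed = False  # generated SQL did not execute
        results.append(
            {"question": question, "generated_sql": generated, "passed": passed}
        )
    return results


def accuracy(results: List[Dict[str, object]]) -> float:
    """Fraction of cases that passed (0.0 when there are no cases)."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)


def make_sandbox() -> sqlite3.Connection:
    """In-memory database with made-up data, standing in for a real sandbox."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (project TEXT, commits INTEGER)")
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?)", [("a", 10), ("b", 20), ("c", 30)]
    )
    return conn


CASES = [
    ("How many projects are there?", "SELECT COUNT(*) FROM metrics"),
    ("What is the total number of commits?", "SELECT SUM(commits) FROM metrics"),
]


def stub_model(question: str) -> str:
    """Stand-in for a real agent: gets the first case right, the second wrong."""
    if "How many" in question:
        return "SELECT COUNT(project) FROM metrics"
    return "SELECT MAX(commits) FROM metrics"


if __name__ == "__main__":
    report = run_evals(stub_model, CASES, make_sandbox())
    for entry in report:
        print(entry)
    print("accuracy:", accuracy(report))
```

Keeping the per-case records alongside the aggregate score makes it possible to inspect which questions a given model misses, not only how often it misses.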
I suggest we find a way to build this into Phoenix, rather than something bespoke. I'd sync with @ravenac95 on this
Closed here: https://github.com/opensource-observer/oso/pull/4013/