oso
Document how we organize datasets for automatic evals
What is it?
We should have a document describing how datasets are discovered for automatic eval execution.
Current plan:
- Each eval should have a `frequency` value. The values should be something like `cron` or `on-deployments`.
- If the frequency is set to `cron`, then a `cron` value (the schedule) should also be set.
- Each dataset should have tags in the style `eval:NAME_OF_EVAL`, where the value is a boolean indicating whether that specific eval is enabled for the dataset.
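The plan above can be sketched in Python, assuming evals and datasets are plain in-memory records. `EvalConfig`, `Dataset`, `datasets_for_eval`, and the rule that a missing `eval:` tag means "disabled" are hypothetical illustrations, not existing code in the repository. The sketch validates the `frequency`/`cron` pairing and then discovers datasets for an eval by reading the boolean `eval:NAME_OF_EVAL` tag.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical set of frequency values, taken from the examples in the plan.
VALID_FREQUENCIES = {"cron", "on-deployments"}


@dataclass
class EvalConfig:
    name: str
    frequency: str
    cron: Optional[str] = None  # assumption: only required when frequency == "cron"

    def __post_init__(self) -> None:
        # Reject unknown frequencies and a "cron" frequency without a schedule.
        if self.frequency not in VALID_FREQUENCIES:
            raise ValueError(f"unknown frequency {self.frequency!r} for eval {self.name!r}")
        if self.frequency == "cron" and not self.cron:
            raise ValueError(f"eval {self.name!r} uses frequency 'cron' but has no cron value")


@dataclass
class Dataset:
    name: str
    # Tags like {"eval:text2sql": True}; the value says whether that eval is enabled.
    tags: dict[str, bool] = field(default_factory=dict)


def datasets_for_eval(eval_cfg: EvalConfig, datasets: list[Dataset]) -> list[Dataset]:
    """Return datasets whose eval:<name> tag is True (assumption: a missing tag means disabled)."""
    tag = f"eval:{eval_cfg.name}"
    return [d for d in datasets if d.tags.get(tag, False)]


text2sql = EvalConfig(name="text2sql", frequency="cron", cron="0 3 * * *")
datasets = [
    Dataset("sales_queries", tags={"eval:text2sql": True}),
    Dataset("support_chats", tags={"eval:text2sql": False}),
    Dataset("legacy_reports"),  # no eval tags at all
]
print([d.name for d in datasets_for_eval(text2sql, datasets)])  # ['sales_queries']
```

Treating an absent tag as disabled makes eval participation opt-in, so a newly added dataset is not picked up by every eval until someone tags it explicitly.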
At the very least, we want to be able to specify metadata filters, for example:
`!run_eval text2sql` on datasets where the priority is high, or something like that.
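One way a metadata filter could work is sketched below, assuming a hypothetical `!run_eval NAME key=value ...` syntax in place of the free-form "where the priority is high". `parse_run_eval`, `select_datasets`, the `key=value` syntax, and equality-only matching are all illustrative assumptions. A dataset is selected only if its `eval:NAME` tag is true and every filter matches its metadata.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    tags: dict[str, bool] = field(default_factory=dict)      # e.g. {"eval:text2sql": True}
    metadata: dict[str, str] = field(default_factory=dict)   # e.g. {"priority": "high"}


def parse_run_eval(command: str) -> tuple[str, dict[str, str]]:
    """Parse a hypothetical '!run_eval NAME key=value ...' command into (eval name, filters)."""
    parts = shlex.split(command)
    if len(parts) < 2 or parts[0] != "!run_eval":
        raise ValueError(f"not a !run_eval command: {command!r}")
    filters: dict[str, str] = {}
    for token in parts[2:]:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value filter, got {token!r}")
        filters[key] = value
    return parts[1], filters


def select_datasets(command: str, datasets: list[Dataset]) -> list[Dataset]:
    """Keep datasets with the eval tag enabled whose metadata equals every filter value."""
    eval_name, filters = parse_run_eval(command)
    tag = f"eval:{eval_name}"
    return [
        d for d in datasets
        if d.tags.get(tag, False)
        and all(d.metadata.get(k) == v for k, v in filters.items())
    ]


datasets = [
    Dataset("sales_queries", {"eval:text2sql": True}, {"priority": "high"}),
    Dataset("ad_hoc_queries", {"eval:text2sql": True}, {"priority": "low"}),
    Dataset("support_chats", {"eval:summarize": True}, {"priority": "high"}),
]
print([d.name for d in select_datasets("!run_eval text2sql priority=high", datasets)])
# ['sales_queries']
```

With no filters, the command falls back to plain tag-based discovery; richer predicates (ranges, `OR`) would need a real query syntax rather than `key=value` pairs.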