Devise a scalable workflow for adding new evals
What is it?
Per this "MVP" version here, we need to revisit the categorization and design a workflow that allows this system to:
- easily scale to hundreds of evals
- define new categories/edge cases (as we uncover new ones) with minimal overlap
- plug into other evals infrastructure (Arize Phoenix)
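To make the goals above concrete, here is a minimal sketch of what a code-defined eval registry could look like. Everything in it (`EvalExample`, `EvalRegistry`, the field names, the sample category) is hypothetical and not taken from the existing `oso_agent` code. It only illustrates rejecting duplicate categories and example ids so the set can grow without overlap. Uploading to Arize Phoenix is left out on purpose.

```python
from collections import defaultdict
from dataclasses import dataclass


def _norm(name: str) -> str:
    """Normalize a category name so 'Text2SQL' and ' text2sql ' collide."""
    return name.strip().lower()


@dataclass(frozen=True)
class EvalExample:
    # Hypothetical shape of a single eval case.
    id: str
    question: str
    expected: str
    category: str


class EvalRegistry:
    """In-memory registry that keeps categories and example ids unique."""

    def __init__(self) -> None:
        self._categories: dict[str, str] = {}
        self._examples: dict[str, EvalExample] = {}

    def add_category(self, name: str, description: str) -> None:
        key = _norm(name)
        if key in self._categories:
            raise ValueError(f"category already defined: {key}")
        self._categories[key] = description

    def add_example(self, example: EvalExample) -> None:
        if _norm(example.category) not in self._categories:
            raise ValueError(f"unknown category: {example.category}")
        if example.id in self._examples:
            raise ValueError(f"duplicate example id: {example.id}")
        self._examples[example.id] = example

    def by_category(self) -> dict[str, list[EvalExample]]:
        # Group examples under their normalized category name.
        grouped: dict[str, list[EvalExample]] = defaultdict(list)
        for example in self._examples.values():
            grouped[_norm(example.category)].append(example)
        return dict(grouped)
```

Since the registry is plain Python, it can be versioned alongside the datasets, and a separate uploader could consume the output of `by_category()`.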
@ryscheng do you have feedback on this?
Yeah, I like the idea of moving this into code so that we can version and abstract as necessary. Let me circle back when I have an MVP dataset uploader together.
@evanameyer1 lmk if you need more feedback. Feel free to dump the canonical workflow into this issue and close it out when you're ready.
@ryscheng Sounds good, I'll get to this tomorrow!
Readme documenting this here: https://github.com/opensource-observer/oso/tree/main/warehouse/oso_agent/oso_agent/datasets/readme.md