Create an eval from traces
Add new features that make it simpler to create evals by using traces as the dataset.
Why this matters
- Evals are important for getting the most out of AI systems, but they are incredibly time-consuming and frustrating to create
- It can be unclear what shape the data needs to be in for different eval types
Key scenarios this enables
- Detecting prompt regressions in agents
- Bulk model and prompt experimentation
MVP
- The Input and Output can be automatically mapped to eval variables {{query}} and {{response}}. This is an assumption to streamline the experience. The same may be true for mapping to tool_definitions and tool_calls.
- Start with options that easily map to trace data like Relevance, Task Adherence, Coherence, Similarity, Intent Resolution, and Tool Call Accuracy.
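To make the proposed mapping concrete, here is a minimal Python sketch of how a trace could be turned into an eval row. It is illustrative only: the trace field names (`input`, `output`), the `FIELD_MAP` table, and the helper functions are assumptions made for this example, not the AI Toolkit trace schema or API.

```python
import json
from typing import Any

# Hypothetical mapping from trace fields to eval variables.
# The trace field names on the left are assumptions for this sketch.
FIELD_MAP = {
    "input": "query",
    "output": "response",
    "tool_definitions": "tool_definitions",
    "tool_calls": "tool_calls",
}


def trace_to_eval_row(trace: dict[str, Any]) -> dict[str, Any]:
    """Copy the mapped fields that are present in the trace into an eval row."""
    return {var: trace[field] for field, var in FIELD_MAP.items() if field in trace}


def traces_to_jsonl(traces: list[dict[str, Any]]) -> str:
    """Serialize traces as one JSON eval row per line."""
    return "\n".join(json.dumps(trace_to_eval_row(t)) for t in traces)


def render(template: str, row: dict[str, Any]) -> str:
    """Substitute {{name}} placeholders with values from the eval row."""
    for name, value in row.items():
        template = template.replace("{{" + name + "}}", str(value))
    return template


if __name__ == "__main__":
    trace = {"input": "What is 2+2?", "output": "4", "latency_ms": 12}
    row = trace_to_eval_row(trace)
    print(render("Question: {{query}}\nAnswer: {{response}}", row))
```

Fields that have no mapping (such as `latency_ms` above) are dropped, and tool fields are only included when the trace carries them, so the same helper could feed both text-quality evaluators and Tool Call Accuracy.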
Prototype:
https://github.com/user-attachments/assets/0e3a55cc-ba1a-4881-99b3-1bc8ebc03d96
Added to the AI Toolkit project backlog; we will plan this after Ignite.