vscode-ai-toolkit Create an eval from traces

Add new features that make it simpler to create evals by using traces as the dataset. Example:

Why this matters

Evals are important for getting the most out of AI systems, but they are incredibly time-consuming and frustrating to create
It can be unclear what shape the data needs to be in for different eval types

Key scenarios this enables

Detecting prompt regressions in agents
Bulk model and prompt experimentation

MVP

The trace Input and Output can be mapped automatically to the eval variables {{query}} and {{response}}. This is an assumption made to streamline the experience. The same may hold for mapping to tool_definitions and tool_calls.
Start with eval types that map easily to trace data, such as Relevance, Task Adherence, Coherence, Similarity, Intent Resolution, and Tool Call Accuracy.
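
To make the proposed mapping concrete, here is a minimal sketch in Python. It is not the AI Toolkit implementation. The trace record shape (lowercase `input`, `output`, `tool_definitions`, `tool_calls` keys), the helper names, and the choice of JSON Lines as the dataset format are all assumptions made for illustration; only the variable names `query`, `response`, `tool_definitions`, and `tool_calls` come from the proposal above.

```python
import json
from typing import Any

# Hypothetical mapping from trace fields to eval variables.
# The trace-side key names are assumptions; the issue only names
# Input, Output, tool_definitions, and tool_calls.
FIELD_MAP = {
    "input": "query",
    "output": "response",
    "tool_definitions": "tool_definitions",
    "tool_calls": "tool_calls",
}


def trace_to_eval_row(trace: dict[str, Any]) -> dict[str, Any]:
    """Map one trace record to a row of eval variables, skipping absent fields."""
    return {var: trace[key] for key, var in FIELD_MAP.items() if key in trace}


def traces_to_jsonl(traces: list[dict[str, Any]]) -> str:
    """Serialize the mapped rows as JSON Lines, one eval row per trace."""
    return "\n".join(json.dumps(trace_to_eval_row(t)) for t in traces)


# Made-up sample traces, for illustration only.
traces = [
    {"input": "What is the capital of France?", "output": "Paris."},
    {
        "input": "Weather in Seattle?",
        "output": "It is raining.",
        "tool_calls": [{"name": "get_weather", "arguments": {"city": "Seattle"}}],
    },
]

rows = [trace_to_eval_row(t) for t in traces]
print(traces_to_jsonl(traces))
```

Keeping the mapping in a single table means that an eval type such as Tool Call Accuracy would pick up `tool_calls` when a trace has it, while text-only eval types such as Relevance or Coherence would only need `query` and `response`.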

Oct 22 '25 15:10 therealjohn

Thank you for contacting us! Any issue or feedback from you is quite important to us. We will do our best to fully respond to your issue as soon as possible. Sometimes additional investigation may be needed; we will usually get back to you within 2 days by adding comments to this issue. Please stay tuned.

Oct 22 '25 15:10 microsoft-github-policy-service[bot]

Prototype:

https://github.com/user-attachments/assets/0e3a55cc-ba1a-4881-99b3-1bc8ebc03d96

Oct 23 '25 20:10 therealjohn

Added to the AI Toolkit project backlog; we will plan this after Ignite.

Oct 27 '25 06:10 MuyangAmigo