vscode-ai-toolkit
Use eval results with an LLM to automatically improve an agent's system prompt
Help devs use eval results to automatically improve the agents those evals were created for. The eval results are provided as context to a new feature that uses an LLM to iteratively update the agent's system prompt and re-run the evals.
Why this matters
- It's not clear today what devs should do with eval results once they have them
- Evals should drive improvements as part of the dev journey
Key scenarios this enables
- Detecting prompt regressions in agents
- Bulk model and prompt experimentation
MVP
- New tool, Agent Optimizer, available under the Agent and Workflow tools section
- Select an existing eval result
- Select an agent, or provide the system prompt
- Provide a system prompt for the LLM judge, or use the supplied default
- Specify the output schema for the LLM judge
- Specify the max iterations and a target score; the optimizer runs until either is reached (see the sketch after this list)
- The agent can be modified and saved directly, or the system prompt updates can be copied and pasted wherever the dev keeps them
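A minimal sketch of the optimization loop the MVP describes, to make the flow concrete. Every name here is a placeholder for illustration, not an AI Toolkit API: `EvalResult`, `JUDGE_SCHEMA`, `revise_prompt`, `optimize`, the `run_evals` callable, and the `gpt-4o` model string are assumptions, and the OpenAI Python SDK stands in for whatever client the feature would actually use.

```python
# Illustrative only: names below are placeholders, not AI Toolkit APIs.
# Assumes eval results are available as per-case scores plus failure notes,
# and that an OpenAI-compatible chat endpoint rewrites the prompt.
from dataclasses import dataclass

from openai import OpenAI  # any OpenAI-compatible client would do

client = OpenAI()

# Output schema the LLM judge is asked to follow
# (MVP: "Specify the output schema for the LLM judge"). Hypothetical default.
JUDGE_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "number", "minimum": 0, "maximum": 1},
        "failure_reason": {"type": "string"},
    },
    "required": ["score", "failure_reason"],
}


@dataclass
class EvalResult:
    """One row of an existing eval run: input, agent output, judge verdict."""
    case_input: str
    agent_output: str
    score: float
    failure_reason: str


def revise_prompt(system_prompt: str, results: list[EvalResult]) -> str:
    """Ask an LLM to rewrite the system prompt, using eval failures as context."""
    failures = [r for r in results if r.score < 1.0]
    context = "\n".join(
        f"- input: {r.case_input}\n  output: {r.agent_output}\n  problem: {r.failure_reason}"
        for r in failures
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "You improve agent system prompts."},
            {
                "role": "user",
                "content": (
                    f"Current system prompt:\n{system_prompt}\n\n"
                    f"Eval failures:\n{context}\n\n"
                    "Return only the revised system prompt."
                ),
            },
        ],
    )
    return response.choices[0].message.content


def optimize(system_prompt: str,
             run_evals,            # callable: prompt -> list[EvalResult], supplied by caller
             target: float = 0.9,  # stop once the mean score reaches this
             max_iterations: int = 5) -> str:
    """Iteratively revise the prompt and re-eval until target or max iterations."""
    for _ in range(max_iterations):
        results = run_evals(system_prompt)
        mean_score = sum(r.score for r in results) / len(results)
        if mean_score >= target:
            break
        system_prompt = revise_prompt(system_prompt, results)
    return system_prompt
```

Here `run_evals` stands in for whatever re-runs the selected eval set against the candidate prompt; in the MVP that would be the existing eval runner, with the LLM judge constrained to `JUDGE_SCHEMA`.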
Added to the AI Toolkit project backlog; we will plan this post-Ignite.
Is there an appetite to leverage Data Wrangler for this functionality? Or are you more interested in this being proprietary to AI Toolkit?