
Use eval results with an LLM to automatically improve an agent's system prompt

Open · therealjohn opened this issue 2 months ago · 2 comments

Help devs use eval results to automatically improve the agents those evals were created for. The feature would provide the eval results as context to an LLM that iteratively updates the agent's system prompt and re-runs the eval.
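To make the loop concrete, here is a minimal sketch of the iterate-and-re-eval flow described above. Everything in it is an assumption for illustration: `runEval`, `proposePromptRevision`, and `EvalResult` are hypothetical stand-ins, not existing AI Toolkit APIs.

```typescript
// Illustrative sketch only: runEval, proposePromptRevision, and EvalResult
// are hypothetical stand-ins, not actual AI Toolkit APIs.

interface EvalResult {
  score: number;        // aggregate judge score, e.g. 0..1
  failures: string[];   // judge explanations for failing cases
}

// Run the existing eval against a candidate system prompt.
declare function runEval(systemPrompt: string): Promise<EvalResult>;

// Ask an LLM to revise the prompt, given the eval results as context.
declare function proposePromptRevision(
  systemPrompt: string,
  results: EvalResult,
): Promise<string>;

async function optimizeSystemPrompt(
  initialPrompt: string,
  targetScore: number,
  maxIterations: number,
): Promise<string> {
  let prompt = initialPrompt;
  let results = await runEval(prompt);

  // Iterate until the target score or the iteration cap is reached,
  // whichever comes first (matching the MVP's stopping rule below).
  for (let i = 0; i < maxIterations && results.score < targetScore; i++) {
    prompt = await proposePromptRevision(prompt, results);
    results = await runEval(prompt);
  }
  return prompt;
}
```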


Why this matters

  • It's currently unclear what devs should do with eval results
  • Evals should drive improvements as part of the dev journey

Key scenarios this enables

  • Detecting prompt regressions in agents
  • Bulk model and prompt experimentation

MVP

  • A new tool, Agent Optimizer, available under the Agent and Workflow tools section
  • Select an existing eval result
  • Select an agent, or provide the system prompt
  • Provide a system prompt for the LLM judge, or use the provided default
  • Specify the output schema for the LLM judge (see the sketch after this list)
  • Specify the max iterations and a target score; the loop runs until either is reached
  • The agent can be modified/saved directly, or the system prompt updates can be copied and pasted to wherever the dev keeps them
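For the judge-configuration bullets above, a default judge prompt and output schema might look like the following. The field names and shape are assumptions for illustration, not a schema AI Toolkit defines:

```typescript
// Illustrative judge configuration; field names are assumptions,
// not part of any existing AI Toolkit schema.

const defaultJudgeSystemPrompt = `You are an impartial evaluator. Score the
agent's response against the expected behavior and explain any failures.`;

// JSON Schema constraining the judge's structured output, so each
// iteration yields a machine-readable score plus an explanation the
// optimizer can feed back into the next prompt revision.
const judgeOutputSchema = {
  type: "object",
  properties: {
    score: { type: "number", minimum: 0, maximum: 1 },
    explanation: { type: "string" },
  },
  required: ["score", "explanation"],
} as const;
```

Constraining the judge to structured output is what lets the optimizer read a numeric score for the stopping rule and feed the explanations back into the next prompt revision.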

therealjohn commented Oct 22 '25

Added to the AI Toolkit project backlog; we will plan this post-Ignite.

MuyangAmigo commented Oct 27 '25

Is there an appetite to leverage Data Wrangler for this functionality? Or are you more interested in keeping this proprietary to AI Toolkit?

AngelosP commented Oct 31 '25