vscode-ai-toolkit
Use eval results with an LLM to automatically improve an agent's system prompt
Help devs use eval results to automatically improve the agents those evals were created for. The eval results are provided as context to a new feature that uses an LLM to iteratively update the agent's system prompt and re-run the evals.
Why this matters
- It's not clear today what devs should do with eval results once they have them
- Evals should drive improvements as part of the dev journey
Key scenarios this enables
- Detecting prompt regressions in agents
- Bulk model and prompt experimentation
MVP
- New tool, Agent Optimizer, available under the Agent and Workflow tools section
- Select an existing eval result
- Select an agent, or provide the system prompt
- Provide a system prompt for the LLM judge, or use the supplied default
- Specify the output schema for the LLM judge
- Specify the max iterations and a target score; the optimizer runs until either is reached (see the sketch after this list)
- The agent can be modified and saved directly, or the system prompt updates can be copied and pasted wherever the dev keeps them
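A minimal sketch of the optimization loop the MVP describes, to make the flow concrete. Every name here is a placeholder for illustration, not an AI Toolkit API: `EvalResult`, `JUDGE_SCHEMA`, `revise_prompt`, `optimize`, the `run_evals` callable, and the `gpt-4o` model string are assumptions, and the OpenAI Python SDK stands in for whatever client the feature would actually use.

```python
# Illustrative only: names below are placeholders, not AI Toolkit APIs.
# Assumes eval results are available as per-case scores plus failure notes,
# and that an OpenAI-compatible chat endpoint rewrites the prompt.
from dataclasses import dataclass

from openai import OpenAI  # any OpenAI-compatible client would do

client = OpenAI()

# Output schema the LLM judge is asked to follow
# (MVP: "Specify the output schema for the LLM judge"). Hypothetical default.
JUDGE_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "number", "minimum": 0, "maximum": 1},
        "failure_reason": {"type": "string"},
    },
    "required": ["score", "failure_reason"],
}


@dataclass
class EvalResult:
    """One row of an existing eval run: input, agent output, judge verdict."""
    case_input: str
    agent_output: str
    score: float
    failure_reason: str


def revise_prompt(system_prompt: str, results: list[EvalResult]) -> str:
    """Ask an LLM to rewrite the system prompt, using eval failures as context."""
    failures = [r for r in results if r.score < 1.0]
    context = "\n".join(
        f"- input: {r.case_input}\n  output: {r.agent_output}\n  problem: {r.failure_reason}"
        for r in failures
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "You improve agent system prompts."},
            {
                "role": "user",
                "content": (
                    f"Current system prompt:\n{system_prompt}\n\n"
                    f"Eval failures:\n{context}\n\n"
                    "Return only the revised system prompt."
                ),
            },
        ],
    )
    return response.choices[0].message.content


def optimize(system_prompt: str,
             run_evals,            # callable: prompt -> list[EvalResult], supplied by caller
             target: float = 0.9,  # stop once the mean score reaches this
             max_iterations: int = 5) -> str:
    """Iteratively revise the prompt and re-eval until target or max iterations."""
    for _ in range(max_iterations):
        results = run_evals(system_prompt)
        mean_score = sum(r.score for r in results) / len(results)
        if mean_score >= target:
            break
        system_prompt = revise_prompt(system_prompt, results)
    return system_prompt
```

Here `run_evals` stands in for whatever re-runs the selected eval set against the candidate prompt; in the MVP that would be the existing eval runner, with the LLM judge constrained to `JUDGE_SCHEMA`.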
Added to the AI Toolkit project backlog; we will plan this post-Ignite.
Is there an appetite to leverage Data Wrangler for this functionality? Or are you more interested in this being proprietary to AI Toolkit?