nuxt.com

improve: MCP tool descriptions and evaluations

Open onmax opened this issue 2 weeks ago • 9 comments

Improved MCP tool descriptions with structured guidance and added realistic evaluation scenarios.

Impact

| Metric | Before | After |
| --- | --- | --- |
| Eval Score | 45% | 60% |

Model: gpt-5.1-codex-mini. Maybe we should try other models, such as Sonnet 4.5, which is more common among developers?

[!IMPORTANT] For now, the dev server must be running on the same machine when the evals are run.
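A preflight check along the following lines could make a missing dev server easier to spot before the evals start. This is only a sketch and not part of this PR: `assertDevServerIsUp`, `unreachableMessage`, and the `FetchLike` type are hypothetical names, and the endpoint URL is whatever the evals are configured to use.

```typescript
// Minimal fetch-like signature, so the sketch does not depend on a specific runtime.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>

// Build an explicit error message for an unreachable MCP endpoint (hypothetical helper).
function unreachableMessage(url: string, reason: string): string {
  return `MCP endpoint ${url} is not reachable (${reason}). ` +
    'Is the dev server running on this machine, and is it the right app?'
}

// Resolve if the server answers at all; throw a descriptive error if the request fails.
// Any HTTP response counts as "up" here, because the right status code
// for a plain request against an MCP endpoint is an assumption we avoid making.
async function assertDevServerIsUp(url: string, fetchImpl: FetchLike): Promise<void> {
  try {
    await fetchImpl(url)
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err)
    throw new Error(unreachableMessage(url, reason))
  }
}
```

Calling `assertDevServerIsUp` once before the eval run, with the runtime's own `fetch` passed in as `fetchImpl`, would turn a connection failure into a single readable error.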

Changes

  • Added WHEN TO USE / WHEN NOT TO USE sections to tool descriptions
  • Included concrete examples and common paths for documentation tools
  • Clarified parameter usage (slug vs name) with examples
  • Fixed prompt argsSchema to use z.object() for compatibility
  • Added realistic evaluation scenarios based on actual developer questions
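As an illustration of the first three changes, a structured description could be assembled like this. It is a minimal sketch: the `ToolGuidance` shape, the `buildToolDescription` helper, and the sample tool text and path are hypothetical and not taken from the actual diff.

```typescript
// Hypothetical shape: the real tool definitions in the PR may differ.
interface ToolGuidance {
  summary: string
  whenToUse: string[]
  whenNotToUse: string[]
  examples: string[]
}

// Render the guidance as one plain-text description string.
// Sections with no items are omitted entirely.
function buildToolDescription(g: ToolGuidance): string {
  const section = (title: string, items: string[]): string =>
    items.length === 0
      ? ''
      : `\n\n${title}:\n${items.map(item => `- ${item}`).join('\n')}`

  return (
    g.summary +
    section('WHEN TO USE', g.whenToUse) +
    section('WHEN NOT TO USE', g.whenNotToUse) +
    section('EXAMPLES', g.examples)
  )
}

// Illustrative content only; the wording and the path are made up for this sketch.
const pageToolDescription = buildToolDescription({
  summary: 'Fetch a single documentation page by its path.',
  whenToUse: ['You already know the exact page path.'],
  whenNotToUse: ['You need to search or browse; list the pages first.'],
  examples: ['path: "/docs/getting-started/installation"'],
})
```

Keeping the sections in a fixed order gives the model the same cues for every tool, which is the point of the WHEN TO USE / WHEN NOT TO USE convention.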

Limitations

Several evaluation scenarios are commented out due to MCP prompt limitations:

  • @ai-sdk/mcp does not support converting prompts to tools yet
  • Tests requiring find_documentation_for_topic, deployment_guide, and migration_help prompts are disabled
  • These will be enabled once prompt-to-tool conversion is available
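Instead of commenting scenarios out, the prompt-dependent ones could be filtered at runtime, roughly as below. This is a sketch under assumptions: the `EvalScenario` shape and its `expectedCalls` field are hypothetical, and only the three prompt names come from this PR.

```typescript
// Prompt names listed in the PR description.
const PROMPT_NAMES: string[] = [
  'find_documentation_for_topic',
  'deployment_guide',
  'migration_help',
]

// Hypothetical scenario shape; the real eval files may be structured differently.
interface EvalScenario {
  input: string
  // Names of the MCP tools or prompts the scenario expects the model to call.
  expectedCalls: string[]
}

// Split scenarios into those runnable today and those that need an MCP prompt.
function splitScenarios(scenarios: EvalScenario[]): {
  enabled: EvalScenario[]
  skipped: EvalScenario[]
} {
  const needsPrompt = (s: EvalScenario): boolean =>
    s.expectedCalls.some(call => PROMPT_NAMES.indexOf(call) !== -1)

  return {
    enabled: scenarios.filter(s => !needsPrompt(s)),
    skipped: scenarios.filter(needsPrompt),
  }
}
```

Once prompt-to-tool conversion is available, emptying `PROMPT_NAMES` would re-enable every skipped scenario without touching the scenario list.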

Related

  • https://ai-sdk.dev/docs/reference/ai-sdk-core/create-mcp-client
  • https://x.com/HugoRCD__/status/1990441837782499719
  • https://github.com/mattpocock/evalite/pull/339
  • https://github.com/onmax/mcp-starter

onmax avatar Nov 18 '25 14:11 onmax

@onmax is attempting to deploy a commit to the Nuxt Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] avatar Nov 18 '25 14:11 vercel[bot]

I would like to also suggest updating the blog post with this evals solution. What do you think?

https://nuxt.com/blog/building-nuxt-mcp

onmax avatar Nov 18 '25 14:11 onmax

@onmax Thank you! Give me some time to review this and familiarize myself with Evalite. And yes, depending on that, we'll probably modify the blog post. FYI, I'm also working on another project related to MCP where this could be useful! 😁

HugoRCD avatar Nov 18 '25 14:11 HugoRCD

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| nuxt | Ready | Preview | Comment | Nov 26, 2025 4:57pm |

vercel[bot] avatar Nov 18 '25 14:11 vercel[bot]

OK @HugoRCD.

I was also thinking of opening a PR for Nuxt UI, but I will wait until you decide what the best approach is :)

onmax avatar Nov 18 '25 14:11 onmax

@onmax Do you have a particular config to run the evals? I tried with the current version and got this error: CleanShot 2025-11-20 at 13 44 14@2x

And the same with the latest: CleanShot 2025-11-20 at 13 43 03@2x

HugoRCD avatar Nov 20 '25 13:11 HugoRCD

are you running the dev server in parallel?

Sorry i didn't mention it in the PR!

image

onmax avatar Nov 20 '25 13:11 onmax

> are you running the dev server in parallel?
>
> Sorry i didn't mention it in the PR!
>
> image

It's pretty obvious in the end but I had too many apps running at the same time and it wasn't using the right one 😭 (in my defence the error could be a bit more obvious 😂)

HugoRCD avatar Nov 20 '25 14:11 HugoRCD

yes. totally agreed. but i didn't want to write too much custom code for now :)

onmax avatar Nov 20 '25 14:11 onmax

Happy to resolve the conflicts @onmax ?

atinux avatar Nov 25 '25 14:11 atinux

Yes 👍

Should I leave the "@ts-expect-error - MCP SDK has overly strict Zod type constraints" comments, or should I try to solve the underlying type errors?

The CI currently fails if I remove them.

onmax avatar Nov 25 '25 14:11 onmax

I resolved the conflict btw @atinux

onmax avatar Nov 26 '25 16:11 onmax

@onmax Not a fan of having to put these any casts and @ts-expect-error comments everywhere, but I guess we don't really have a choice until Zod 4 support is added 🥲

HugoRCD avatar Nov 26 '25 16:11 HugoRCD

🫂 i feel you...

onmax avatar Nov 26 '25 17:11 onmax