
Evals for Spotlight

Open · BYK opened this issue 3 months ago · 1 comment

We want Spotlight to be an indispensable tool for AI-assisted development. For this we need real-world scenarios with which we can test Spotlight's usefulness (and also improve and fine-tune it). These evals will test not only our MCPs but also our CLI, since the CLI is another way agents can use Spotlight.

Here's the framework we have in mind:

  1. Define a set of real-world development tasks, such as implementing a new feature, to be carried out with Claude Code (or any other tool, such as cursor-agent).
  2. Invoke the AI assistant with the predefined prompt and expect it to use the Spotlight CLI or MCP (this needs to be verified).
  3. When the assistant finishes, check its work and pass the test if the feature is implemented correctly.
  4. Crucially, do not include or reveal this final check, as it may guide the AI assistant, which we don't want in the scenario (unless the scenario itself is a variant of TDD).
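The four steps above could be modeled roughly as follows. This is a minimal sketch, not an existing Spotlight harness: `EvalScenario`, `AgentRun`, `usedSpotlight`, `scoreRun` and `runEval` are hypothetical names, the agent is abstracted as a function that takes a prompt and returns its tool calls plus a snapshot of the files it produced, and detecting Spotlight usage by matching "spotlight" in tool names or inputs is an assumption.

```typescript
// Hypothetical sketch of an eval harness; none of these names are Spotlight APIs.

interface ToolCall {
  tool: string; // e.g. an MCP tool name, or "shell" for a CLI invocation
  input: string; // arguments or the command line the agent used
}

interface AgentRun {
  toolCalls: ToolCall[];
  files: Record<string, string>; // assumed snapshot of the workspace after the run
}

// Assumption: a real implementation would wrap Claude Code, cursor-agent, etc.
type Agent = (prompt: string) => Promise<AgentRun>;

interface EvalScenario {
  name: string;
  prompt: string; // step 1/2: the only thing the agent ever sees
  check: (run: AgentRun) => boolean; // step 4: hidden, never sent to the agent
}

interface EvalResult {
  scenario: string;
  usedSpotlight: boolean;
  passed: boolean;
}

// Heuristic (assumption): the run "used Spotlight" if any tool call mentions it,
// whether through an MCP tool name or a CLI command line.
function usedSpotlight(run: AgentRun): boolean {
  return run.toolCalls.some(
    (call) => /spotlight/i.test(call.tool) || /spotlight/i.test(call.input),
  );
}

// Step 3: score a finished run against the hidden check.
function scoreRun(scenario: EvalScenario, run: AgentRun): EvalResult {
  return {
    scenario: scenario.name,
    usedSpotlight: usedSpotlight(run),
    passed: scenario.check(run),
  };
}

// Step 2 + 3: invoke the agent with the prompt only, then score its work.
async function runEval(scenario: EvalScenario, agent: Agent): Promise<EvalResult> {
  const run = await agent(scenario.prompt);
  return scoreRun(scenario, run);
}
```

Keeping `check` as a separate field that `runEval` never passes to the agent enforces step 4 structurally rather than by convention. How to capture tool calls from a specific assistant is tool-dependent and left open here; the same `runEval` loop could then be run locally or from a CI job.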

We should be able to run these locally and on CI, either continuously or on a schedule.

Available tools:

BYK · Oct 06 '25 14:10

This is on hold for now.

BYK · Oct 22 '25 19:10