genkit eval changes

Open nullfunc opened this issue 2 months ago • 0 comments

Description

Using genkit evaluation, we add a testing framework that checks whether changes to LLMs or to tool names/descriptions still yield the same tool calls for the same input. The current_evaluation.json file contains the current test run and its pass rates. The Makefile (make genkit-help) and README.md have been updated with information on how the workflow is expected to run. Any change to the LLM or to tool names/descriptions should have the same or a better pass rate.
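To illustrate the "same or better pass rate" rule, here is a minimal sketch in Python. It is not the code in this PR and does not read current_evaluation.json, whose format is not shown here. The tool names, inputs, and helper functions (`make_call`, `pass_rate`, `is_regression`) are all hypothetical, and exact-match comparison of tool calls is an assumption about how a test case passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation; args are sorted pairs so calls compare by value."""
    name: str
    args: tuple


def make_call(name, **args):
    # Hypothetical helper: normalise keyword arguments into a sorted tuple.
    return ToolCall(name, tuple(sorted(args.items())))


def pass_rate(expected, actual):
    """Fraction of inputs whose actual tool calls match the expected ones exactly."""
    if not expected:
        return 1.0
    passed = sum(1 for prompt, calls in expected.items() if actual.get(prompt) == calls)
    return passed / len(expected)


def is_regression(baseline_rate, new_rate):
    """A change regresses if its pass rate drops below the recorded baseline."""
    return new_rate < baseline_rate


# Hypothetical inputs and tool names, for illustration only.
expected = {
    "deploy my app": [make_call("deploy", project="demo")],
    "list services": [make_call("services")],
}
actual = {
    "deploy my app": [make_call("deploy", project="demo")],
    "list services": [make_call("logs")],  # wrong tool chosen after a change
}

rate = pass_rate(expected, actual)
print(f"pass rate: {rate:.2f}, regression vs 1.00 baseline: {is_regression(1.0, rate)}")
```

In this sketch, an equal pass rate is not treated as a regression, matching the "same or better" wording above.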

Linked Issues

fixes #1486

Checklist

  • [x] I have performed a self-review of my code
  • [ ] I have added appropriate tests
  • [ ] I have updated the Defang CLI docs and/or README to reflect my changes, if necessary

nullfunc, Nov 05 '25 22:11