genkit eval changes

Open • nullfunc opened this issue 2 months ago • 0 comments

Description

Using Genkit evaluation, this change adds a testing framework that checks whether changes to the LLM or to tool names/descriptions still yield the same tool calls for the same input. The current_evaluation.json file contains the current test run and its pass rates. Both the Makefile (make genkit-help) and a README.md document how the workflow is expected to run. Any change to the LLM or to tool names/descriptions should produce the same or a better pass rate.
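
As a rough illustration of the acceptance rule above (same input, same tool calls, pass rate must not drop), here is a minimal Python sketch. It is not the code in this change and does not use the Genkit API or the real layout of current_evaluation.json; the ToolCall shape, the helper names, and the example prompts and tool names are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record of one tool invocation made by the model."""
    name: str
    arguments: tuple  # sorted (key, value) pairs, so calls compare by value


def make_call(name: str, **arguments) -> ToolCall:
    """Build a ToolCall whose arguments compare independently of keyword order."""
    return ToolCall(name, tuple(sorted(arguments.items())))


def pass_rate(expected: dict, actual: dict) -> float:
    """Fraction of inputs whose actual tool calls equal the expected ones."""
    if not expected:
        return 1.0
    passed = sum(
        1 for prompt, calls in expected.items() if actual.get(prompt) == calls
    )
    return passed / len(expected)


def is_regression(baseline_rate: float, new_rate: float) -> bool:
    """A change is acceptable only if the pass rate is the same or better."""
    return new_rate < baseline_rate


# Hypothetical expectations: each input maps to the tool calls it should trigger.
expected = {
    "deploy my app": [make_call("deploy", project="app")],
    "list services": [make_call("services")],
}

# Hypothetical run after a tool description was edited: one input now
# triggers a different tool, so only one of the two cases passes.
actual = {
    "deploy my app": [make_call("deploy", project="app")],
    "list services": [make_call("logs")],
}

new_rate = pass_rate(expected, actual)
print(new_rate)                      # 0.5
print(is_regression(1.0, new_rate))  # True
```

In this sketch a run whose rate equals the baseline is not treated as a regression, which mirrors the "same or better" wording above.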

Linked Issues

fixes #1486

Checklist

[x] I have performed a self-review of my code
[ ] I have added appropriate tests
[ ] I have updated the Defang CLI docs and/or README to reflect my changes, if necessary

Nov 05 '25 22:11 nullfunc