genkit
[Evals] Summary metrics for evaluators
Is your feature request related to a problem? Please describe.
Current eval run results do not contain aggregated metrics.
Describe the solution you'd like
Design and implement summary metrics for both the CLI and the Dev UI. The design should account for different score types (numeric, enum, etc.) and handle errors gracefully.
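To make the request concrete, here is a rough sketch of what per-evaluator aggregation could look like. Everything in it is an assumption for discussion: the `EvalScore` and `MetricSummary` shapes and the `summarize` function are hypothetical names, not existing Genkit types or APIs. Numeric scores are reduced to mean/min/max, string and boolean scores to a value distribution, and errored or missing scores are counted rather than failing the whole summary.

```typescript
// Hypothetical shapes for illustration only; not Genkit's actual types.
type ScoreValue = number | string | boolean;

interface EvalScore {
  evaluator: string;
  score?: ScoreValue;
  error?: string;
}

interface MetricSummary {
  evaluator: string;
  total: number;        // all results seen for this evaluator
  errorCount: number;   // results that carried an error
  missingCount: number; // results with no usable score (undefined, NaN, Infinity)
  numeric?: { count: number; mean: number; min: number; max: number };
  distribution?: Record<string, number>; // enum and boolean scores, keyed by value
}

function summarize(scores: EvalScore[]): MetricSummary[] {
  // Group results by evaluator name, preserving first-seen order.
  const byEvaluator = new Map<string, EvalScore[]>();
  for (const s of scores) {
    const group = byEvaluator.get(s.evaluator) ?? [];
    group.push(s);
    byEvaluator.set(s.evaluator, group);
  }

  const summaries: MetricSummary[] = [];
  byEvaluator.forEach((group, evaluator) => {
    const summary: MetricSummary = {
      evaluator,
      total: group.length,
      errorCount: 0,
      missingCount: 0,
    };
    const numbers: number[] = [];
    const counts: Record<string, number> = {};

    for (const s of group) {
      if (s.error !== undefined) {
        // Errors are counted and excluded from the aggregates.
        summary.errorCount++;
      } else if (
        s.score === undefined ||
        (typeof s.score === "number" && !Number.isFinite(s.score))
      ) {
        summary.missingCount++;
      } else if (typeof s.score === "number") {
        numbers.push(s.score);
      } else {
        // Strings and booleans are treated as categorical values.
        const key = String(s.score);
        counts[key] = (counts[key] ?? 0) + 1;
      }
    }

    if (numbers.length > 0) {
      const sum = numbers.reduce((a, b) => a + b, 0);
      summary.numeric = {
        count: numbers.length,
        mean: sum / numbers.length,
        min: Math.min(...numbers),
        max: Math.max(...numbers),
      };
    }
    if (Object.keys(counts).length > 0) {
      summary.distribution = counts;
    }
    summaries.push(summary);
  });
  return summaries;
}
```

The same summary object could back both a table in the CLI output and a header row in the Dev UI. Whether an evaluator that mixes numeric and categorical scores should report both blocks, as this sketch does, is an open design question.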
Additional context
https://github.com/firebase/genkit/issues/1631#issuecomment-2605792834