fix: display eval status per metric type
When viewing eval results, if `response_match_score` failed but `tool_trajectory_avg_score` passed, every message in the invocation (including tool calls) incorrectly showed ❌. This is confusing because the tool trajectory is actually correct.
To address this, this PR introduces an `isToolRelatedEvent()` helper to identify events that involve tool calls. The `addEvalCaseResultToEvents()` method now assigns the metric based on event type (a sketch follows the list):
- Tool events → `tool_trajectory_avg_score`
- Text responses → `response_match_score`
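
A rough sketch of the intent, in TypeScript (the event fields below are assumed for illustration; the actual implementation may inspect different properties):

```ts
// Hypothetical event shape; the real frontend Event type may differ.
interface EvalEvent {
  functionCall?: object;      // assumed: set on tool-call events
  functionResponse?: object;  // assumed: set on tool-response events
  text?: string;
}

// Helper introduced by this PR: treat any event that carries a tool call
// or tool response as tool-related.
function isToolRelatedEvent(event: EvalEvent): boolean {
  return !!event.functionCall || !!event.functionResponse;
}

// Inside addEvalCaseResultToEvents(): pick the metric whose score should
// drive the pass/fail badge for this event.
function metricForEvent(event: EvalEvent): string {
  return isToolRelatedEvent(event)
      ? 'tool_trajectory_avg_score'
      : 'response_match_score';
}
```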
This solution hardcodes the mapping above. It works for the two current default metrics but does not automatically support custom or future metrics.
Fixes #187 with minimal, frontend-only changes. Longer term, I would recommend a backend API change for a more scalable solution, such as including metadata on each metric indicating which event types it evaluates (for example, something along the lines of `appliesTo: 'tool' | 'response'`).
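
For illustration only, that metadata could look something like this (names are hypothetical, not an existing API):

```ts
// Hypothetical shape for the suggested backend change: each metric declares
// which event types it evaluates, so the frontend no longer needs a
// hardcoded metric-to-event mapping.
interface EvalMetricInfo {
  metricName: string;               // e.g. 'tool_trajectory_avg_score'
  appliesTo: 'tool' | 'response';   // which events this metric scores
}
```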