fix: display eval status per metric type
When viewing eval results, if `response_match_score` failed but `tool_trajectory_avg_score` passed, every message in the invocation (including tool calls) incorrectly showed ❌. This is confusing because the tool trajectory is actually correct.
To address this, this PR introduces an `isToolRelatedEvent()` helper to identify events that involve tool calls. The `addEvalCaseResultToEvents()` method now assigns the metric based on event type (a sketch follows the list):
- Tool events → `tool_trajectory_avg_score`
- Text responses → `response_match_score`
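
A rough sketch of the intent, in TypeScript (the event fields below are assumed for illustration; the actual implementation may inspect different properties):

```ts
// Hypothetical event shape; the real frontend Event type may differ.
interface EvalEvent {
  functionCall?: object;      // assumed: set on tool-call events
  functionResponse?: object;  // assumed: set on tool-response events
  text?: string;
}

// Helper introduced by this PR: treat any event that carries a tool call
// or tool response as tool-related.
function isToolRelatedEvent(event: EvalEvent): boolean {
  return !!event.functionCall || !!event.functionResponse;
}

// Inside addEvalCaseResultToEvents(): pick the metric whose score should
// drive the pass/fail badge for this event.
function metricForEvent(event: EvalEvent): string {
  return isToolRelatedEvent(event)
      ? 'tool_trajectory_avg_score'
      : 'response_match_score';
}
```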
This solution hardcodes the mapping above. It works for the two current default metrics but does not automatically support custom or future metrics.
Fixes #187 with minimal, frontend-only changes. Longer term, I would recommend a backend API change for a more scalable solution, such as including metadata on each metric indicating which event types it evaluates (for example, something along the lines of `appliesTo: 'tool' | 'response'`).
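
For illustration only, that metadata could look something like this (names are hypothetical, not an existing API):

```ts
// Hypothetical shape for the suggested backend change: each metric declares
// which event types it evaluates, so the frontend no longer needs a
// hardcoded metric-to-event mapping.
interface EvalMetricInfo {
  metricName: string;               // e.g. 'tool_trajectory_avg_score'
  appliesTo: 'tool' | 'response';   // which events this metric scores
}
```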