[AI Evaluation] Microsoft.Extensions.AI.Evaluation evaluators don't handle responses that used tool calls well.

Open mikeholczer opened this issue 3 months ago • 1 comments

Description

As an example, CoherenceEvaluator uses the TryGetUserRequest() extension method, which doesn't "try" very hard: it only checks whether the last message's Role is ChatRole.User, and if it isn't, it returns false and sets userRequest to null. When tools are available to the model and are used, the last message in the conversation will likely be a tool message with a Role of ChatRole.Tool.

I think the evaluators should look back to the last message whose Role is ChatRole.User, or at least as far back as the previous message with ChatRole.Assistant. The evaluation prompt should also be updated to take the tool messages into account, since the coherence of the response may depend on the data they return.
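To make the difference concrete, here is a minimal, language-neutral sketch of the two lookup strategies. It does not use the real Microsoft.Extensions.AI types: `Message`, `last_message_if_user`, and `split_at_last_user` are hypothetical stand-ins, and the first function only models the behavior as I described it above, not the library's actual source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical stand-in for a chat message; role is one of
    # "system", "user", "assistant", or "tool".
    role: str
    text: str


def last_message_if_user(messages: list[Message]) -> Optional[Message]:
    """Models the behavior described above: only the final message is checked."""
    if messages and messages[-1].role == "user":
        return messages[-1]
    return None


def split_at_last_user(
    messages: list[Message],
) -> tuple[Optional[Message], list[Message]]:
    """Hypothetical alternative: scan backwards for the most recent user
    message and also return everything after it (tool calls, tool results)."""
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].role == "user":
            return messages[i], messages[i + 1:]
    return None, []


# A conversation that ends with a tool message, as in the scenario above.
conversation = [
    Message("system", "You are a helpful assistant."),
    Message("user", "What is the weather in Seattle?"),
    Message("assistant", "[tool call: get_weather(city='Seattle')]"),
    Message("tool", "{'temp_f': 58, 'sky': 'cloudy'}"),
]

# The last-message check finds no user request at all.
print(last_message_if_user(conversation))

# The backwards scan recovers the request plus the tool context after it.
request, tool_context = split_at_last_user(conversation)
print(request.text)
print([m.role for m in tool_context])
```

With the backwards scan, the evaluation prompt could be given both the user request and the intervening tool messages, rather than an empty query.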

Reproduction Steps

  1. Have an IChatClient instance.
  2. Call .GetResponseAsync(chatMessages, callOptions, cancellationToken) with a callOptions object that has tools configured and chatMessages whose content would cause the model to use at least one of the configured tools.
  3. Create a CoherenceEvaluator instance and call .EvaluateAsync passing in the messages from the response.

Expected behavior

The evaluator is able to judge the coherence of the response, and includes the most recent user request and any tool messages since then in its reasoning.

Actual behavior

The evaluation includes a chain of thought containing something like:

First, I need to identify what the QUERY is - but I notice the QUERY field is completely empty. There is no question or prompt provided

Regression?

No response

Known Workarounds

No response

Configuration

.NET 9, with Microsoft.Extensions.AI, Microsoft.Extensions.AI.Evaluation, and Microsoft.Extensions.AI.Evaluation.Quality at version 9.9.0

Other information

https://github.com/dotnet/extensions/blob/53ef1158f9f42632e111d6873a8cd72b803b4ae6/src/Libraries/Microsoft.Extensions.AI.Evaluation.Quality/CoherenceEvaluator.cs#L89-L91

https://github.com/dotnet/extensions/blob/53ef1158f9f42632e111d6873a8cd72b803b4ae6/src/Libraries/Microsoft.Extensions.AI.Evaluation/ChatMessageExtensions.cs#L17-L44

https://github.com/dotnet/extensions/blob/53ef1158f9f42632e111d6873a8cd72b803b4ae6/src/Libraries/Microsoft.Extensions.AI.Evaluation.Quality/CoherenceEvaluator.cs#L182-L184

mikeholczer, Sep 19 '25 13:09