
Add GenAI evaluation to GenAI visualizer

Copilot opened this issue 2 months ago • 6 comments

Description

Implements parsing of gen_ai.evaluation.result events per OpenTelemetry GenAI semantic conventions. Evaluations are now captured from telemetry and exposed via GenAIVisualizerDialogViewModel.Evaluations.

Changes:

  • Model classes

    • EvaluationResultViewModel: UI model exposing Name, ScoreLabel, ScoreValue, Explanation, ResponseId, ErrorType
  • Parsing logic in GenAIVisualizerDialogViewModel

    • Parses evaluations from both log entries and span events
    • Reads evaluation fields directly from attributes (no JSON deserialization)
    • Added helper method ParseDouble for attribute value parsing
    • Graceful error handling prevents parsing failures from breaking view model creation (a parsing sketch follows the example telemetry below)
  • Constants in GenAIHelpers

    • Added attribute name constants for evaluation fields per the OpenTelemetry specification (sketched after the supported-attributes list below)
  • Tests: 5 new tests covering various evaluation scenarios (log entries, span events, minimal data, mixed sources)

  • Stress Playground Testing

    • Added /genai-evaluations API endpoint in Stress.ApiService that creates a GenAI activity with 5 evaluation events (a sketch follows this list)
    • Demonstrates various evaluation scenarios: Relevance (high score), IntentResolution (correct), Coherence (pass), Factuality (fail with error.type), and Fluency (minimal data)
    • Added "Gen AI evaluations" command in Stress.AppHost to invoke the API from the dashboard

Example telemetry:

{
  "name": "gen_ai.evaluation.result",
  "attributes": {
    "gen_ai.evaluation.name": "Relevance",
    "gen_ai.evaluation.score.label": "relevant",
    "gen_ai.evaluation.score.value": "0.85",
    "gen_ai.evaluation.explanation": "The response is factually accurate and addresses the question.",
    "gen_ai.response.id": "chatcmpl-123"
  }
}
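
A minimal sketch of how an event like this could be mapped onto the view model. The attribute names and the ParseDouble idea come from the description above; the dictionary shape, method names, and everything else are assumptions:

using System.Collections.Generic;
using System.Globalization;

// Sketch of the UI model described above.
public sealed class EvaluationResultViewModel
{
    public required string Name { get; init; }
    public string? ScoreLabel { get; init; }
    public double? ScoreValue { get; init; }
    public string? Explanation { get; init; }
    public string? ResponseId { get; init; }
    public string? ErrorType { get; init; }
}

public static class EvaluationParser
{
    // Returns null when the required gen_ai.evaluation.name attribute is
    // missing, so a malformed event never breaks view model creation.
    public static EvaluationResultViewModel? TryParse(
        IReadOnlyDictionary<string, string> attributes)
    {
        if (!attributes.TryGetValue("gen_ai.evaluation.name", out var name))
        {
            return null;
        }

        return new EvaluationResultViewModel
        {
            Name = name,
            ScoreLabel = attributes.GetValueOrDefault("gen_ai.evaluation.score.label"),
            ScoreValue = ParseDouble(attributes.GetValueOrDefault("gen_ai.evaluation.score.value")),
            Explanation = attributes.GetValueOrDefault("gen_ai.evaluation.explanation"),
            ResponseId = attributes.GetValueOrDefault("gen_ai.response.id"),
            ErrorType = attributes.GetValueOrDefault("error.type")
        };
    }

    // Invariant-culture parse that tolerates missing or malformed values.
    private static double? ParseDouble(string? value) =>
        double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out var result)
            ? result
            : null;
}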

Supported attributes per OpenTelemetry specification:

  • gen_ai.evaluation.name (required)
  • gen_ai.evaluation.score.label (conditionally required)
  • gen_ai.evaluation.score.value (conditionally required)
  • gen_ai.evaluation.explanation (recommended)
  • gen_ai.response.id (recommended)
  • error.type (conditionally required)

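The constants added to GenAIHelpers presumably mirror this list. A sketch of that shape (the member names are guesses; the string values are the spec names):

internal static class GenAIHelpers
{
    // Attribute names from the OpenTelemetry GenAI semantic conventions.
    public const string GenAIEvaluationName = "gen_ai.evaluation.name";
    public const string GenAIEvaluationScoreLabel = "gen_ai.evaluation.score.label";
    public const string GenAIEvaluationScoreValue = "gen_ai.evaluation.score.value";
    public const string GenAIEvaluationExplanation = "gen_ai.evaluation.explanation";
    public const string GenAIResponseId = "gen_ai.response.id";
    public const string ErrorType = "error.type";
}
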
Checklist

  • Is this feature complete?
    • [x] Yes. Ready to ship.
    • [ ] No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • [x] Yes
    • [ ] No
  • Did you add public API?
    • [x] Yes
      • If yes, did you have an API Review for it?
        • [ ] Yes
        • [x] No (internal model classes only)
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • [ ] Yes
        • [x] No (minimal XML docs for internal classes)
    • [ ] No
  • Does the change make any security assumptions or guarantees?
    • [ ] Yes
    • [x] No
  • Does the change require an update in our Aspire docs?
    • [ ] Yes
    • [x] No (backend-only change, UI integration pending)
Original prompt

Parse genai evaluations. Information is available here: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/#event-eventgen_aievaluationresult

Should use System.Text.Json and source generation in a new context.

Add evaluations to a collection on GenAIVisualizerDialogViewModel



Copilot · Nov 14 '25 04:11

@copilot Remove new usage of System.Text.Json for parsing evaluations. The evaluation fields all come from attributes.

JamesNK · Nov 14 '25 05:11

@copilot Remove new usage of System.Text.Json for parsing evaluations. The evaluation fields all come from attributes.

Done. Removed EvaluationResultEvent class and JSON deserialization. Evaluation fields are now parsed directly from attributes (gen_ai.evaluation.result.name, gen_ai.evaluation.result.value, etc.) following the OpenTelemetry specification. All tests updated and passing. (b421b43)

Copilot · Nov 14 '25 05:11

@copilot Update unit tests to use the new properties with the correct attributes according to the specification. Here are the attributes:

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
|-----|-----------|-------------------|------------|-------------|----------------|
| gen_ai.evaluation.name | Development | Required | string | The name of the evaluation metric used for the GenAI response. | Relevance; IntentResolution |
| error.type | Stable | Conditionally Required: if the operation ended in an error | string | Describes a class of error the operation ended with. [1] | timeout; java.net.UnknownHostException; server_certificate_invalid; 500 |
| gen_ai.evaluation.score.label | Development | Conditionally Required: if applicable | string | Human readable label for evaluation. [2] | relevant; not_relevant; correct; incorrect; pass; fail |
| gen_ai.evaluation.score.value | Development | Conditionally Required: if applicable | double | The evaluation score returned by the evaluator. | 4.0 |
| gen_ai.evaluation.explanation | Development | Recommended | string | A free-form explanation for the assigned score provided by the evaluator. | The response is factually accurate but lacks sufficient detail to fully address the question. |
| gen_ai.response.id | Development | Recommended: when available | string | The unique identifier for the completion. [3] | chatcmpl-123 |

JamesNK · Nov 14 '25 06:11

@copilot Update unit tests to use the new properties with the correct attributes according to the specification. Here are the attributes:

Attributes:...

Updated to use the correct OpenTelemetry attribute names per specification:

  • gen_ai.evaluation.name (required)
  • gen_ai.evaluation.score.label (conditionally required)
  • gen_ai.evaluation.score.value (conditionally required)
  • gen_ai.evaluation.explanation (recommended)
  • gen_ai.response.id (recommended)
  • error.type (conditionally required)

Updated EvaluationResultViewModel properties and all tests to match. All 19 GenAI tests passing. (fa8b07e)

Copilot · Nov 14 '25 06:11

@copilot In the stress playground app, add a /genai-evaluations API call to Stress.ApiService. The API should add a new GenAI activity with evaluations.

In Stress.AppHost, add a command to call the new API

JamesNK · Nov 14 '25 06:11

@copilot In the stress playground app, add a /genai-evaluations API call to Stress.ApiService. The API should add a new GenAI activity with evaluations.

In Stress.AppHost, add a command to call the n...

Added /genai-evaluations API endpoint in Stress.ApiService that creates a GenAI activity with 5 evaluation events demonstrating various scenarios (Relevance, IntentResolution, Coherence, Factuality with error.type, and minimal Fluency evaluation).

Also added "Gen AI evaluations" command in Stress.AppHost to call the new API. (f1e608e)

Copilot · Nov 14 '25 06:11

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 12962

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 12962"

github-actions[bot] · Nov 17 '25 03:11

@copilot Move hardcoded text added in this PR in Aspire.Dashboard to resource files.

JamesNK · Nov 17 '25 03:11

@copilot Move hardcoded text added in this PR in Aspire.Dashboard to resource files.

Moved hardcoded text to resource files:

  • Added GenAIEvaluationsTabText resource for "Evaluations" tab label
  • Replaced hardcoded "n/a" strings with existing GenAINAText resource
  • All translation files (.xlf) automatically updated

(aba0368)

Copilot · Nov 17 '25 04:11