aspire
aspire copied to clipboard
WithHttpCommand_EnablesCommandOnceResourceIsRunning flaky
https://dev.azure.com/dnceng-public/public/_build/results?buildId=981881&view=ms.vss-test-web.build-test-results-tab&runId=26187914&resultId=100396&paneView=debug
Relevant log part below. 2 min wait failed here https://github.com/dotnet/aspire/blob/4eda59889313734cdeb4c393b715f08a251bb92a/tests/Aspire.Hosting.Tests/WithHttpCommandTests.cs#L370
Note the resource successfully transitioned to Running (per log, and wait on line above) but command remained disabled (per log below) so exactly 2 mins later it timed out.
@mitchdenny can you figure out the issue by code inspection how it could go to running but not enable the command in this case?
Failed Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning [2 m 2 s]
Error Message:
System.TimeoutException : The operation at /_/tests/Aspire.Hosting.Tests/WithHttpCommandTests.cs:370 timed out after reaching the limit of 120000ms.
Stack Trace:
at Microsoft.AspNetCore.InternalTesting.AsyncTestHelpers.TimeoutAfter(Task task, TimeSpan timeout, String filePath, Int32 lineNumber) in /_/tests/Shared/AsyncTestHelpers.cs:line 149
at Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning() in /_/tests/Aspire.Hosting.Tests/WithHttpCommandTests.cs:line 370
--- End of stack trace from previous location ---
Standard Output Messages:
| [2025-03-15T23:36:08] Aspire.Hosting.DistributedApplication Information: Aspire version: 9.2.0-ci
| [2025-03-15T23:36:08] Aspire.Hosting.DistributedApplication Information: Distributed application starting.
| [2025-03-15T23:36:08] Aspire.Hosting.DistributedApplication Information: Application host directory is: /mnt/vss/_work/1/s/tests/Aspire.Hosting.Tests
| [2025-03-15T23:36:08] Aspire.Hosting.Cli.CliBackchannel Debug: Aspire CLI backchannel socket path was not specified.
| [2025-03-15T23:36:08] Aspire.Hosting.Dcp.DcpHost Information: Starting DCP with arguments: start-apiserver --monitor 68446 --detach --kubeconfig "/tmp/aspire.8UTwEK/kubeconfig"
| [2025-03-15T23:36:08] Aspire.Hosting.ApplicationModel.ResourceNotificationService Trace: Version: 1
| Resource service/service update published:
| ResourceType = CustomResource,
| CreationTimeStamp = (null),
| State = { Text = (null), Style = (null) },
| HeathStatus = (null),
| ResourceReady = False,
| ExitCode = (null),
| Urls = { },
| EnvironmentVariables = {
| },
| Properties = {
| },
| HealthReports = {
| },
| Commands = {
| Do The Thing (mycommand) = Disabled
| }
| [2025-03-15T23:36:08] Aspire.Hosting.Dcp.DcpExecutor Debug: Watching over DCP Endpoint resources.
| [2025-03-15T23:36:08] Aspire.Hosting.Dcp.DcpExecutor Debug: Watching over DCP Service resources.
| [2025-03-15T23:36:08] Aspire.Hosting.Dcp.DcpExecutor Debug: Watching over DCP Container resources.
| [2025-03-15T23:36:08] Aspire.Hosting.Dcp.DcpExecutor Debug: Watching over DCP Executable resources.
| [2025-03-15T23:36:08] Aspire.Hosting.Dcp.dcp.start-apiserver.api-server Information: Starting API server...
| [2025-03-15T23:36:09] Aspire.Hosting.Dcp.KubernetesService Debug: Successfully read Kubernetes configuration from '/tmp/aspire.8UTwEK/kubeconfig' after 401 milliseconds.
| [2025-03-15T23:36:09] Aspire.Hosting.Dcp.dcp.start-apiserver.api-server Information: API server started {"Address": "127.0.0.1", "Port": 46341}
| [2025-03-15T23:36:09] Aspire.Hosting.Dcp.dcp.start-apiserver.dcp-host Information: Starting DCP controller host
| [2025-03-15T23:36:09] Aspire.Hosting.Dcp.dcp.start-apiserver.dcp-host Information: Started all services {"count": 1}
| [2025-03-15T23:36:09] Aspire.Hosting.Dcp.dcpctrl.ContainerOrchestrator Information: runtime status {"ContainerRuntime": "", "runtime": "podman", "status": {"Installed":false,"Running":false,"Error":"exec: \"podman\": executable file not found in $PATH\nfailed to start podman command 'Info'"}}
| [2025-03-15T23:36:09] Aspire.Hosting.Dcp.dcpctrl.ContainerOrchestrator Information: runtime status {"ContainerRuntime": "", "runtime": "docker", "status": {"Installed":true,"Running":true,"Error":""}}
| [2025-03-15T23:36:09] Aspire.Hosting.Dcp.dcpctrl.IdeExecutableRunner Information: Executables cannot be started via IDE: missing required environment variable 'DEBUG_SESSION_PORT'
| [2025-03-15T23:36:09] Aspire.Hosting.Dcp.dcpctrl Information: starting controller manager
| [2025-03-15T23:36:09] Aspire.Hosting.Devcontainers.Codespaces.CodespacesResourceUrlRewriterService Trace: Not running in Codespaces, skipping URL rewriting.
| [2025-03-15T23:36:09] Aspire.Hosting.DistributedApplication Information: Distributed application started. Press Ctrl+C to shut down.
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Debug: Resource service/service changed state: Starting
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Trace: Version: 2
| Resource service/service update published:
| ResourceType = CustomResource,
| CreationTimeStamp = (null),
| State = { Text = Starting, Style = (null) },
| HeathStatus = (null),
| ResourceReady = False,
| ExitCode = (null),
| Urls = { },
| EnvironmentVariables = {
| },
| Properties = {
| },
| HealthReports = {
| },
| Commands = {
| Do The Thing (mycommand) = Disabled
| }
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Debug: Waiting for resource 'service' to enter one of the target state: Starting
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Debug: Finished waiting for resource 'service'. Resource state is 'Starting'.
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Debug: Resource service/service changed state: Starting -> Running
| [2025-03-15T23:36:09] Aspire.Hosting.Health.ResourceHealthCheckService Debug: Starting health monitoring for resource 'service'.
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Trace: Version: 3
| Resource service/service update published:
| ResourceType = CustomResource,
| CreationTimeStamp = (null),
| State = { Text = Running, Style = (null) },
| HeathStatus = Healthy,
| ResourceReady = False,
| ExitCode = (null),
| Urls = { },
| EnvironmentVariables = {
| },
| Properties = {
| },
| HealthReports = {
| },
| Commands = {
| Do The Thing (mycommand) = Disabled
| }
| [2025-03-15T23:36:09] Aspire.Hosting.Health.ResourceHealthCheckService Debug: Resource 'service' has no health checks to monitor.
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Debug: Waiting for resource 'service' to enter one of the target state: Running
| [2025-03-15T23:36:09] Aspire.Hosting.Health.ResourceHealthCheckService Debug: Resource 'service' is ready.
| [2025-03-15T23:36:09] Aspire.Hosting.Health.ResourceHealthCheckService Debug: Publishing ResourceReadyEvent for 'service'.
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Debug: Finished waiting for resource 'service'. Resource state is 'Running'.
| [2025-03-15T23:36:09] Aspire.Hosting.Health.ResourceHealthCheckService Debug: Waiting for ResourceReadyEvent for 'service'.
| [2025-03-15T23:36:09] Aspire.Hosting.Health.ResourceHealthCheckService Debug: ResourceReadyEvent for 'service' completed.
| [2025-03-15T23:36:09] Aspire.Hosting.Health.ResourceHealthCheckService Debug: Publishing the result of ResourceReadyEvent for 'service'.
| [2025-03-15T23:36:09] Aspire.Hosting.ApplicationModel.ResourceNotificationService Trace: Version: 4
| Resource service/service update published:
| ResourceType = CustomResource,
| CreationTimeStamp = (null),
| State = { Text = Running, Style = (null) },
| HeathStatus = Healthy,
| ResourceReady = True,
| ExitCode = (null),
| Urls = { },
| EnvironmentVariables = {
| },
| Properties = {
| },
| HealthReports = {
| },
| Commands = {
| Do The Thing (mycommand) = Disabled
| }
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.DcpExecutor Debug: Cancellation received while watching Executable resources.
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.DcpExecutor Debug: Stopped watching Executable resources.
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.DcpExecutor Debug: Cancellation received while watching Container resources.
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.DcpExecutor Debug: Stopped watching Container resources.
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.DcpExecutor Debug: Cancellation received while watching Service resources.
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.DcpExecutor Debug: Stopped watching Service resources.
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.DcpExecutor Debug: Cancellation received while watching Endpoint resources.
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.DcpExecutor Debug: Stopped watching Endpoint resources.
| [2025-03-15T23:38:09] Aspire.Hosting.Dcp.dcp.start-apiserver.api-server.adminHttpHandler Information: API server changed status {"OldStatus": "Running", "NewStatus": "CleaningResources"}
| [2025-03-15T23:38:10] Aspire.Hosting.Dcp.dcp.start-apiserver.api-server.adminHttpHandler Information: API server changed status {"OldStatus": "CleanupComplete", "NewStatus": "Stopping"}
| [2025-03-15T23:38:10] Aspire.Hosting.Dcp.dcp.start-apiserver Information: Shutting down...
| [2025-03-15T23:38:10] Aspire.Hosting.Dcp.dcp.start-apiserver.dcp-host Information: Service DCP controller host terminated gracefully
| [2025-03-15T23:38:10] Aspire.Hosting.Dcp.dcp.start-apiserver Information: Shutdown complete.
Build Information
Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=981881 Build error leg or test failing: Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning Pull request: N/A
Error Message
Fill the error message using step by step known issues guidance.
{
"ErrorMessage": " at Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning",
"ErrorPattern": "",
"BuildRetry": false,
"ExcludeConsoleLog": true
}
Known issue validation
Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=981881
Error message validated: [System.TimeoutException : The operation at]
Result validation: :white_check_mark: Known issue matched with the provided build.
Validation performed at: 3/20/2025 12:28:18 AM UTC
Report
| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 996345 | dotnet/aspire | Aspire.Hosting.Tests.DistributedApplicationTests.StartResourceForcesStart | |
| 988876 | dotnet/aspire | Aspire.Hosting.Tests.DistributedApplicationTests.StartResourceForcesStart | |
| 987835 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning | |
| 986872 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning | |
| 981881 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning |
Summary
| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 1 | 2 | 5 |
Known issue validation
Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=981881
Error message validated: [ at Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning]
Result validation: :white_check_mark: Known issue matched with the provided build.
Validation performed at: 3/27/2025 10:12:12 PM UTC
Report
| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 996345 | dotnet/aspire | Aspire.Hosting.Tests.DistributedApplicationTests.StartResourceForcesStart | |
| 988876 | dotnet/aspire | Aspire.Hosting.Tests.DistributedApplicationTests.StartResourceForcesStart | |
| 987835 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning | |
| 986872 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning | |
| 981881 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning |
Summary
| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 1 | 2 | 5 |
Report
| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 996345 | dotnet/aspire | Aspire.Hosting.Tests.DistributedApplicationTests.StartResourceForcesStart | |
| 988876 | dotnet/aspire | Aspire.Hosting.Tests.DistributedApplicationTests.StartResourceForcesStart | |
| 987835 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning | |
| 986872 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning | |
| 981881 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning |
Summary
| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 0 | 5 |
Report
| Build | Definition | Test | Pull Request |
|---|---|---|---|
| 1030307 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning | |
| 1030294 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning | |
| 1030289 | dotnet/aspire | Aspire.Hosting.Tests.WithHttpCommandTests.WithHttpCommand_EnablesCommandOnceResourceIsRunning |
Summary
| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 3 | 3 | 3 |
Last 10 failures:
| Run date | Test Name |
|---|
-- Updated on 4/29/2025 6:21:23 PM UTC
Summary
| Last 40 | Total failed / total runs |
|---|---|
| 0 | 0 / 200 |
This is failing often on PRs too.
timeout in Aspire.Hosting.Tests.DistributedApplicationTests.StartResourceForcesStart didn't match as it was
System.TimeoutException : The operation at /_/tests/Aspire.Hosting.Tests/DistributedApplicationTests.cs:141 timed out after reaching the limit of 120000ms.
but associating it here
wait is on a resource stopping. https://github.com/dotnet/aspire/blob/21bda16cd60007621f1cb9b452251cde672d9b19/tests/Aspire.Hosting.Tests/DistributedApplicationTests.cs#L139-L141
| [2025-03-27T21:01:45] Aspire.Hosting.Health.ResourceHealthCheckService Trace: Health report status for 'force-resource-start-servicea' is Unhealthy.
I think it depends on serviceb which remained in state Unknown rather than going presumably Stopping->Stopped.
@karolz-ms can you suggest whether this might be orchestrator or more likely app host? Most of the logging here is orchestrator and I can't tell whether this is "no evidence of orchestrator problem, likely resource problem" or not.
https://dev.azure.com/dnceng-public/public/_build/results?buildId=996345&view=ms.vss-test-web.build-test-results-tab&runId=26589558&resultId=100476&paneView=attachments
Maybe we need more logging someplace to solve this kind of issue
We might need to switch to a "make change, then verify" approach for resource stopping, similar to what I implemented for resource deletion and the other flaky test(s) via https://github.com/dotnet/aspire/pull/8238. @danmoseley LMK if you guys need my help with that.
In the last 53 runs 53 failed. Listing last 10 failures:
| Run date | Outcome |
|---|---|
| Apr 8 4 AM | Failed |
| Apr 8 2 AM | Failed |
| Apr 8 2 AM | Failed |
| Apr 8 12 AM | Failed |
| Apr 8 12 AM | Failed |
| Apr 7 10 PM | Failed |
| Apr 7 10 PM | Failed |
| Apr 7 8 PM | Failed |
| Apr 7 8 PM | Failed |
| Apr 7 6 PM | Failed |
New failing runs - https://dev.azure.com/dnceng-public/public/_build/results?buildId=1030149&view=ms.vss-test-web.build-test-results-tab&runId=27655742&paneView=attachments&resultId=100504
This does not fail on github actions at all, but fails on Azdo pipeline runs. The test fails in different locations on Linux, and Windows.
Can anyone take a look at this? It has failed consistently for the last 10 runs on Linux, and Windows. Did something change?
This is failing pretty consistently on Linux, and Windows. Is this a regression? @mitchdenny @davidfowl
This test looks pretty racy, @DamianEdwards feels like we should rewrite or delete this test or add new apis to make it possible to write something more reliable.
Hmmm yeah, maybe it's as simple as changing the test to wait for the first resource notification after the state changes to running, but even typing that out now it still sounds like it'd be racy. Maybe we should be calculating the notification payload and including it in the event context before it gets published so stuff like this is more deterministic.
Why not just set a TCS in the update state callback and then put a wait timeout on that.
@mitchdenny
Why not just set a TCS in the update state callback and then put a wait timeout on that.
From the comments above the resource is successfully transitioning to the "Running" state, it's the waiting on command state to change to enabled that's timing out. The test simply watches all resource notifications until the state of the command of interest is enabled, after the resource enters the running state.
Which "update state callback" were you referring to exactly?