aspire icon indicating copy to clipboard operation
aspire copied to clipboard

[tests] Flaky failures with`Application orchestrator dependency check returned an error: The operation has timed out`

Open radical opened this issue 1 year ago • 4 comments

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=765658 Build error leg or test failing: Aspire.Hosting.Testing.Tests.TestingBuilderTests.SetsCorrectContentRoot(genericEntryPoint: False) Pull request: https://github.com/dotnet/aspire/pull/5129

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "Application orchestrator dependency check returned an error: The operation has timed out",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=765658 Error message validated: [Application orchestrator dependency check returned an error: The operation has timed out] Result validation: :white_check_mark: Known issue matched with the provided build. Validation performed at: 8/6/2024 4:24:19 AM UTC

Report

Build Definition Test Pull Request
770196 dotnet/aspire Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False) dotnet/aspire#4708
769123 dotnet/aspire Aspire.Hosting.Redis.Tests.RedisFunctionalTests.VerifyRedisResource dotnet/aspire#5218
768762 dotnet/aspire Aspire.Hosting.Testing.Tests.TestingBuilderTests.SetsCorrectContentRoot(genericEntryPoint: False) dotnet/aspire#5033
768051 dotnet/aspire Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: True) dotnet/aspire#5150
767861 dotnet/aspire Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False)
767902 dotnet/aspire Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False) dotnet/aspire#5193
767726 dotnet/aspire Aspire.Hosting.Redis.Tests.RedisFunctionalTests.PersistenceIsDisabledByDefault dotnet/aspire#5201
767531 dotnet/aspire Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False) dotnet/aspire#5161
767562 dotnet/aspire Aspire.Hosting.Testing.Tests.TestingBuilderTests.SetsCorrectContentRoot(genericEntryPoint: False) dotnet/aspire#5168
767576 dotnet/aspire Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False) dotnet/aspire#5056
767425 dotnet/aspire Aspire.Hosting.Redis.Tests.RedisFunctionalTests.PersistenceIsDisabledByDefault
767092 dotnet/aspire Aspire.Hosting.Testing.Tests.TestingBuilderTests.GetHttpClientBeforeStart(genericEntryPoint: False)
766889 dotnet/aspire Aspire.Hosting.Redis.Tests.RedisFunctionalTests.PersistenceIsDisabledByDefault dotnet/aspire#5194
766865 dotnet/aspire Aspire.Hosting.Qdrant.Tests.QdrantFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False) dotnet/aspire#5193
766235 dotnet/aspire Aspire.Hosting.Redis.Tests.RedisFunctionalTests.VerifyRedisResource dotnet/aspire#5181
765658 dotnet/aspire Aspire.Hosting.Testing.Tests.TestingBuilderTests.GetHttpClientBeforeStart(genericEntryPoint: False) dotnet/aspire#5129

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 15 16

radical avatar Aug 06 '24 04:08 radical

There doesn't seem to be any output from the dcp-info command. Multiple tests failed in the linked test run.

https://github.com/dotnet/aspire/blob/4938cb30aa8f7a9f535cc8a5d3735a79a4f8e0d3/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs#L34-L36

Could this be due to the process exiting quickly causing us to miss some output? I think it would be useful to add some more info to the exception message like the _dcpOptions.DependencyCheckTimeout. Also, cancellationToken is passed to the method so it would be useful to differentiate a timeout due to the token being cancelled, or the timeout being hit.

radical avatar Aug 06 '24 04:08 radical

cc @eerhardt @sebastienros @karolz-ms

related - https://github.com/dotnet/aspire/issues/4640

radical avatar Aug 06 '24 04:08 radical

Is the dependency check timeout set to default value (which is 25 seconds) during test runs?

If so, as a first step I would bump it up to 180 i.e. 3 minutes. This can be done via startup parameters, or, probably more conveniently, via an environment variable https://github.com/dotnet/aspire/blob/main/src/Aspire.Hosting/Dcp/DcpOptions.cs#L155 DOTNET_ASPIRE_DEPENDENCY_CHECK_TIMEOUT. The reason is there is probably a lot going on on the machine when the tests are starting; I have seen very long delays that had to do with AV scanning etc and nothing with the test itself.

If that does not help, we can tweak the dependency check logic further.

karolz-ms avatar Aug 06 '24 16:08 karolz-ms

The timeout bump seems to have helped this. Closing it for now.

radical avatar Aug 20 '24 01:08 radical

I just saw this on the 8.2 branch:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=791699&view=logs&jobId=acc6f692-388e-5b31-5997-e154da29f5b3&j=acc6f692-388e-5b31-5997-e154da29f5b3&t=59fafa2a-4be0-5f74-c845-0779cc54f4a3

https://helixre107v0xd1eu3ibi6ka.blob.core.windows.net/dotnet-aspire-refs-pull-5477-merge-c9cb6937fcd94fb483/Aspire.Hosting.Elasticsearch.Tests/1/console.80efd9c6.log?helixlogtype=result

info: Aspire.Hosting.DistributedApplication[0]
      Aspire version: 8.2.0
info: Aspire.Hosting.DistributedApplication[0]
      Distributed application starting.
info: Aspire.Hosting.DistributedApplication[0]
      Application host directory is: /mnt/vss/_work/1/s/tests/Aspire.Hosting.Tests
fail: Microsoft.Extensions.Hosting.Internal.Host[11]
      Hosting failed to start
      Aspire.Hosting.DistributedApplicationException: Application orchestrator dependency check returned an error: The operation has timed out. 
      
         at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 116
         at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 130
         at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
         at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
         at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
[xUnit.net 00:08:07.13]     Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.VerifyElasticsearchResource [FAIL]
[xUnit.net 00:08:07.16]       Aspire.Hosting.DistributedApplicationException : Application orchestrator dependency check returned an error: The operation has timed out. 
[xUnit.net 00:08:07.16]       
[xUnit.net 00:08:07.16]       Stack Trace:
[xUnit.net 00:08:07.18]         /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs(116,0): at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]         /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs(130,0): at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]         /_/src/Aspire.Hosting/Dcp/DcpHostService.cs(67,0): at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]            at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
[xUnit.net 00:08:07.18]            at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
[xUnit.net 00:08:07.18]            at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]         /_/src/Aspire.Hosting/DistributedApplication.cs(274,0): at Aspire.Hosting.DistributedApplication.StartAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]         /_/tests/Aspire.Hosting.Elasticsearch.Tests/ElasticsearchFunctionalTests.cs(40,0): at Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.VerifyElasticsearchResource()
[xUnit.net 00:08:07.18]         --- End of stack trace from previous location ---
[xUnit.net 00:08:07.18]       Output:
[xUnit.net 00:08:07.18]         | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Aspire version: 8.2.0
[xUnit.net 00:08:07.18]         | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Distributed application starting.
[xUnit.net 00:08:07.18]         | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Application host directory is: /mnt/vss/_work/1/s/tests/Aspire.Hosting.Tests
[xUnit.net 00:08:07.18]         | [2024-08-28T22:12:56] Microsoft.Extensions.Hosting.Internal.Host Error: Hosting failed to start
[xUnit.net 00:08:07.18]         | Aspire.Hosting.DistributedApplicationException: Application orchestrator dependency check returned an error: The operation has timed out. 
[xUnit.net 00:08:07.18]         |    at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 116
[xUnit.net 00:08:07.18]         |    at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 130
[xUnit.net 00:08:07.18]         |    at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
[xUnit.net 00:08:07.18]         |    at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
[xUnit.net 00:08:07.18]         |    at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
[xUnit.net 00:08:07.29]   Finished:    Aspire.Hosting.Elasticsearch.Tests
Data collector 'Blame' message: All tests finished running, Sequence file will not be generated.
  Failed Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.VerifyElasticsearchResource [30 s]
  Error Message:
   Aspire.Hosting.DistributedApplicationException : Application orchestrator dependency check returned an error: The operation has timed out. 

  Stack Trace:
     at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 116
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 130
   at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
   at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
   at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Aspire.Hosting.DistributedApplication.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/DistributedApplication.cs:line 274
   at Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.VerifyElasticsearchResource() in /_/tests/Aspire.Hosting.Elasticsearch.Tests/ElasticsearchFunctionalTests.cs:line 40
--- End of stack trace from previous location ---
  Standard Output Messages:
 | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Aspire version: 8.2.0
 | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Distributed application starting.
 | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Application host directory is: /mnt/vss/_work/1/s/tests/Aspire.Hosting.Tests
 | [2024-08-28T22:12:56] Microsoft.Extensions.Hosting.Internal.Host Error: Hosting failed to start
 | Aspire.Hosting.DistributedApplicationException: Application orchestrator dependency check returned an error: The operation has timed out. 
 |    at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 116
 |    at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 130
 |    at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
 |    at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
 |    at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)


Results File: /datadisks/disk1/work/B77F0977/w/A3C608F8/uploads/logs/TestResults.trx

Test Run Failed.
Total tests: 15
     Passed: 14
     Failed: 1
 Total time: 8.1511 Minutes

eerhardt avatar Aug 29 '24 13:08 eerhardt

Test on main failed with this error.

Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False)

Aspire.Hosting.DistributedApplicationException : Application orchestrator dependency check returned an error: The operation has timed out.
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 111
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 125
   at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 71
   at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
   at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Aspire.Hosting.DistributedApplication.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/DistributedApplication.cs:line 274
   at Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume) in /_/tests/Aspire.Hosting.Elasticsearch.Tests/ElasticsearchFunctionalTests.cs:line 149
--- End of stack trace from previous location ---

https://dev.azure.com/dnceng-public/public/_build/results?buildId=837103&view=results

Aspire.Hosting.Elasticsearch.Tests.zip

JamesNK avatar Oct 10 '24 02:10 JamesNK

Can we close this now? It hasn't been hit recently.

radical avatar Feb 28 '25 22:02 radical

Can we close this now? It hasn't been hit recently.

Closing. Please re-open if it is hit again.

radical avatar Apr 02 '25 22:04 radical