Quarantine RunAsync_WaitsForCompletion_Single_RegistrationParameters
Failing Test(s)
- Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedServiceTest.RunAsync_WaitsForCompletion_Single_RegistrationParameters
Error Message
Assert.Contains() Failure
Not found: CheckDelay1Period9
In value: String[] ["CheckDefault", "CheckDelay1Period18", "CheckDelay2Period18"]
Stacktrace
at Microsoft.Extensions.Diagnostics.HealthChecks.HealthCheckPublisherHostedServiceTest.RunAsync_WaitsForCompletion_Single_RegistrationParameters() in /_/src/HealthChecks/HealthChecks/test/HealthCheckPublisherHostedServiceTest.cs:line 388
--- End of stack trace from previous location ---
Logs
Build
- https://dev.azure.com/dnceng-public/public/_build/results?buildId=356938&view=ms.vss-test-web.build-test-results-tab&runId=7511354&resultId=122241&paneView=debug
Hi, I have spent some time investigating this and in my opinion the issue is a race condition with the TestPublisher here. My observation is the flaky test is writing mulitple reports from multiple timer at around the same time. In my opinion the fix would be easy, using a thread safe collection. Do you think this worth pursuing? ( I can implement this modification. )
In my opinion the same applies to https://github.com/dotnet/aspnetcore/issues/56245
@feherzsolt That seems like a reasonable thing to try. We can see if using a concurrent collection improves things here.
Hi i have tried to find a way to validate these modifications. In my opinion the best way would be to check test failure rate for these test. I was looking to find this data, got lost a bit. Could you help what would be the best way to validate please?
@feherzsolt We typically apply a test-fixed label to the issue then monitor the test for 30 days before unquarantining to verify that the fix worked.
Thank you, let's do that. I have created the PR https://github.com/dotnet/aspnetcore/pull/63299
Thank you!
That test was not failing for at least 30 days, why did we need https://github.com/dotnet/aspnetcore/pull/63299?
Let's keep it open for another 30 days after the new change.
That test was not failing for at least 30 days, why did we need #63299?
Yes it was. https://dev.azure.com/dnceng-public/public/_test/analytics?definitionId=86&contextType=build
Could also repro locally if running the test with [Repeat(1000)].