opentelemetry-dotnet-instrumentation
opentelemetry-dotnet-instrumentation copied to clipboard
Prometheus flaky test
Creating this issue to track failure from test IntegrationTests.SmokeTests.PrometheusExporter. Yesterday there was a different exception.
Error: [xUnit.net 00:00:34.85] IntegrationTests.SmokeTests.PrometheusExporter [FAIL]
16:22:12 [ERR] [xUnit.net 00:00:34.85] IntegrationTests.SmokeTests.PrometheusExporter [FAIL]
16:22:12 [DBG] Failed IntegrationTests.SmokeTests.PrometheusExporter [31 s]
16:22:12 [DBG] Error Message:
16:22:12 [DBG] Did not expect any exceptions after 30s, but found Xunit.Sdk.XunitException with message "Expected string "# HELP process_runtime_dotnet_gc_collections_count Number of garbage collections that have occurred since process start.
16:22:12 [DBG] # TYPE process_runtime_dotnet_gc_collections_count counter
16:22:12 [DBG] process_runtime_dotnet_gc_collections_count{generation="gen2"} 0 1661271707095
16:22:12 [DBG] process_runtime_dotnet_gc_collections_count{generation="gen1"} 0 1661271707095
16:22:12 [DBG] process_runtime_dotnet_gc_collections_count{generation="gen0"} 0 1661271707095
16:22:12 [DBG]
16:22:12 [DBG] # HELP process_runtime_dotnet_assemblies_count The number of .NET assemblies that are currently loaded.
16:22:12 [DBG] # TYPE process_runtime_dotnet_assemblies_count gauge
16:22:12 [DBG] process_runtime_dotnet_assemblies_count 32 1661271707095
16:22:12 [DBG] " to contain "TYPE MyFruitCounter counter"."
16:22:12 [DBG] at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message)
16:22:12 [DBG] at FluentAssertions.Execution.AssertionScope.FailWith(Func1 failReasonFunc) 16:22:12 [DBG] at FluentAssertions.Primitives.StringAssertions1.Contain(String expected, String because, Object[] becauseArgs)
16:22:12 [DBG] at IntegrationTests.SmokeTests.<>c__DisplayClass9_0.<<PrometheusExporter>b__0>d.MoveNext() in D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\test\IntegrationTests\SmokeTests.cs:line 177
16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown ---
16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
16:22:12 [DBG] at FluentAssertions.Specialized.AsyncFunctionAssertions2.<InvokeWithInterceptionAsync>d__10.MoveNext() 16:22:12 [DBG] . 16:22:12 [DBG] Stack Trace: 16:22:12 [DBG] at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message) 16:22:12 [DBG] at FluentAssertions.Execution.AssertionScope.FailWith(Func1 failReasonFunc)
16:22:12 [DBG] at FluentAssertions.Specialized.AsyncFunctionAssertions`2.<>c__DisplayClass9_0.<<NotThrowAfterAsync>g__AssertionTaskAsync|0>d.MoveNext()
16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown ---
16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
16:22:12 [DBG] at IntegrationTests.SmokeTests.<PrometheusExporter>d__9.MoveNext() in D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\test\IntegrationTests\SmokeTests.cs:line 179
16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown ---
16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown ---
16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown ---
16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
16:22:12 [DBG] Standard Output Messages:
16:22:12 [DBG] Platform: X64
16:22:12 [DBG] Configuration: Release
16:22:12 [DBG] TargetFramework: net462
16:22:12 [DBG] .NET Core: False
It seems that we didn't have the expected counter even after 30 seconds. This seems suspicious.
A few more failures of this test.
https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/pull/1234#issuecomment-1248421645
CI pass failed with a different error on the same test https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3069474583/jobs/4958175590#step:4:3378
I hope that it will be fixed by #1239. Please let me know if Prometheus tests will be failing again.
I triggered verify-test: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3082544603
Passed. Congratulations @Kielek 🎉 Closing
Unfortunately I just got another repro https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3084758837/jobs/4987276953#step:4:3381
@pellared and @Kielek
I triggered verify-test: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3082544603
From some runs last week I'm suspecting that it is related to different tests somehow interfering with this one. I did multiple runs of a single test on my fork and no repro but I was able to repro on my machine with multiple runs of all tests.
The failure without output also happened again, see https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3082179256/jobs/4984629208#step:4:3378
https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3082671794/jobs/4984649326#step:4:3359
I have a strong feeling that this is a bug in the SDK. Take notice: https://github.com/open-telemetry/opentelemetry-dotnet/blob/46e6c8a06d4485a67e2a17afe10272cdb1a7b3ee/test/OpenTelemetry.Exporter.Prometheus.Tests/PrometheusCollectionManagerTests.cs#L33
https://github.com/open-telemetry/opentelemetry-dotnet/issues/3679
@pellared @Kielek from the logs captured:
[2022-09-21T02:47:08.0225320Z] [Warning] EventSource=OpenTelemetry-Sdk, Message=Measurements from Instrument 'MyFruitCounter', Meter 'MyCompany.MyProduct.MyLibrary' will be ignored. Reason: 'Instrument belongs to a Meter not subscribed by the provider.'. Suggested action: 'Use AddMeter to add the Meter to the provider.'
@pjanotti, I saw it in the log previously, but it was taken from another test. Could you please share exact execution to double check it?
@Kielek you are right: it is not from the same PID as the test failure.