opentelemetry-dotnet-instrumentation icon indicating copy to clipboard operation
opentelemetry-dotnet-instrumentation copied to clipboard

Prometheus flaky test

Open rajkumar-rangaraj opened this issue 3 years ago • 11 comments
trafficstars

Creating this issue to track failure from test IntegrationTests.SmokeTests.PrometheusExporter. Yesterday there was a different exception.

Error: [xUnit.net 00:00:34.85] IntegrationTests.SmokeTests.PrometheusExporter [FAIL] 16:22:12 [ERR] [xUnit.net 00:00:34.85] IntegrationTests.SmokeTests.PrometheusExporter [FAIL] 16:22:12 [DBG] Failed IntegrationTests.SmokeTests.PrometheusExporter [31 s] 16:22:12 [DBG] Error Message: 16:22:12 [DBG] Did not expect any exceptions after 30s, but found Xunit.Sdk.XunitException with message "Expected string "# HELP process_runtime_dotnet_gc_collections_count Number of garbage collections that have occurred since process start. 16:22:12 [DBG] # TYPE process_runtime_dotnet_gc_collections_count counter 16:22:12 [DBG] process_runtime_dotnet_gc_collections_count{generation="gen2"} 0 1661271707095 16:22:12 [DBG] process_runtime_dotnet_gc_collections_count{generation="gen1"} 0 1661271707095 16:22:12 [DBG] process_runtime_dotnet_gc_collections_count{generation="gen0"} 0 1661271707095 16:22:12 [DBG] 16:22:12 [DBG] # HELP process_runtime_dotnet_assemblies_count The number of .NET assemblies that are currently loaded. 16:22:12 [DBG] # TYPE process_runtime_dotnet_assemblies_count gauge 16:22:12 [DBG] process_runtime_dotnet_assemblies_count 32 1661271707095 16:22:12 [DBG] " to contain "TYPE MyFruitCounter counter"." 16:22:12 [DBG] at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message) 16:22:12 [DBG] at FluentAssertions.Execution.AssertionScope.FailWith(Func1 failReasonFunc) 16:22:12 [DBG] at FluentAssertions.Primitives.StringAssertions1.Contain(String expected, String because, Object[] becauseArgs) 16:22:12 [DBG] at IntegrationTests.SmokeTests.<>c__DisplayClass9_0.<<PrometheusExporter>b__0>d.MoveNext() in D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\test\IntegrationTests\SmokeTests.cs:line 177 16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown --- 16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() 16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) 16:22:12 [DBG] at FluentAssertions.Specialized.AsyncFunctionAssertions2.<InvokeWithInterceptionAsync>d__10.MoveNext() 16:22:12 [DBG] . 16:22:12 [DBG] Stack Trace: 16:22:12 [DBG] at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message) 16:22:12 [DBG] at FluentAssertions.Execution.AssertionScope.FailWith(Func1 failReasonFunc) 16:22:12 [DBG] at FluentAssertions.Specialized.AsyncFunctionAssertions`2.<>c__DisplayClass9_0.<<NotThrowAfterAsync>g__AssertionTaskAsync|0>d.MoveNext() 16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown --- 16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() 16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) 16:22:12 [DBG] at IntegrationTests.SmokeTests.<PrometheusExporter>d__9.MoveNext() in D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\test\IntegrationTests\SmokeTests.cs:line 179 16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown --- 16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() 16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) 16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown --- 16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() 16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) 16:22:12 [DBG] --- End of stack trace from previous location where exception was thrown --- 16:22:12 [DBG] at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() 16:22:12 [DBG] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) 16:22:12 [DBG] Standard Output Messages: 16:22:12 [DBG] Platform: X64 16:22:12 [DBG] Configuration: Release 16:22:12 [DBG] TargetFramework: net462 16:22:12 [DBG] .NET Core: False

rajkumar-rangaraj avatar Aug 23 '22 16:08 rajkumar-rangaraj

It seems that we didn't have the expected counter even after 30 seconds. This seems suspicious.

pjanotti avatar Aug 23 '22 21:08 pjanotti

A few more failures of this test.

https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/pull/1234#issuecomment-1248421645

pjanotti avatar Sep 15 '22 22:09 pjanotti

CI pass failed with a different error on the same test https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3069474583/jobs/4958175590#step:4:3378

pjanotti avatar Sep 16 '22 17:09 pjanotti

I hope that it will be fixed by #1239. Please let me know if Prometheus tests will be failing again.

Kielek avatar Sep 19 '22 12:09 Kielek

I triggered verify-test: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3082544603

pellared avatar Sep 19 '22 12:09 pellared

Passed. Congratulations @Kielek 🎉 Closing

pellared avatar Sep 19 '22 13:09 pellared

Unfortunately I just got another repro https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3084758837/jobs/4987276953#step:4:3381

pjanotti avatar Sep 19 '22 18:09 pjanotti

@pellared and @Kielek

I triggered verify-test: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3082544603

From some runs last week I'm suspecting that it is related to different tests somehow interfering with this one. I did multiple runs of a single test on my fork and no repro but I was able to repro on my machine with multiple runs of all tests.

pjanotti avatar Sep 19 '22 19:09 pjanotti

The failure without output also happened again, see https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3082179256/jobs/4984629208#step:4:3378

https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3082671794/jobs/4984649326#step:4:3359

pjanotti avatar Sep 19 '22 19:09 pjanotti

I have a strong feeling that this is a bug in the SDK. Take notice: https://github.com/open-telemetry/opentelemetry-dotnet/blob/46e6c8a06d4485a67e2a17afe10272cdb1a7b3ee/test/OpenTelemetry.Exporter.Prometheus.Tests/PrometheusCollectionManagerTests.cs#L33

pellared avatar Sep 20 '22 08:09 pellared

https://github.com/open-telemetry/opentelemetry-dotnet/issues/3679

pellared avatar Sep 20 '22 09:09 pellared

@pellared @Kielek from the logs captured:

[2022-09-21T02:47:08.0225320Z] [Warning] EventSource=OpenTelemetry-Sdk, Message=Measurements from Instrument 'MyFruitCounter', Meter 'MyCompany.MyProduct.MyLibrary' will be ignored. Reason: 'Instrument belongs to a Meter not subscribed by the provider.'. Suggested action: 'Use AddMeter to add the Meter to the provider.' 

pjanotti avatar Sep 21 '22 03:09 pjanotti

@pjanotti, I saw it in the log previously, but it was taken from another test. Could you please share exact execution to double check it?

Kielek avatar Sep 21 '22 04:09 Kielek

@Kielek you are right: it is not from the same PID as the test failure.

pjanotti avatar Sep 21 '22 14:09 pjanotti