
Flaky test: (missing spans) IntegrationTests.GraphQL.GraphQLTests.SubmitsTraces

Open pjanotti opened this issue 3 years ago • 4 comments

(This is not the same as #424)

3 recent failures of this test show that only 5 spans matched the test expectations, even though the mock Zipkin collector actually received 11 spans, as expected. Looking at the test code, it seems that the "validation" spans are not being accepted: the test saw 5 spans, which matches the expected number of "execution" spans.

  • https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3062009532/jobs/4942479088#step:4:3381
  • https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3062693118/jobs/4945192191#step:4:3382
  • https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3051076349/jobs/4919595958#step:4:3377
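To make the suspected mismatch concrete, here is a minimal sketch (in Python, not the repository's C# test code) of how a test-side filter could accept only 5 of the 11 spans that the collector received. The span names `graphql.execute` and `graphql.validate`, the `accept` predicate, and the 5/6 split are assumptions for illustration only; the span data is synthetic and not taken from the CI logs.

```python
from collections import Counter


def partition_spans(received, accept):
    """Split the spans the collector received into (accepted, rejected)."""
    accepted = [span for span in received if accept(span)]
    rejected = [span for span in received if not accept(span)]
    return accepted, rejected


def count_by_name(spans):
    """Count spans per operation name, to show what a filter dropped."""
    return dict(Counter(span["name"] for span in spans))


def accept(span):
    # Hypothetical filter: recognizes execution spans only.
    return span["name"] == "graphql.execute"


# Synthetic stand-in for the 11 spans received by the mock collector.
received = (
    [{"name": "graphql.execute"} for _ in range(5)]
    + [{"name": "graphql.validate"} for _ in range(6)]
)

accepted, rejected = partition_spans(received, accept)
print("received:", len(received))
print("accepted:", count_by_name(accepted))
print("rejected:", count_by_name(rejected))
```

If the real failure follows this pattern, listing the rejected spans by name would show whether the validation spans arrive with unexpected names or tags, or whether they are filtered out for another reason.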

pjanotti avatar Sep 15 '22 21:09 pjanotti

BTW, no local repro so far for me (325 runs).

pjanotti avatar Sep 15 '22 21:09 pjanotti

No repro in 20 runs of the test via verify-test on my fork of the repo. It seems that the failure is related to running near other tests. That said, there was a small change in my private CI run that could perhaps affect the outcome: https://github.com/pjanotti/opentelemetry-dotnet-instrumentation/commit/a752ce36a55f28cf4e2eeb1839d77edf01b15647

pjanotti avatar Sep 16 '22 00:09 pjanotti

note: it's interesting that the Zipkin collector reports receiving many spans but the test doesn't see any 🤔 https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3080870008/jobs/4978741438

RassK avatar Sep 19 '22 08:09 RassK

@RassK another one: many received by Zipkin, but the assert reports not all 11 ... https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3087043165/jobs/4991999686#step:4:3378

Let's prioritize the issue to dump the spans received by Zipkin; meanwhile, you can enable the console exporter output as an initial attempt. Note that running the single failing test multiple times hasn't been a good way to repro these issues (at least with the PrometheusExporter test). It seems that some side effect/leak from previous tests affects the tests that run after them.
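As a rough sketch of what dumping the received spans on failure could look like (in Python for brevity, not the repository's C# test helpers), the assertion below includes every span the collector received in its failure message. The helper name `assert_span_count`, the span names, and the span fields are hypothetical, and the data is synthetic.

```python
import json


def assert_span_count(received, accept, expected):
    """Fail with a full dump of the received spans when the match count is off."""
    matched = [span for span in received if accept(span)]
    if len(matched) != expected:
        dump = json.dumps(received, indent=2, sort_keys=True)
        raise AssertionError(
            f"expected {expected} matching spans, got {len(matched)} "
            f"(collector received {len(received)}):\n{dump}"
        )
    return matched


# Synthetic stand-in for the spans received by the mock collector.
received = (
    [{"id": i, "name": "graphql.execute"} for i in range(5)]
    + [{"id": i, "name": "graphql.validate"} for i in range(5, 11)]
)

try:
    # Hypothetical filter that only recognizes execution spans.
    assert_span_count(
        received, lambda span: span["name"] == "graphql.execute", expected=11
    )
except AssertionError as err:
    failure_message = str(err)
    print(failure_message)
```

With the dump in the CI log, a failed run shows which spans arrived and which ones the test refused to count, without needing a local repro.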

pjanotti avatar Sep 20 '22 03:09 pjanotti

@RassK Still seems to be flaky https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3104529499/jobs/5029565093#step:4:2665

pellared avatar Sep 22 '22 10:09 pellared

Checked the latest CI failures and this one was not present.

pjanotti avatar Oct 04 '22 22:10 pjanotti