opentelemetry-dotnet-instrumentation icon indicating copy to clipboard operation
opentelemetry-dotnet-instrumentation copied to clipboard

Flaky test: IntegrationTests.StrongNamedTests.SubmitsTraces

Open pjanotti opened this issue 3 years ago • 1 comments
trafficstars

4 instances:

  • https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3069538719/jobs/4958307009#step:4:2590
  • https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3069538719/jobs/4958306796#step:4:2682
  • https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3069538719/jobs/4958306876#step:4:3392
  • https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3062693118/jobs/4943951142#step:4:3392

pjanotti avatar Sep 16 '22 17:09 pjanotti

Confirming that is actually flakyness: alpine, macos, and windows passed on retry on PR #1234

pjanotti avatar Sep 16 '22 17:09 pjanotti

Some span is captured but it does not have the expected attribute: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3097103461/jobs/5013412328

pellared avatar Sep 21 '22 10:09 pellared

It seems that the issue is related to the ReJIT happening asynchronous. In a failed pass of the test we have:

[2022-09-22T02:47:31.803708579Z] [3580|3580] [debug] Enqueue for ReJIT [ModuleId=140150729282504, MethodDef=06000001, AppDomainId=94699831388896, IsDomainNeutral=0, Assembly=TestLibrary.InstrumentationTarget, Type=TestLibrary.InstrumentationTarget.Command, Method=Execute, Signature=200001]
[2022-09-22T02:47:31.803725179Z] [3580|3580] [debug] ModuleLoadFinished stored metadata for 140150729282504 TestLibrary.InstrumentationTarget AppDomain 94699831388896 [clrhost]
[2022-09-22T02:47:31.803728679Z] [3580|3580] [debug] AssemblyLoadFinished: 94699845728400 0
[2022-09-22T02:47:31.803896979Z] [3580|3580] [debug] AssemblyLoadStarted: 94699845775072
[2022-09-22T02:47:31.803910079Z] [3580|3580] [debug] ModuleLoadStarted: 140150729284448
[2022-09-22T02:47:31.803947279Z] [3580|3580] [debug] ModuleLoadFinished: 140150729284448 TestApplication.Shared AppDomain 94699831388896 [clrhost]  | IsNGEN = false | IsDynamic = false | IsResource = false
[2022-09-22T02:47:31.804019179Z] [3580|3580] [debug] ModuleLoadFinished stored metadata for 140150729284448 TestApplication.Shared AppDomain 94699831388896 [clrhost]
[2022-09-22T02:47:31.804022379Z] [3580|3580] [debug] AssemblyLoadFinished: 94699845775072 0
[2022-09-22T02:47:31.804504578Z] [3580|3593] [info] Request ReJIT done for 1 methods

but there is no completion of the actul ReJIT, for example see an example of a successful run of the test:

[2022-09-22T03:03:24.426651900Z] [256760|195716] [debug] Enqueue for ReJIT [ModuleId=140704364384120, MethodDef=06000001, AppDomainId=2472867908960, IsDomainNeutral=0, Assembly=TestLibrary.InstrumentationTarget, Type=TestLibrary.InstrumentationTarget.Command, Method=Execute, Signature=200001]
[2022-09-22T03:03:24.426692800Z] [256760|195716] [debug] ModuleLoadFinished stored metadata for 140704364384120 TestLibrary.InstrumentationTarget AppDomain 2472867908960 [TestApplication.StrongNamed.exe]
[2022-09-22T03:03:24.426706600Z] [256760|195716] [debug] AssemblyLoadFinished: 2473312560592 0
[2022-09-22T03:03:24.427114000Z] [256760|259340] [info] Request ReJIT done for 1 methods
...
[2022-09-22T03:03:28.560054100Z] [256760|195716] [debug] GetReJITParameters: [moduleId: 140704364384120, methodId: 100663297]
[2022-09-22T03:03:28.560427700Z] [256760|195716] [debug] *** CallTarget_RewriterCallback() Start: TestLibrary.InstrumentationTarget.Command.Execute() [IsVoid=1, IsStatic=0, IntegrationType=OpenTelemetry.AutoInstrumentation.Instrumentations.Validations.StrongNamedValidation, Arguments=0]
[2022-09-22T03:03:28.560529800Z] [256760|195716] [debug] CallTargetTokens::EnsureBaseCalltargetTokens() Bytecode Instrumentation Assembly: OpenTelemetry.AutoInstrumentation, Version=0.3.1.0, Culture=neutral, PublicKeyToken=c0db600a13f60b51
[2022-09-22T03:03:28.560589800Z] [256760|195716] [debug] Caller Type.Id: 02000002
[2022-09-22T03:03:28.560598300Z] [256760|195716] [debug] Caller Type.IsGeneric: 0
[2022-09-22T03:03:28.560602600Z] [256760|195716] [debug] Caller Type.IsValid: 1
[2022-09-22T03:03:28.560607500Z] [256760|195716] [debug] Caller Type.Name: TestLibrary.InstrumentationTarget.Command
[2022-09-22T03:03:28.560612400Z] [256760|195716] [debug] Caller Type.TokenType: 33554432
[2022-09-22T03:03:28.560616200Z] [256760|195716] [debug] Caller Type.Spec: 0000001b
[2022-09-22T03:03:28.560620400Z] [256760|195716] [debug] Caller Type.ValueType: 0
[2022-09-22T03:03:28.560624500Z] [256760|195716] [debug] Caller Type Extend From.Id: 0c000001
[2022-09-22T03:03:28.560628900Z] [256760|195716] [debug] Caller Type Extend From.IsGeneric: 0
[2022-09-22T03:03:28.560633700Z] [256760|195716] [debug] Caller Type Extend From.IsValid: 1
[2022-09-22T03:03:28.560638000Z] [256760|195716] [debug] Caller Type Extend From.Name: System.Object
[2022-09-22T03:03:28.560643300Z] [256760|195716] [debug] Caller Type Extend From.TokenType: 16777216
[2022-09-22T03:03:28.560647600Z] [256760|195716] [debug] Caller Type Extend From.Spec: 0000001b
[2022-09-22T03:03:28.560652600Z] [256760|195716] [debug] Caller Type Extend From.ValueType: 0
[2022-09-22T03:03:28.560707000Z] [256760|195716] [info] *** CallTarget_RewriterCallback() Finished: TestLibrary.InstrumentationTarget.Command.Execute() [IsVoid=1, IsStatic=0, IntegrationType=OpenTelemetry.AutoInstrumentation.Instrumentations.Validations.StrongNamedValidation, Arguments=0]
[2022-09-22T03:03:28.560836300Z] [256760|195716] [debug] ReJITCompilationStarted: [functionId: 140704364385784, rejitId: 2, safeToBlock: 1]
[2022-09-22T03:03:28.588680300Z] [256760|195716] [debug] ReJITCompilationFinished: [functionId: 140704364385784, rejitId: 2, hrStatus: 0, safeToBlock: 1]

This seems to indicate that we need to ensure that the ReJIT completes before code to be instrumented is executed.

pjanotti avatar Sep 22 '22 03:09 pjanotti

This seems to indicate that we need to ensure that the ReJIT completes before code to be instrumented is executed.

In other words, the startup time of a lightweight application might be too fast in time to time?

RassK avatar Sep 22 '22 06:09 RassK

I simple modification to request the ReJIT in a synchronous fashion (directly on the thread handling ModuleLoadedFinsih) fixes the issue. However, I guess that this will have some perf impact, which, I didn't measure yet.

pjanotti avatar Sep 24 '22 03:09 pjanotti

If the impact will be noticeable we could introduce an env var to control it. However, I would rather not do it as it is just about longer initialization.

Probably this one failed for the same reason https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/actions/runs/3128880953/jobs/5077270798

pellared avatar Sep 26 '22 15:09 pellared

Yeah, I don't want to give the option to "perhaps a little fast" but some non-deterministic observability results. It seems that we have to offer a single and consistent way of instrumenting.

pjanotti avatar Sep 26 '22 19:09 pjanotti