Fix `StackOverflowException` issues when an application uses custom `AssemblyLoadContext`'s and has a manual/automatic tracer version mismatch
Summary of changes
Add a new integration to insert ourselves into the AssemblyLoadContext loading process to prevent StackOverflowException's from occurring when there is a manual/automatic tracer version mismatch and the application contains custom AssemblyLoadContexts.
Reason for change
In a manual/automatic tracer version mismatch scenario, the "manual" instrumentation assembly is rewritten so that DistributedTracer.GetDistributedTracer() calls the DistributedTracer.GetDistributedTracer() method from the instrumentation assembly. This is done by injecting a method invocation using a new set of AssemblyRef/TypeRef/MemberRef tokens to the new assembly and letting the runtime resolve the assembly. However, custom AssemblyLoadContexts and assembly-resolving event handlers may force the assembly reference to resolve to itself, causing the method to call itself and eventually raise a StackOverflowException.
Implementation details
The proposed solution is to inject ourselves into AssemblyLoadContext methods to force the Datadog.Trace assembly references to resolve to the automatic instrumentation assembly. Changes include:
- Adding a new integrations to rewrite
System.Runtime.Loader.AssemblyLoadContext.ResolveUsingLoad, which is responsible for invokingAssemblyLoadContext.Load. This will interrupt all ALC's so we can return the automatic instrumentation's copy ofDatadog.Traceif the requested assembly isDatadog.Traceand it has a version less than or equal to the instrumentation's version. - Adding a new integration category
Internal - Adding an event handler for
System.Runtime.Loader.AssemblyLoadContext.Default.Resolving
Note: Assemblies loaded via AssemblyRefs and TypeRefs and loaded from the Default AssemblyLoadContext do not invoke the methods we instrument, so the application may still load a lower version of
Datadog.Traceif directly referenced from the application code.
Test coverage
Tested the following scenarios locally:
- Application running on (local) Azure Functions Runtime v3 with tracer version mismatch
- xUnit test project with tracer version mismatch
- Console application with tracer version mismatch
Other details
Remaining items:
- [ ] Ensure the
Internalintegration cannot be selectively disabled - [ ] For whatever reason, trying to rewrite
System.Runtime.Loader.AssemblyLoadContext.ResolveUsingEventcauses issues with the Azure Functions runtime. Ideally, we'd understand WHY we can't do that, before merging. - [ ] For some reason we need to add a Resolving event to the Default ALC. Make sure to understand why
- [ ] Add automated tests, not just local one-off tests
Benchmarks Report :snail:
Benchmarks for #2770 compared to master:
- All benchmarks have the same speed
- All benchmarks have the same allocations
The following thresholds were used for comparing the benchmark speeds:
- Mann–Whitney U test with statistical test for significance of 5%
- Only results indicating a difference greater than 10% and 0.3 ns are considered.
Allocation changes below 0.5% are ignored.
Benchmark details
Benchmarks.Trace.AgentWriterBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | WriteAndFlushEnrichedTraces |
net472 | 716μs | 233ns | 904ns | 0.359 | 0 | 0 | 3.16 KB |
| master | WriteAndFlushEnrichedTraces |
netcoreapp3.1 | 498μs | 242ns | 905ns | 0 | 0 | 0 | 2.57 KB |
| #2770 | WriteAndFlushEnrichedTraces |
net472 | 698μs | 302ns | 1.13μs | 0 | 0 | 0 | 3.16 KB |
| #2770 | WriteAndFlushEnrichedTraces |
netcoreapp3.1 | 499μs | 347ns | 1.25μs | 0 | 0 | 0 | 2.57 KB |
Benchmarks.Trace.AppSecBodyBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | AllCycleSimpleBody |
net472 | 194ns | 0.128ns | 0.48ns | 0.0662 | 0 | 0 | 417 B |
| master | AllCycleSimpleBody |
netcoreapp3.1 | 243ns | 0.0927ns | 0.334ns | 0.00573 | 0 | 0 | 416 B |
| master | AllCycleMoreComplexBody |
net472 | 192ns | 0.0785ns | 0.294ns | 0.0624 | 0 | 0 | 393 B |
| master | AllCycleMoreComplexBody |
netcoreapp3.1 | 242ns | 0.141ns | 0.547ns | 0.00536 | 0 | 0 | 392 B |
| master | BodyExtractorSimpleBody |
net472 | 270ns | 0.216ns | 0.837ns | 0.0572 | 0 | 0 | 361 B |
| master | BodyExtractorSimpleBody |
netcoreapp3.1 | 225ns | 0.577ns | 2.16ns | 0.00365 | 0 | 0 | 272 B |
| master | BodyExtractorMoreComplexBody |
net472 | 15.6μs | 15.5ns | 60ns | 1.21 | 0.0156 | 0 | 7.62 KB |
| master | BodyExtractorMoreComplexBody |
netcoreapp3.1 | 12.3μs | 20.8ns | 80.5ns | 0.0924 | 0 | 0 | 6.75 KB |
| #2770 | AllCycleSimpleBody |
net472 | 197ns | 0.262ns | 1.02ns | 0.0661 | 0 | 0 | 417 B |
| #2770 | AllCycleSimpleBody |
netcoreapp3.1 | 248ns | 0.139ns | 0.54ns | 0.00569 | 0 | 0 | 416 B |
| #2770 | AllCycleMoreComplexBody |
net472 | 193ns | 0.0812ns | 0.293ns | 0.0623 | 0 | 0 | 393 B |
| #2770 | AllCycleMoreComplexBody |
netcoreapp3.1 | 246ns | 0.215ns | 0.831ns | 0.00535 | 0 | 0 | 392 B |
| #2770 | BodyExtractorSimpleBody |
net472 | 274ns | 0.308ns | 1.19ns | 0.0571 | 0 | 0 | 361 B |
| #2770 | BodyExtractorSimpleBody |
netcoreapp3.1 | 232ns | 0.266ns | 0.994ns | 0.00373 | 0 | 0 | 272 B |
| #2770 | BodyExtractorMoreComplexBody |
net472 | 16.3μs | 10.1ns | 37.8ns | 1.19 | 0.0162 | 0 | 7.62 KB |
| #2770 | BodyExtractorMoreComplexBody |
netcoreapp3.1 | 12.9μs | 9.13ns | 35.4ns | 0.0904 | 0 | 0 | 6.75 KB |
Benchmarks.Trace.AspNetCoreBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendRequest |
net472 | 0ns | 0ns | 0ns | 0 | 0 | 0 | 0 b |
| master | SendRequest |
netcoreapp3.1 | 172μs | 54.7ns | 197ns | 0.259 | 0 | 0 | 19.74 KB |
| #2770 | SendRequest |
net472 | 0ns | 0ns | 0ns | 0 | 0 | 0 | 0 b |
| #2770 | SendRequest |
netcoreapp3.1 | 172μs | 141ns | 529ns | 0.135 | 0 | 0 | 19.74 KB |
Benchmarks.Trace.DbCommandBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | ExecuteNonQuery |
net472 | 1.42μs | 0.472ns | 1.83ns | 0.0941 | 0.000713 | 0 | 594 B |
| master | ExecuteNonQuery |
netcoreapp3.1 | 1.14μs | 4.98ns | 19.3ns | 0.00852 | 0 | 0 | 632 B |
| #2770 | ExecuteNonQuery |
net472 | 1.39μs | 0.331ns | 1.28ns | 0.0938 | 0.0007 | 0 | 594 B |
| #2770 | ExecuteNonQuery |
netcoreapp3.1 | 1.16μs | 0.437ns | 1.69ns | 0.00868 | 0 | 0 | 632 B |
Benchmarks.Trace.ElasticsearchBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | CallElasticsearch |
net472 | 2.01μs | 1.16ns | 4.49ns | 0.127 | 0 | 0 | 802 B |
| master | CallElasticsearch |
netcoreapp3.1 | 1.35μs | 0.426ns | 1.65ns | 0.0108 | 0 | 0 | 792 B |
| master | CallElasticsearchAsync |
net472 | 2.27μs | 0.667ns | 2.5ns | 0.149 | 0 | 0 | 939 B |
| master | CallElasticsearchAsync |
netcoreapp3.1 | 1.31μs | 0.596ns | 2.06ns | 0.0124 | 0 | 0 | 912 B |
| #2770 | CallElasticsearch |
net472 | 2.11μs | 1.2ns | 4.66ns | 0.125 | 0 | 0 | 802 B |
| #2770 | CallElasticsearch |
netcoreapp3.1 | 1.24μs | 1.07ns | 4.13ns | 0.0105 | 0 | 0 | 792 B |
| #2770 | CallElasticsearchAsync |
net472 | 2.15μs | 0.675ns | 2.52ns | 0.147 | 0 | 0 | 939 B |
| #2770 | CallElasticsearchAsync |
netcoreapp3.1 | 1.33μs | 0.274ns | 1.02ns | 0.0119 | 0 | 0 | 912 B |
Benchmarks.Trace.GraphQLBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | ExecuteAsync |
net472 | 2.32μs | 1.21ns | 4.68ns | 0.167 | 0 | 0 | 1.06 KB |
| master | ExecuteAsync |
netcoreapp3.1 | 1.47μs | 1.31ns | 4.9ns | 0.0139 | 0 | 0 | 1.03 KB |
| #2770 | ExecuteAsync |
net472 | 2.2μs | 1.1ns | 4.27ns | 0.166 | 0 | 0 | 1.06 KB |
| #2770 | ExecuteAsync |
netcoreapp3.1 | 1.48μs | 0.946ns | 3.54ns | 0.014 | 0 | 0 | 1.03 KB |
Benchmarks.Trace.HttpClientBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendAsync |
net472 | 4.88μs | 1.81ns | 6.78ns | 0.359 | 0 | 0 | 2.28 KB |
| master | SendAsync |
netcoreapp3.1 | 3.16μs | 1.87ns | 6.99ns | 0.0302 | 0 | 0 | 2.21 KB |
| #2770 | SendAsync |
net472 | 4.64μs | 1.98ns | 7.68ns | 0.357 | 0 | 0 | 2.28 KB |
| #2770 | SendAsync |
netcoreapp3.1 | 3.27μs | 1.76ns | 6.59ns | 0.0294 | 0 | 0 | 2.21 KB |
Benchmarks.Trace.ILoggerBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog |
net472 | 2.73μs | 1.08ns | 3.76ns | 0.23 | 0 | 0 | 1.45 KB |
| master | EnrichedLog |
netcoreapp3.1 | 2.2μs | 1.16ns | 4.34ns | 0.0208 | 0 | 0 | 1.53 KB |
| #2770 | EnrichedLog |
net472 | 2.67μs | 1.91ns | 7.39ns | 0.228 | 0 | 0 | 1.45 KB |
| #2770 | EnrichedLog |
netcoreapp3.1 | 2.18μs | 1.49ns | 5.78ns | 0.0207 | 0 | 0 | 1.53 KB |
Benchmarks.Trace.Log4netBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog |
net472 | 143μs | 127ns | 493ns | 0.641 | 0.214 | 0 | 4.29 KB |
| master | EnrichedLog |
netcoreapp3.1 | 109μs | 118ns | 459ns | 0.0551 | 0 | 0 | 4.18 KB |
| #2770 | EnrichedLog |
net472 | 142μs | 184ns | 713ns | 0.505 | 0.144 | 0 | 4.29 KB |
| #2770 | EnrichedLog |
netcoreapp3.1 | 109μs | 31.5ns | 114ns | 0.0542 | 0 | 0 | 4.18 KB |
Benchmarks.Trace.NLogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog |
net472 | 5.36μs | 1.35ns | 5.22ns | 0.51 | 0.00266 | 0 | 3.23 KB |
| master | EnrichedLog |
netcoreapp3.1 | 4.05μs | 1.39ns | 5.19ns | 0.0485 | 0 | 0 | 3.6 KB |
| #2770 | EnrichedLog |
net472 | 5.44μs | 2.25ns | 8.73ns | 0.508 | 0.00269 | 0 | 3.23 KB |
| #2770 | EnrichedLog |
netcoreapp3.1 | 4.02μs | 1.9ns | 7.34ns | 0.05 | 0 | 0 | 3.6 KB |
Benchmarks.Trace.RedisBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendReceive |
net472 | 1.75μs | 0.777ns | 2.8ns | 0.16 | 0 | 0 | 1.01 KB |
| master | SendReceive |
netcoreapp3.1 | 1.46μs | 0.733ns | 2.74ns | 0.0138 | 0 | 0 | 1.01 KB |
| #2770 | SendReceive |
net472 | 1.74μs | 0.843ns | 3.15ns | 0.159 | 0 | 0 | 1.01 KB |
| #2770 | SendReceive |
netcoreapp3.1 | 1.6μs | 0.392ns | 1.47ns | 0.0136 | 0 | 0 | 1.01 KB |
Benchmarks.Trace.SerilogBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog |
net472 | 4.51μs | 2.32ns | 9ns | 0.294 | 0 | 0 | 1.87 KB |
| master | EnrichedLog |
netcoreapp3.1 | 3.92μs | 2.47ns | 9.25ns | 0.0195 | 0 | 0 | 1.49 KB |
| #2770 | EnrichedLog |
net472 | 4.61μs | 2.09ns | 7.81ns | 0.291 | 0 | 0 | 1.87 KB |
| #2770 | EnrichedLog |
netcoreapp3.1 | 4.04μs | 5.96ns | 23.1ns | 0.0201 | 0 | 0 | 1.49 KB |
Benchmarks.Trace.SpanBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan |
net472 | 796ns | 0.453ns | 1.7ns | 0.0723 | 0 | 0 | 457 B |
| master | StartFinishSpan |
netcoreapp3.1 | 623ns | 0.179ns | 0.692ns | 0.00626 | 0 | 0 | 456 B |
| master | StartFinishScope |
net472 | 1.04μs | 0.458ns | 1.78ns | 0.0849 | 0 | 0 | 538 B |
| master | StartFinishScope |
netcoreapp3.1 | 773ns | 0.254ns | 0.916ns | 0.0077 | 0 | 0 | 576 B |
| #2770 | StartFinishSpan |
net472 | 783ns | 0.105ns | 0.38ns | 0.0717 | 0 | 0 | 457 B |
| #2770 | StartFinishSpan |
netcoreapp3.1 | 626ns | 0.215ns | 0.776ns | 0.00624 | 0 | 0 | 456 B |
| #2770 | StartFinishScope |
net472 | 1.05μs | 1.76ns | 6.83ns | 0.0845 | 0 | 0 | 538 B |
| #2770 | StartFinishScope |
netcoreapp3.1 | 781ns | 0.211ns | 0.76ns | 0.00773 | 0 | 0 | 576 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed :heavy_check_mark: Same allocations :heavy_check_mark:
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin |
net472 | 1.12μs | 0.894ns | 3.46ns | 0.0849 | 0 | 0 | 538 B |
| master | RunOnMethodBegin |
netcoreapp3.1 | 925ns | 0.248ns | 0.962ns | 0.00788 | 0 | 0 | 576 B |
| #2770 | RunOnMethodBegin |
net472 | 1.12μs | 0.923ns | 3.57ns | 0.0842 | 0 | 0 | 538 B |
| #2770 | RunOnMethodBegin |
netcoreapp3.1 | 898ns | 3.93ns | 15.2ns | 0.00773 | 0 | 0 | 576 B |
Code Coverage Report :bar_chart:
:heavy_check_mark: Merging #2770 into master will not change line coverage
:heavy_check_mark: Merging #2770 into master will not change branch coverage
:no_entry: Merging #2770 into master will will increase complexity by 43
| master | #2770 | Change | |
|---|---|---|---|
| Lines | 14230 / 19505 |
14243 / 19529 |
|
| Lines % | 73% |
73% |
0% :heavy_check_mark: |
| Branches | 8275 / 11775 |
8279 / 11809 |
|
| Branches % | 70% |
70% |
0% :heavy_check_mark: |
| Complexity | 13053 |
13096 |
43 :no_entry: |
View the full report for further details:
Datadog.Trace Breakdown :heavy_check_mark:
| master | #2770 | Change | |
|---|---|---|---|
| Lines % | 73% |
73% |
0% :heavy_check_mark: |
| Branches % | 70% |
70% |
0% :heavy_check_mark: |
| Complexity | 13053 |
13096 |
43 :no_entry: |
The following classes have significant coverage changes.
| File | Line coverage change | Branch coverage change | Complexity change |
|---|---|---|---|
| Datadog.Trace.Telemetry.JsonWebRequestTelemetryTransport | -27% :no_entry: |
-57% :no_entry: |
0 :heavy_check_mark: |
| Datadog.Trace.ClrProfiler.CallTarget.CallTargetInvoker | 0% :heavy_check_mark: |
7% :heavy_check_mark: |
36 :no_entry: |
| Datadog.Trace.Ci.CIVisibility | 7% :heavy_check_mark: |
7% :heavy_check_mark: |
0 :heavy_check_mark: |
| Datadog.Trace.Telemetry.JsonHttpClientTelemetryTransport | 26% :heavy_check_mark: |
38% :heavy_check_mark: |
0 :heavy_check_mark: |
The following classes were added in #2770:
| File | Line coverage | Branch coverage | Complexity |
|---|---|---|---|
| Datadog.Trace.ClrProfiler.AutoInstrumentation.Internal.AssemblyLoadContext_Integration | 100% |
100% |
1 |
View the full reports for further details:
Closing, this is no longer needed as we found a different fix at the time