[BUG] Allow Elastic.Apm.Profiler.Managed to be loaded domain-neutral

Open russcam opened this issue 2 years ago • 0 comments

How assemblies are loaded is determined by the runtime host when it loads the runtime into a process. In scenarios where a process can contain multiple AppDomains, such as multiple ASP.NET applications sharing an Application Pool in IIS, assemblies can be loaded as domain-neutral so that they can be shared across AppDomains. This typically happens for common assemblies loaded from the Global Assembly Cache (GAC). In CLR 4, domain-neutral assemblies are loaded into the AppDomain named EE Shared Assembly Repository, which then shares them with the other AppDomains in the process.

When profiler auto-instrumentation targets an assembly that is loaded domain-neutral, the assembly containing the instrumentation, Elastic.Apm.Profiler.Managed, must also be loaded domain-neutral, because the IL rewriting performed by the profiler inserts calls to methods contained within Elastic.Apm.Profiler.Managed. The general rule is: if an assembly is loaded domain-neutral, all of its dependencies must be loaded domain-neutral.

The runtime's loading decision can be influenced by implementing ICorProfilerCallback6::GetAssemblyReferences, to tell the runtime that an assembly reference will be added to the metadata of the assembly being loaded, at a later point in time. The runtime can then use this information to determine how to load the assembly. The Elastic APM profiler implements ICorProfilerCallback6::GetAssemblyReferences to add an assembly reference to Elastic.Apm.Profiler.Managed for every assembly it is called for, except those on skip lists. The desired outcome is for Elastic.Apm.Profiler.Managed to be loaded domain-neutral so that it can instrument assemblies that are loaded domain-neutral.

A problem arises with this approach: it appears that Elastic.Apm.Profiler.Managed cannot be loaded as domain-neutral because of one of its dependencies, including transitive dependencies. My suspicion is that it might be related to Elastic.Apm's use of HttpClient in net461, which is defined in netstandard.dll.
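To make the dependency rule concrete, here is a minimal sketch in Python. It is a simplified model of the rule stated above, not the CLR's actual loading algorithm: it walks an assembly's dependency closure and reports the first assembly that is not eligible to be loaded domain-neutral. The assembly names, the `deps` graph, the `eligible` set and the `first_blocker` helper are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional, Set


def first_blocker(
    assembly: str,
    deps: Dict[str, List[str]],
    eligible: Set[str],
    _seen: Optional[Set[str]] = None,
) -> Optional[str]:
    """Return the first assembly in the dependency closure that cannot be
    loaded domain-neutral, or None if the whole closure is eligible."""
    seen = set() if _seen is None else _seen
    if assembly in seen:  # already visited (also guards against cycles)
        return None
    seen.add(assembly)
    if assembly not in eligible:
        return assembly
    for dep in deps.get(assembly, []):
        blocker = first_blocker(dep, deps, eligible, seen)
        if blocker is not None:
            return blocker
    return None


# Hypothetical dependency graph, for illustration only.
deps = {
    "Instrumented.Target": ["Profiler.Managed"],
    "Profiler.Managed": ["Agent.Core"],
    "Agent.Core": ["Some.Facade"],
    "Some.Facade": [],
}
# Hypothetical set of assemblies that could be loaded domain-neutral.
eligible = {"Instrumented.Target", "Profiler.Managed", "Agent.Core"}

# A single ineligible transitive dependency blocks the whole chain.
print(first_blocker("Instrumented.Target", deps, eligible))
```

In this model, one ineligible transitive dependency is enough to prevent the instrumented assembly's reference chain from being domain-neutral, which mirrors the suspicion that a single dependency of Elastic.Apm.Profiler.Managed is responsible.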

Current investigation

Setup

  1. Run AspNetFullFrameworkSampleApp in local IIS, in the Default Application Pool.

  2. Ensure AspNetFullFrameworkSampleApp is the only application running in the Default Application Pool.

  3. Configure profiler auto instrumentation by setting the following environment variables for the Default Application Pool (only possible in IIS 10+):

    COR_ENABLE_PROFILING="1"
    COR_PROFILER_PATH="<path-to-repo>\target\debug\elastic_apm_profiler.dll"
    COR_PROFILER="{FA65FE15-F085-4681-9B20-95E04F6C03CC}"
    
    ELASTIC_APM_PROFILER_HOME="<path-to-repo>\src\Elastic.Apm.Profiler.Managed\bin\Release"
    ELASTIC_APM_PROFILER_INTEGRATIONS="<path-to-repo>\src\Elastic.Apm.Profiler.Managed\integrations.yml"
    ELASTIC_APM_PROFILER_LOG_DIR="<path-to-repo>\logs"
    ELASTIC_APM_PROFILER_LOG="trace"
    ELASTIC_APM_PROFILER_LOG_IL="1"
    
  4. Run the application

  5. The default page opens successfully

  6. Hit the /Database page to trigger the instrumentation of System.Data.SQLite

  7. Observe that a FileNotFoundException is thrown:

    Server Error in '/AspNetFullFrameworkSampleApp' Application.
    
    Could not load file or assembly 'Elastic.Apm.Profiler.Managed, Version=1.11.0.0, Culture=neutral, PublicKeyToken=ae7400d2c189cf22' or one of its dependencies. The system cannot find the file specified.
    

Evaluation

The log files captured in <path-to-repo>\logs provide details of what happens.

In Elastic.Apm.Profiler.Managed.Loader*.log, the Elastic.Apm.Profiler.Managed.Loader assembly shim fails to load the Elastic.Apm.Profiler.Managed assembly as domain-neutral:

[2021-09-07T16:04:49.0383054+10:00] [ERROR] Error loading managed assemblies.
System.IO.FileNotFoundException: Could not load file or assembly 'Elastic.Apm.Profiler.Managed, Version=1.11.0.0, Culture=neutral, PublicKeyToken=ae7400d2c189cf22' or one of its dependencies. The system cannot find the file specified.
File name: 'Elastic.Apm.Profiler.Managed, Version=1.11.0.0, Culture=neutral, PublicKeyToken=ae7400d2c189cf22'
   at System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
   at System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks)
   at System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean forIntrospection)
   at System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
   at System.Reflection.Assembly.Load(String assemblyString)
   at Elastic.Apm.Profiler.Managed.Loader.Startup.TryLoadManagedAssembly()

System.Reflection.RuntimeAssembly._nLoad is an internal call. I believe the FileNotFoundException might be misleading, and not the real underlying cause of the exception, because elastic_apm_profiler*.log indicates that the Elastic.Apm.Profiler.Managed assembly does get loaded, but into the local AppDomain rather than the domain-neutral AppDomain. Loaded only into the local AppDomain, it cannot instrument System.Data.SQLite. However, for it to also be loaded into the domain-neutral AppDomain, its grant permissions there would need to match those it has in the local AppDomain, which I believe is the underlying cause of the logged error.

Further evidence to support this: if Startup.TryLoadManagedAssembly is changed to explicitly load the Elastic.Apm.Profiler.Managed assembly that we know exists on disk, a FileNotFoundException is still thrown when hitting the /Database endpoint.

The fundamental question, though, is why the Elastic.Apm.Profiler.Managed assembly is loaded into the local AppDomain to begin with, and not the domain-neutral AppDomain. My suspicion is that it might be related to Elastic.Apm's use of HttpClient in net461, which is defined in netstandard.dll. Investigating this requires removing that dependency from net461, for example by using HttpWebRequest instead of HttpClient.

Intermediate Workaround

Assemblies can be forced not to be loaded as domain-neutral by using LoaderOptimization.SingleDomain. This can be achieved with either of the following:

Environment variable (preferable option)

COMPlus_LoaderOptimization=1

or

Registry settings (less preferable option)

HKEY_LOCAL_MACHINE\Software\Microsoft\.NETFramework create DWORD with value 1

HKEY_LOCAL_MACHINE\Software\WOW6432Node\Microsoft\.NETFramework create DWORD with value 1

This resolves the issue described above, with two caveats.

The setting has no effect on mscorlib, which is always loaded domain-neutral, so if mscorlib is a target for instrumentation, this workaround will not work. The other downside is that when many AppDomains run in one process, as is common when multiple applications share an Application Pool in IIS, assemblies are no longer shared, so each AppDomain consumes more memory and resources.

russcam, Nov 11 '21 09:11