opentelemetry-dotnet icon indicating copy to clipboard operation
opentelemetry-dotnet copied to clipboard

AOT compiled app throws System.PlatformNotSupportedException

Open kedare opened this issue 4 months ago • 15 comments

Bug Report

List of all OpenTelemetry NuGet packages and version that you are using (e.g. OpenTelemetry 1.0.2):

  • OpenTelemetry 1.7.0
  • OpenTelemetry.Exporter.OpenTelemetryProtocol 1.7.0

Runtime version (e.g. net462, net48, netcoreapp3.1, net6.0 etc. You can find this information from the *.csproj file): net8.0

Symptom

When compiling with PublishAot=true, I get an System.PlatformNotSupportedException from an unknown origin in the code

What is the expected behavior?

App should output without error

pre otel
post otel

What is the actual behavior?

The application would throw an exception at startup:

pre otel
Unhandled Exception: System.PlatformNotSupportedException: Operation is not supported on this platform.
   at System.Runtime.InteropServices.Marshal.GetExceptionPointers() + 0x28
   at OpenTelemetry.Trace.ExceptionProcessor.OnStart(Activity) + 0x20
   at System.Diagnostics.SynchronizedList`1.EnumWithAction(Action`2, Object) + 0xe4
   at System.Diagnostics.Activity.Start() + 0x188
   at System.Diagnostics.Activity.Create(ActivitySource, String, ActivityKind, String, ActivityContext, IEnumerable`1, IEnumerable`1, DateTimeOffset, ActivityTagsCollection, ActivitySamplingRe
sult, Boolean, ActivityIdFormat, String) + 0x328
   at System.Diagnostics.ActivitySource.CreateActivity(String, ActivityKind, ActivityContext, String, IEnumerable`1, IEnumerable`1, DateTimeOffset, Boolean, ActivityIdFormat) + 0x4d4
   at System.Diagnostics.ActivitySource.StartActivity(String, ActivityKind) + 0x40
   at Program.<Main>$(String[] args) + 0xfc
   at dotnet-otel-bug!<BaseAddress>+0x239d08
[1]    80113 abort

Reproduce

Code in : https://github.com/kedare/dotnet-otel-bug/

Try without AOT, it should work fine :

dotnet publish -c Release --sc --use-current-runtime

Try with AOT, it should throw the exception after the pre otel console output

dotnet publish -c Release --sc --use-current-runtime -p:PublishAot=true

Additional Context

Tested in MacOs (build to Mac binary) Tested inside docker (build to Linux binary)

I first though the issue was from the AddSource("*") as then I remove it the issue is gone but I guess it's just because then it's not gathering any trace and not hitting the part of the opentelemetry code that triggers this issue)

kedare avatar Feb 14 '24 10:02 kedare

@Yun-Ting Could you take a look?

vishweshbankwar avatar Feb 14 '24 16:02 vishweshbankwar

Hi @kedare, I cloned the code you shared: https://github.com/kedare/dotnet-otel-bug/tree/main and the issue did not repro in my machine.

But I do see this warning emitted with your code: C:\repos\dotnet-otel-bug\Program.cs(36,1): warning CS8602: Dereference of a possibly null reference. [C:\repos\dotnet-otel-bug\dotnet-otel-bug.csproj]

Could you try whether updating L36 to be activity?.Stop(); would fix the issue?

Yun-Ting avatar Feb 14 '24 20:02 Yun-Ting

I fixed the null safety but the issue still happens 100% of the time, I will try to make a Dockerfile to make it easier to reproduce

kedare avatar Feb 16 '24 15:02 kedare

Hmm somehow I can't reproduce it with this sample project with Docker but it does with our full project.

Is there a way to get more information from the stacktrace from the full project about what could cause this so I can try to reproduce it on the minimal project ?

Unhandled Exception: System.PlatformNotSupportedException: Operation is not supported on this platform.
   at System.Runtime.InteropServices.Marshal.GetExceptionPointers() + 0x28
   at OpenTelemetry.Trace.ExceptionProcessor.OnStart(Activity) + 0x20
   at OpenTelemetry.CompositeProcessor`1.OnStart(T) + 0x2c
   at System.Diagnostics.SynchronizedList`1.EnumWithAction(Action`2, Object) + 0xe4
   at System.Diagnostics.Activity.Start() + 0x188
   at System.Diagnostics.Activity.Create(ActivitySource, String, ActivityKind, String, ActivityContext, IEnumerable`1, IEnumerable`1, DateTimeOffset, ActivityTagsCollection, ActivitySamplingResult, Boolean, ActivityIdFormat, String) + 0x328
   at System.Diagnostics.ActivitySource.CreateActivity(String, ActivityKind, ActivityContext, String, IEnumerable`1, IEnumerable`1, DateTimeOffset, Boolean, ActivityIdFormat) + 0x53c
   at System.Diagnostics.ActivitySource.StartActivity(String, ActivityKind) + 0x40
   at Cardiologs.Pulse.ReplicaManager.Console.Program.<Main>d__6.MoveNext() + 0x338
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Cardiologs.Pulse.ReplicaManager.Console.Program.<Main>(String[] args) + 0x3c
   at replica-manager!<BaseAddress>+0xc8a268

It would be related to the activity start ? Or to something happening inside the activity ?

kedare avatar Feb 16 '24 15:02 kedare

Hello there! We encountered this as well recently. I believe this is coming from the ExceptionProcessor when it calls the following

#if NET6_0_OR_GREATER || NETFRAMEWORK
        this.fnGetExceptionPointers = Marshal.GetExceptionPointers;
#else

For AOT, this currently is not supported. https://github.com/dotnet/runtime/blob/cd850233414468a8ffe84fc8fa05a36d0b3b1316/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshal.NativeAot.cs#L225

public static IntPtr GetExceptionPointers()
        {
            throw new PlatformNotSupportedException();
        }

I'm currently noodling on ways around this, but, honestly, not comfortable enough with the pointer accessors yet to recommend a solution or submit a PR. Hoping to be able to do either of those over the next few days though. Cheers!

Tesouro-Chris avatar Feb 16 '24 21:02 Tesouro-Chris

@kedare As @Tesouro-Chris pointed, it looks like Marshal.GetExceptionPointers is not supported for AOT. SetErrorStatusOnException method which is used to report unhandled exceptions adds an activity processor to the SDK which calls the GetExceptionPointers method. If you don't care about reporting unhandled exceptions, then you could remove the call to SetErrorStatusOnException from your SDK setup to avoid this issue.

@eerhardt @vitek-karas Would you have any inputs for this issue? We probably need to annotate this method.

utpilla avatar Feb 17 '24 00:02 utpilla

We probably need to annotate this method.

I was thinking the same thing. But annotating it with "RequiresDynamicCode" seems a bit confusing, since it doesn't really require "dynamic code".

Are there other ways to know if an exception is currently being handled?

It would probably be best to guard this and not call it when RuntimeFeature.IsDynamicCodeSupported == false for now, so people aren't getting a PNSE.

cc @agocke @MichalStrehovsky

eerhardt avatar Feb 20 '24 19:02 eerhardt

I was thinking the same thing. But annotating it with "RequiresDynamicCode" seems a bit confusing, since it doesn't really require "dynamic code".

Yeah, the right thing would be something along the lines of IsSehInteropSupported or something like that. This doesn't have to do with the ability to JIT new code at runtime, but how exceptions are implemented. CoreCLR implements exceptions on top of SEH because of C++/CLI.

I think we could make this return null based on the conversation in https://github.com/dotnet/runtime/issues/26620.

Some additional conversation at https://github.com/dotnet/runtime/pull/75669#discussion_r977670685.

Cc-ing people involved in those: @AaronRobinsonMSFT @jkoritzinsky @jkotas

MichalStrehovsky avatar Feb 20 '24 20:02 MichalStrehovsky

I think we could make this return null

Returning null won't make https://github.com/open-telemetry/opentelemetry-dotnet/blob/af2b551316e111665af277553ccd40f18079d276/src/OpenTelemetry/Trace/ExceptionProcessor.cs#L42-L77 work correctly. I am not sure what this code is trying to do exactly, but returning null is going to make it misbehave.

I think the proper fix would be to introduce APIs that allow tracking exceptions in flight. We have AppDomain.FirstChanceException event that is one part of it, but we do not have event for when the exception is caught.

jkotas avatar Feb 20 '24 20:02 jkotas

I'm going to remove the bug label here and close this as resolved. The issue isn't part of OpenTelemetry, or one we are going to fix. It requires upstream fixes in the runtime to make that happen.

Sorry we don't have a better answer for you.

If anyone thinks that there's something that the Otel .NET library can do, please do comment and we can open an issue for tracking that work.

martinjt avatar Feb 29 '24 17:02 martinjt

If anyone thinks that there's something that the Otel .NET library can do

What is this PR https://github.com/open-telemetry/opentelemetry-dotnet/pull/5374 then?

cijothomas avatar Feb 29 '24 18:02 cijothomas

Reopened. My perception was that it wasn't a fix, just making a different, potentially better, error message. If you think it's worth keeping this open while we add a better error message, or I've misunderstood, then I don't mind it being open.

martinjt avatar Feb 29 '24 18:02 martinjt

I think we should keep this open, as there is an open PR trying to address this. If we conclude in the PR that this is not something OTel .NET can fix, then yes, we can close.

cijothomas avatar Feb 29 '24 18:02 cijothomas

Sorry we don't have a better answer for you.

No worry, thank you for your help :) I was only looking for a more explicit error message (throwing the error during setup instead of runtime is good now), I know AOT still have some limitation on this side. Hopefully this would be possible in a next version of .NET

kedare avatar Mar 27 '24 17:03 kedare

Are you happy, with that PR merged, that we close this?

martinjt avatar Mar 27 '24 17:03 martinjt