opentelemetry-dotnet
opentelemetry-dotnet copied to clipboard
AOT compiled app throws System.PlatformNotSupportedException
Bug Report
List of all OpenTelemetry NuGet
packages and version that you are
using (e.g. OpenTelemetry 1.0.2
):
-
OpenTelemetry 1.7.0
-
OpenTelemetry.Exporter.OpenTelemetryProtocol 1.7.0
Runtime version (e.g. net462
, net48
, netcoreapp3.1
, net6.0
etc. You can
find this information from the *.csproj
file): net8.0
Symptom
When compiling with PublishAot=true, I get an System.PlatformNotSupportedException
from an unknown origin in the code
What is the expected behavior?
App should output without error
pre otel
post otel
What is the actual behavior?
The application would throw an exception at startup:
pre otel
Unhandled Exception: System.PlatformNotSupportedException: Operation is not supported on this platform.
at System.Runtime.InteropServices.Marshal.GetExceptionPointers() + 0x28
at OpenTelemetry.Trace.ExceptionProcessor.OnStart(Activity) + 0x20
at System.Diagnostics.SynchronizedList`1.EnumWithAction(Action`2, Object) + 0xe4
at System.Diagnostics.Activity.Start() + 0x188
at System.Diagnostics.Activity.Create(ActivitySource, String, ActivityKind, String, ActivityContext, IEnumerable`1, IEnumerable`1, DateTimeOffset, ActivityTagsCollection, ActivitySamplingRe
sult, Boolean, ActivityIdFormat, String) + 0x328
at System.Diagnostics.ActivitySource.CreateActivity(String, ActivityKind, ActivityContext, String, IEnumerable`1, IEnumerable`1, DateTimeOffset, Boolean, ActivityIdFormat) + 0x4d4
at System.Diagnostics.ActivitySource.StartActivity(String, ActivityKind) + 0x40
at Program.<Main>$(String[] args) + 0xfc
at dotnet-otel-bug!<BaseAddress>+0x239d08
[1] 80113 abort
Reproduce
Code in : https://github.com/kedare/dotnet-otel-bug/
Try without AOT, it should work fine :
dotnet publish -c Release --sc --use-current-runtime
Try with AOT, it should throw the exception after the pre otel
console output
dotnet publish -c Release --sc --use-current-runtime -p:PublishAot=true
Additional Context
Tested in MacOs (build to Mac binary) Tested inside docker (build to Linux binary)
I first though the issue was from the AddSource("*")
as then I remove it the issue is gone but I guess it's just because then it's not gathering any trace and not hitting the part of the opentelemetry code that triggers this issue)
@Yun-Ting Could you take a look?
Hi @kedare, I cloned the code you shared: https://github.com/kedare/dotnet-otel-bug/tree/main and the issue did not repro in my machine.
But I do see this warning emitted with your code:
C:\repos\dotnet-otel-bug\Program.cs(36,1): warning CS8602: Dereference of a possibly null reference. [C:\repos\dotnet-otel-bug\dotnet-otel-bug.csproj]
Could you try whether updating L36 to be
activity?.Stop();
would fix the issue?
I fixed the null safety but the issue still happens 100% of the time, I will try to make a Dockerfile to make it easier to reproduce
Hmm somehow I can't reproduce it with this sample project with Docker but it does with our full project.
Is there a way to get more information from the stacktrace from the full project about what could cause this so I can try to reproduce it on the minimal project ?
Unhandled Exception: System.PlatformNotSupportedException: Operation is not supported on this platform.
at System.Runtime.InteropServices.Marshal.GetExceptionPointers() + 0x28
at OpenTelemetry.Trace.ExceptionProcessor.OnStart(Activity) + 0x20
at OpenTelemetry.CompositeProcessor`1.OnStart(T) + 0x2c
at System.Diagnostics.SynchronizedList`1.EnumWithAction(Action`2, Object) + 0xe4
at System.Diagnostics.Activity.Start() + 0x188
at System.Diagnostics.Activity.Create(ActivitySource, String, ActivityKind, String, ActivityContext, IEnumerable`1, IEnumerable`1, DateTimeOffset, ActivityTagsCollection, ActivitySamplingResult, Boolean, ActivityIdFormat, String) + 0x328
at System.Diagnostics.ActivitySource.CreateActivity(String, ActivityKind, ActivityContext, String, IEnumerable`1, IEnumerable`1, DateTimeOffset, Boolean, ActivityIdFormat) + 0x53c
at System.Diagnostics.ActivitySource.StartActivity(String, ActivityKind) + 0x40
at Cardiologs.Pulse.ReplicaManager.Console.Program.<Main>d__6.MoveNext() + 0x338
--- End of stack trace from previous location ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
at Cardiologs.Pulse.ReplicaManager.Console.Program.<Main>(String[] args) + 0x3c
at replica-manager!<BaseAddress>+0xc8a268
It would be related to the activity start ? Or to something happening inside the activity ?
Hello there! We encountered this as well recently. I believe this is coming from the ExceptionProcessor
when it calls the following
#if NET6_0_OR_GREATER || NETFRAMEWORK
this.fnGetExceptionPointers = Marshal.GetExceptionPointers;
#else
For AOT, this currently is not supported. https://github.com/dotnet/runtime/blob/cd850233414468a8ffe84fc8fa05a36d0b3b1316/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshal.NativeAot.cs#L225
public static IntPtr GetExceptionPointers()
{
throw new PlatformNotSupportedException();
}
I'm currently noodling on ways around this, but, honestly, not comfortable enough with the pointer accessors yet to recommend a solution or submit a PR. Hoping to be able to do either of those over the next few days though. Cheers!
@kedare As @Tesouro-Chris pointed, it looks like Marshal.GetExceptionPointers
is not supported for AOT. SetErrorStatusOnException
method which is used to report unhandled exceptions adds an activity processor to the SDK which calls the GetExceptionPointers
method. If you don't care about reporting unhandled exceptions, then you could remove the call to SetErrorStatusOnException
from your SDK setup to avoid this issue.
@eerhardt @vitek-karas Would you have any inputs for this issue? We probably need to annotate this method.
We probably need to annotate this method.
I was thinking the same thing. But annotating it with "RequiresDynamicCode" seems a bit confusing, since it doesn't really require "dynamic code".
Are there other ways to know if an exception is currently being handled?
It would probably be best to guard this and not call it when RuntimeFeature.IsDynamicCodeSupported == false
for now, so people aren't getting a PNSE.
cc @agocke @MichalStrehovsky
I was thinking the same thing. But annotating it with "RequiresDynamicCode" seems a bit confusing, since it doesn't really require "dynamic code".
Yeah, the right thing would be something along the lines of IsSehInteropSupported or something like that. This doesn't have to do with the ability to JIT new code at runtime, but how exceptions are implemented. CoreCLR implements exceptions on top of SEH because of C++/CLI.
I think we could make this return null based on the conversation in https://github.com/dotnet/runtime/issues/26620.
Some additional conversation at https://github.com/dotnet/runtime/pull/75669#discussion_r977670685.
Cc-ing people involved in those: @AaronRobinsonMSFT @jkoritzinsky @jkotas
I think we could make this return null
Returning null won't make https://github.com/open-telemetry/opentelemetry-dotnet/blob/af2b551316e111665af277553ccd40f18079d276/src/OpenTelemetry/Trace/ExceptionProcessor.cs#L42-L77 work correctly. I am not sure what this code is trying to do exactly, but returning null is going to make it misbehave.
I think the proper fix would be to introduce APIs that allow tracking exceptions in flight. We have AppDomain.FirstChanceException
event that is one part of it, but we do not have event for when the exception is caught.
I'm going to remove the bug label here and close this as resolved. The issue isn't part of OpenTelemetry, or one we are going to fix. It requires upstream fixes in the runtime to make that happen.
Sorry we don't have a better answer for you.
If anyone thinks that there's something that the Otel .NET library can do, please do comment and we can open an issue for tracking that work.
If anyone thinks that there's something that the Otel .NET library can do
What is this PR https://github.com/open-telemetry/opentelemetry-dotnet/pull/5374 then?
Reopened. My perception was that it wasn't a fix, just making a different, potentially better, error message. If you think it's worth keeping this open while we add a better error message, or I've misunderstood, then I don't mind it being open.
I think we should keep this open, as there is an open PR trying to address this. If we conclude in the PR that this is not something OTel .NET can fix, then yes, we can close.
Sorry we don't have a better answer for you.
No worry, thank you for your help :) I was only looking for a more explicit error message (throwing the error during setup instead of runtime is good now), I know AOT still have some limitation on this side. Hopefully this would be possible in a next version of .NET
Are you happy, with that PR merged, that we close this?