Make TraceEvent and FastSerialization AOT-compatible
This pull request introduces several improvements to enhance .NET 8 and AOT (Ahead-Of-Time) compatibility, refactors type handling for dynamic payload parsing, and adds platform-specific attributes for better code safety. The most significant change is the systematic replacement of Type with a custom FetchType abstraction in dynamic event payload parsing, which improves performance and AOT support.
.NET 8 & AOT Compatibility Enhancements:
- Added
net8.0as a target framework inFastSerialization.csprojand set<IsAotCompatible>and<PolySharpIncludeRuntimeSupportedAttributes>properties to improve AOT support. [1] [2] - Introduced the
PolySharppackage and corresponding configuration in bothDirectory.Packages.propsandFastSerialization.csprojto provide polyfills for runtime attributes and enhance compatibility across target frameworks. [1] [2]
Dynamic Payload Parsing Refactor:
- Refactored
DynamicTraceEventParserto use a newFetchTypeabstraction instead ofTypefor payload type handling, improving performance and making the code more AOT-friendly. This includes changes to array handling, type conversions, and default value retrieval via the newFetchTypeHelpersutilities. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] - Updated method signatures and logic to consistently use
FetchTypeand improved error messages for unsupported types. [1] [2]
Platform and Trimming Safety Annotations:
- Added
[SupportedOSPlatform("windows")]toMemoryMappedFileStreamReaderconstructor and[RequiresUnreferencedCode]toDirectoryAnalyzerResolverto clarify platform and trimming requirements for AOT and static analysis. [1] [2]
Source Modernization and Minor Improvements:
- Added missing
usingdirectives forSystem.Diagnostics.CodeAnalysis,System.Runtime.Versioning, and related namespaces to support new attributes and maintain code clarity. [1] [2] [3] [4] [5] - Updated usage of
Marshal.SizeOfto the genericMarshal.SizeOf<T>()for clarity and type safety.
Other Cleanups:
- Removed an unused private method from
CtfChannel.csto reduce code clutter.
These changes collectively modernize the codebase, making it more maintainable, performant, and compatible with newer .NET runtimes and deployment scenarios.
cc @hoyosjs @tommcdon I think this is needed for making dotnet-gcdump AOT-compatible
I've had a chance to look through this PR. I'd like to have a conversation about what the goal is here and have some follow-up questions on several of the changes. My initial thought is that this change brings in several things that I don't want to maintain and/or I don't believe will work for TraceEvent all-up. To preview a few things:
- I really do not want to add a new TFM for these libraries. We used to target multiple TFMs and this was much more complex for a variety of reasons.
- I will need help understanding the requirement for PolySharp and how this impacts runtime execution. PerfView embeds its own dependencies, and so we'd need to understand the impact here.
- Primitive-only deserialization will not work - This will break arbitrary trace files.
I will likely have more questions if we can resolve these issues, but these are the big rocks.
I'd like to have a conversation about what the goal is here
The main goal is to support Native AOT for a number of diagnostic tools that use this library. Most choices are downstream of that goal:
I really do not want to add a new TFM for these libraries.
There isn't really a choice here. netstandard2.0 isn't annotated for trimming/AOT, so in order to see the incompatibilities you have to multi-target.
I will need help understanding the requirement for PolySharp and how this impacts runtime execution. PerfView embeds its own dependencies, and so we'd need to understand the impact here.
Polysharp just adds attributes so we can avoid ifdef'ing every usage of RequiresUnreferencedCode or SupportedOS. It shouldn't have any runtime impact.
Primitive-only deserialization will not work - This will break arbitrary trace files.
This is the key part of a "feature switch" -- it changes behavior when intent to trim (via PublishAot/PublishTrimmed/IsTrimmable/IsAotCompatible) is enabled. This will produce different behavior when trimming, but the old behavior is categorically impossible to support when trimming, so it's kind of the best middle-ground on offer. The alternative is to simply disallow use of the API entirely, which is problematic because it is pervasive in this library.
The main goal is to support Native AOT for a number of diagnostic tools that use this library. Most choices are downstream of that goal:
I think that we should get together to discuss the goals and priority here. As it stands, I am not inclined to take this change - this puts a burden on TraceEvent and its maintainers that isn't free and creates another row in the test matrix (AOT). I think we need to understand the business need here.
@tommcdon @steveisok @noahfalk can you provide some guidance here? My understanding was that it’s a long-term goal to move the diagnostics tools to aot to eliminate the dependency on a shared runtime in diagnostics collection scenarios. Should trace event be part of that scenario, or is removing the dependency an option?
@tommcdon @steveisok @noahfalk can you provide some guidance here? My understanding was that it’s a long-term goal to move the diagnostics tools to aot to eliminate the dependency on a shared runtime in diagnostics collection scenarios. Should trace event be part of that scenario, or is removing the dependency an option?
I'd agree its a long-term goal. So far it hasn't been high enough priority that we've put any effort into it and I wasn't thinking of it as pressing. Is there anything creating urgency or could we just park this and revisit when converting the tools is a higher priority?
In terms of the dependency I think we have a mixture of:
- some tools don't use TraceEvent at all
- some tool features need potentially constrained subsets like NetTrace parsing and gcdump formatting
- some tool features like 'dotnet-trace report topN' probably have much broader code usage within TraceEvent. We could potentially cut or modify features like this to remove the dependency but that has user impacts and development cost. It would probably be hard to remove the dependency if we want to keep all the features.
I'm skeptical that there's going to be a future time where we have more time and resources to devote to this compared to now.
Can you say a bit more about what you would expect to change in the future? What new information do we need? What kind of thing would change that would shift our decision?
One more thing: almost every library in .NET multi-targets. It is a recommended strategy for most of our modern .NET scenarios and will likely continue to be popular far into the future. I don't see a reason why EventTrace would be different in this regard. What makes it special compared to every other .NET library?
I'm skeptical that there's going to be a future time where we have more time and resources to devote to this compared to now.
Can you say a bit more about what you would expect to change in the future? What new information do we need? What kind of thing would change that would shift our decision?
I largely think of it as priority and available devs to work on it. Andy, was it your plan that you were going to convert all the dotnet-* tools to NativeAOT yourself? If yes then we should try to figure this out now. If not and it fell to the diagnostics team to do the conversion then I am guessing it wouldn't happen until either:
- priority of converting the tools went up (more customer dissat about acquisition experience or some broader scenario requires it)
- other work priorities go down/get completed (SFI, engineering, multiple runtime convergence efforts, playing catchup to mandatory runtime features, new diagnostic tools and features, numerous bugs)
Of course if there is something I'm overlooking that makes the conversion more urgent/valuable than I realize do let me know. So far it just wasn't looming particularly large on my radar.
Andy, was it your plan that you were going to convert all the dotnet-* tools to NativeAOT yourself?
I was going to see how much work it would be. I started with dotnet-gcdump. Looks like this was the only blocker -- after I fixed TraceEvent I think dotnet-gcdump just works. I haven't yet looked at the rest of the tools
@noahfalk and I chatted about this offline yesterday. When I look at this change, I am most worried about it from the perspective of support cost. That's both in terms of additional development-time and testing complexity (e.g. new TFM) and behavior complexity (e.g. different behavior across different runtimes).
As a path forward, I think that I would like to step back and understand what the various options are for carving out the set of functionality that is capable of being AOT'd. My read of the current PR (I could be wrong), is that everything would be available for use in NativeAOT, though some behavior would be different. I'd like to take a different approach where we assert that all APIs available in NativeAOT have consistent behavior with non-AOT usage. If that's not possible for an API, then by default, I'd like it to not be available for usage in NativeAOT'd apps. This may or may not produce an issue for the current scenario (diagnostic tools), but I'd much rather have a conversation and find a path forward, knowing that we have API behavior consistency.
I would like to start this exercise by assuming that no APIs are available for NativeAOT, and then let's bring in the minimal set required for the diagnostic tools. As that's happening, if there are APIs that can't be supported, that's when we have a discussion about how to move forward. I can imagine potentially building new APIs that are more NativeAOT friendly so that we can be very clear about what works and what doesn't, and we'd be forcing folks that want to build NativeAOT'd apps that use TraceEvent to make changes to their code when an API that they currently use doesn't work with NativeAOT. I imagine that we have a good amount of experience doing this kind of work, and so I'm hoping that we won't be re-inventing the wheel here.
Regarding adding a new TFM, I'd like to understand what other options exist here if any. Is the issue that we're not targeting netstandard21, or just that no version of netstandard is sufficient for NativeAOT? I'm thinking that if we need to add a new TFM, I'm not sure if I prefer to just add the TFM to TraceEvent, or if I'd rather create a new DLL that contains the AOT'able subset of TraceEvent and allow folks to consume that.
I'm sure that we can find a good path forward here so that we end up in the right place. If we aren't sure how far we're going to get with converting the diagnostic tools to NativeAOT, then it might also be worth sticking with what you have for now, continuing down the path of converting the tools against a private branch of TraceEvent, and if all goes well, then we can come back to how we implement NativeAOT in TraceEvent. If that's too burdensome, we can certainly start to delve into the answers to these questions now.
OK, I've refactored this to completely remove the usage of System.Type. @noahfalk you were right. There was no dynamic usage of Type in this code and it could be completely replaced with simple switch statements.
@noahfalk @brianrob I think this is ready to go. I've narrowed any breaking change to two internal helper methods (GetInt16At and GetByteAt) that now return short and byte, respectively, instead of int. That's needed to avoid annoying reflection.
Even separate from trimming/AOT I think this will be faster and safer due to the removal of reflection.
@agocke, thanks. I will re-review this. One thing that I noticed is that this change has a very large number of whitespace-only changes. Can I ask you to revert these so that it's easier to review this? Thanks.
@agocke, thanks. I will re-review this. One thing that I noticed is that this change has a very large number of whitespace-only changes. Can I ask you to revert these so that it's easier to review this? Thanks.
github has the option to remove whitespace in the diffs. It is in the upper right hand area and toggles them.
VS Code also has a setting if you're reviewing locally: "diffEditor.ignoreTrimWhitespace": true
For some reason, GH fails to ignore the whitespace changes properly. This LGTM other than the breaking change GetByteAt/GetInt16At for object inheriting from TraceEvent. This feels pretty small though. Also, for polysharp - can we reduce the list of includeassets to build and analyzer? That way it's clear that it's all sourcegen and you don't need to bundle more assets.
Great questions!
Here are some useful resources:
- https://learn.microsoft.com/en-us/dotnet/core/deploying/trimming/trimming-concepts Conceptual doc on how trimming works
- https://learn.microsoft.com/en-us/dotnet/core/deploying/trimming/fixing-warnings Doc on fixing trim warnings
For validation, I can create a project that publishes as Native AOT and we can verify that the AOT compiler doesn't produce any warnings. Unfortunately we cannot use the existing unit tests as: 1) the existing unit tests are also likely not AOT-compatible and 2) xunit itself is not AOT-compatible.
I'm working on (2) in spare time, but that's an XXL project. Also, the static analysis is generally very reliable. In many cases we consider it more important than testing because code which is not truly compatible can sometimes work by accident.