akka.net
Epic: Source-Generated Serialization
Abstract
There are two major performance issues that affect Akka.NET overall:
- Using a single TCP socket in remoting (will be solved via #4436 and #4757)
- Using 2013-15 style serialization: working with byte[], allowing every serializer to have its own idea on how to allocate memory, reflection, and lots of redundant copying for envelope types (such as RemotingEnvelope, DDataEnvelope, and many others).
This epic is about addressing issue 2 - the serialization system. There have been many good proposals on how to do this already, such as:
- https://github.com/akkadotnet/akka.net/issues/3740
- https://github.com/akkadotnet/akka.net/issues/6060
- https://github.com/akkadotnet/akka.net/issues/6059
These are great ideas for making the serialization system faster, but they do not address the following:
- Writing custom serializers in Akka.NET is peak tedium and is generally not pleasant to do.
- Default serialization in Akka.NET uses Newtonsoft.Json, which is ancient and not supported long-term. It's not coming with us to high-performance land.
- Reflection-based polymorphic serialization is not 100% secure, in addition to being slow and bug-prone - you have to add features such as https://github.com/akkadotnet/akka.net/issues/5026, which we did for Hyperion on https://github.com/akkadotnet/akka.net/pull/5208. Schema-based serialization is the secure option here.
- Replacing default serialization so users can kick the tires on Akka.Remote without having to manually write a serializer is a must-have for people trying to use the framework. Having Akka.Remote "just work" on the first try is a magical experience that is actually pretty important to keeping Akka.NET users happy and engaged.
- Finally, we have a new requirement: https://github.com/akkadotnet/akka.net/issues/7246 - AOT support. Reflection-based serialization is a no-no and will never be supported without some type of manual schema.
Given all of this - there's a clear solution that solves all the problems at once: compile-time generated serialization.
Requirements
I'm going to break our requirements out into two areas - mandatory and "nice to have"
Mandatory
- All message definitions must be explicitly marked with an interface or attribute indicating that they're intended for remote or persistent serialization. This is the marker the generator is going to look for.
- The generator will fill in the following attributes or methods on source-generated serializer / message types: a size estimator, a writer method for writing to a System.Memory<byte> / whatever, and a reader for returning the original type T. The size estimator is actually the most important piece for performance reasons - this is what will allow memory pooling to work efficiently. We use this technique very successfully inside TurboMqtt: https://www.youtube.com/watch?v=owTeEYqi0AM&t=1002s (skips to 18:26)
- Serializable types will be organized into SerializerV2 classes that are then generated on a per-assembly basis, and which use System.Memory<T> constructs as their primary signature.
- Registrations for using the custom serializer will be generated using either Akka.Hosting or an ActorSystemSetup - the user might have to manually pass these in for them to work, as doing it automatically might be a bridge too far for the serializer definition. We'll see what we can do automatically, but if we're trying to be AOT friendly that means reflection-based type loading for the serializer itself (which is how we would do it auto-magically, typically) might be a no-no as well.
- All current and existing serializers will be wrapped inside a SerializerV2Adapter and made backwards compatible - this is 10000000% necessary in order to prevent bricking historical data inside Akka.Persistence AND it's also necessary for people who are already using custom serialization to have some backwards compatibility.
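To make the size estimator / writer / reader trio concrete, here is a minimal sketch of the shape a generated stub could take. It is written in Python purely for brevity - the real output would be C# emitted by the generator - and every name in it (`Greeting`, `estimate_size`, `write`, `read`, `BufferPool`) is hypothetical, not a proposed API. The point it illustrates is that a cheap upper-bound size estimate lets the caller rent one pooled buffer up front and write into it without resizing or copying.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Greeting:
    """Hypothetical user message; stands in for any marked message type."""
    name: str
    count: int


# Illustrative wire layout: int32 count, uint16 byte length of name, then the
# UTF-8 bytes of name. Names longer than 65535 bytes are not handled here.
_HEADER = struct.Struct("<iH")


def estimate_size(msg: Greeting) -> int:
    # Upper bound: fixed header plus at most 4 bytes per UTF-8 code point.
    return _HEADER.size + 4 * len(msg.name)


def write(msg: Greeting, buffer: memoryview) -> int:
    """Write msg into a caller-supplied buffer and return the bytes written."""
    encoded = msg.name.encode("utf-8")
    _HEADER.pack_into(buffer, 0, msg.count, len(encoded))
    end = _HEADER.size + len(encoded)
    buffer[_HEADER.size:end] = encoded
    return end


def read(buffer: memoryview) -> Greeting:
    """Rebuild the original message from the bytes produced by write()."""
    count, name_len = _HEADER.unpack_from(buffer, 0)
    start = _HEADER.size
    name = bytes(buffer[start:start + name_len]).decode("utf-8")
    return Greeting(name=name, count=count)


class BufferPool:
    """Toy pool that reuses buffers instead of allocating one per message."""

    def __init__(self):
        self._free = []
        self.allocations = 0

    def rent(self, min_size: int) -> bytearray:
        for i, buf in enumerate(self._free):
            if len(buf) >= min_size:
                return self._free.pop(i)
        self.allocations += 1
        return bytearray(min_size)

    def give_back(self, buf: bytearray) -> None:
        self._free.append(buf)


def roundtrip(msg: Greeting, pool: BufferPool) -> Greeting:
    # The estimate is what makes a single, correctly sized rental possible.
    buf = pool.rent(estimate_size(msg))
    try:
        written = write(msg, memoryview(buf))
        return read(memoryview(buf)[:written])
    finally:
        pool.give_back(buf)
```

In this sketch, serializing a second, smaller message reuses the buffer rented for the first one, so `pool.allocations` stays at 1. The estimate only has to be a safe upper bound, not exact, which keeps it cheap to compute.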
Stretch Goals
- I would love to have some degree of automatic detection and enforcement of extend-only design: https://aaronstannard.com/extend-only-design/ - this will stop developers from "having to know" to preserve this practice and will instead force them to do battle with the compiler. This will require the source generator to have some prior knowledge of what the code looked like. Probably tough to do.
- I would love to do native code emission for F# if possible but the state of the Roslyn toolchain does not give me high hopes that this will be feasible.
- Platform support - it's an open question whether or not we're going to drop .NET Standard support entirely in v1.6. I'd love to keep all of this and have it work in .NET Standard 2.0 / 2.1, but we may have to drop it (along with non-.NET SDK projects, which required us to lower our Roslyn target recently for Akka.Analyzers https://github.com/akkadotnet/akka.net/issues/7307).
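As a rough illustration of what extend-only enforcement could check, the sketch below compares a stored snapshot of a message's schema against the current one and reports anything other than pure additions. It is Python for brevity, and the snapshot format (`{field_name: (tag, type_name)}`), the function name, and the specific rules are all assumptions made for illustration; a real generator would need to decide where the prior snapshot lives and what counts as a breaking change.

```python
def extend_only_violations(previous: dict, current: dict) -> list:
    """Compare two schema snapshots of the form {field_name: (tag, type_name)}.

    Returns a list of human-readable problems; an empty list means the
    current schema only extends the previous one.
    """
    problems = []

    # Every previously released field must survive unchanged.
    for name, (tag, type_name) in previous.items():
        if name not in current:
            problems.append(f"field '{name}' was removed")
            continue
        new_tag, new_type = current[name]
        if new_tag != tag:
            problems.append(f"field '{name}' changed tag {tag} -> {new_tag}")
        if new_type != type_name:
            problems.append(
                f"field '{name}' changed type {type_name} -> {new_type}"
            )

    # New fields are fine, but they must not reuse a tag already taken.
    seen = {}
    for name, (tag, _) in current.items():
        if tag in seen:
            problems.append(
                f"tag {tag} is used by both '{seen[tag]}' and '{name}'"
            )
        else:
            seen[tag] = name

    return problems


# Hypothetical snapshots of one message type across two releases.
v1 = {"order_id": (1, "string"), "amount": (2, "int64")}
v2_extended = {**v1, "currency": (3, "string")}
v2_breaking = {"order_id": (1, "string"), "amount": (2, "double")}
```

Here `extend_only_violations(v1, v2_extended)` returns an empty list, while `extend_only_violations(v1, v2_breaking)` reports that `amount` changed type. In a generator, a non-empty result would surface as a compiler diagnostic rather than a returned list.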
Approach
Given the phases of code generation I've described here, the relatively new Incremental Source Generators from Roslyn sound like the most promising way to accomplish this.
We've been building our muscle working with Roslyn on the https://github.com/akkadotnet/akka.analyzers project over the past year, partly because we knew we'd be headed down this road for v1.6. That's given us a lot of practical experience on how to keep a Roslyn project organized / versioned / tested.
I think that's the route we're going to take, unless a better / cheaper / faster alternative appears.
Implementation
- [ ] Design the message-specific serialization stub / interface - this is 80% of the output from the code generator. What should this look like?
- [ ] https://github.com/akkadotnet/akka.net/issues/3740 - need to design some serializer APIs that leverage the stub.
- [ ] Have a serializerId / manifest generation system that ensures non-collision between serializers in the same project.
- [ ] Write the stub generator
- [ ] Write the serializer generator
- [ ] Write the serializer configuration generator
There's going to be a billion edge cases and lots of wacky garbage users do that needs help ("oh no, my double upside down partial static abstract protected internal discriminated union that uses custom IReadOnlyCollection<T> implementations that are actually mutable doesn't serialize Exceptions correctly!") - we'll deal with that as best we can.
The most important requirement we have to observe is not allowing the serializer to rick-roll itself on successive runs - i.e. the serializer ids have to remain stable and constant. Hell, maybe we force the user to specify that in order to take the computer out of the equation as a potential problem.
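One way to picture the "user specifies the id" option: ids are declared explicitly next to each serializer, and the registration step only verifies them instead of computing them. The sketch below is Python for brevity, and `SerializerIdRegistry`, `IdCollisionError`, and the example ids and owner names are all hypothetical. Because nothing is derived from type ordering or hashing, the ids cannot drift between builds; the only job left is to reject collisions.

```python
class IdCollisionError(ValueError):
    """Raised when two different serializers claim the same id."""


class SerializerIdRegistry:
    """Tracks explicitly declared serializer ids and rejects collisions."""

    def __init__(self):
        self._owner_by_id = {}

    def register(self, serializer_id: int, owner: str) -> None:
        existing = self._owner_by_id.get(serializer_id)
        if existing is not None and existing != owner:
            raise IdCollisionError(
                f"serializer id {serializer_id} is claimed by both "
                f"'{existing}' and '{owner}'"
            )
        # Re-registering the same owner under the same id is a no-op.
        self._owner_by_id[serializer_id] = owner

    def owner_of(self, serializer_id: int) -> str:
        return self._owner_by_id[serializer_id]


registry = SerializerIdRegistry()
registry.register(1001, "MyApp.Orders")   # ids here are made up
registry.register(1002, "MyApp.Billing")
registry.register(1001, "MyApp.Orders")   # same owner again: allowed
```

Calling `registry.register(1001, "MyApp.Shipping")` at this point raises `IdCollisionError`. In a source generator the equivalent check would run at compile time and fail the build, which is the behavior we want for ids that end up baked into persisted data.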
Previous AOT proof of concept: https://github.com/akkadotnet/akka.net/pull/6904