
Introduce System.Memory APIs into all Serializer base types

Open Aaronontheweb opened this issue 5 years ago • 6 comments

Akka.NET Version: 1.4.0 and beyond.

In an effort to radically speed up Akka.Persistence, Akka.IO, and Akka.Remote, we should take advantage of the new APIs made available in System.Memory, namely Span<T> and Memory<T> to help reduce the duplicative copying of buffers used in all of our I/O operations.

Today, here's what the situation looks like for Akka.Remote on just the write-side, for instance:

  1. Message is sent to a RemoteActorRef via the IActorRef.Tell method
  2. Message is queued up inside the Akka.Remote.EndpointWriter
  3. Message is serialized using whatever its configured serializer is: Protobuf, JSON.NET, etc - this allocates the first set of byte[]
  4. Message is then copied into its container format, i.e. wrapped inside the control messages Akka.Remote uses for routing, which are based on Google.Protobuf - allocates another set of byte[]s for copying again.
  5. The fully-serialized message is then copied into the Akka.Remote transport, DotNetty in this case, which uses its own buffer pools and copies the buffer for at least the third time.

We have a similar process that works in reverse for deserialization and, again, copies the buffers 2-3 times. The fundamental issue with the original serialization architecture is that each library has its own idea of how to manage memory most efficiently, and none of that can be easily exposed to or shared with other parts of the I/O pipeline.

The introduction of the System.Memory APIs in .NET Core 2.1 changes all of this - they offer a model where a shared pool of memory can be used without any duplicative copying / buffering between the different stages of the pipeline. Akka.NET should take advantage of this in order to reduce garbage collection pressure on the system and thus, increase our total throughput in the areas of the system that use serialization heavily.

Before someone "weekend project!!!!"-s this issue, the sad news: the rest of the .NET ecosystem isn't quite ready to support this yet.

The three serialization libraries we depend on today:

  1. Google.Protobuf: https://github.com/protocolbuffers/protobuf/pull/5835 - just merged in System.Memory support 15 days ago and are planning on including it in a future release.
  2. Newtonsoft.Json: https://github.com/JamesNK/Newtonsoft.Json/issues/1761#issuecomment-408372008 - waiting on .NET Standard 2.1 / .NET Core 3.0 to come out, which will make the System.Memory base APIs available.
  3. Hyperion: need to implement the wire format standard first, which is another hairy project. But it'd probably also make sense to wait until .NET Standard 2.1.

And lastly, DotNetty: https://github.com/Azure/DotNetty/issues/411#issuecomment-410289089 - looks like they're waiting for .NET Standard 2.1 / "more adoption" too.

I'd like to keep this thread open now to track any new developments on these issues so when the time comes for System.Memory to take on the world, we can get our work started.

Aaronontheweb avatar Mar 20 '19 15:03 Aaronontheweb

A few notes:

  • While Memory<byte> is a general abstraction over allocated byte buffers, the structure we should probably be interested in is ReadOnlySequence<byte>: conceptually, it's a linked list of Memory<byte> segments. This way we can grow the potential size of the payload while serializing without paying the cost of copying the memory into a bigger buffer. It's pretty much the core concept behind System.IO.Pipelines.
    • A side note here is that we already have something that conceptually works almost exactly like that: Akka.IO.ByteString. I have an idea for how to make it castable to ReadOnlySequence<byte> at low cost (direct inheritance is not possible).
  • We should take into account that we might need to operate on different APIs. We've got parts that use ByteStrings implemented by us, ByteStrings implemented by Protobuf, byte arrays, and probably in the future the newer, faster APIs. I was thinking about abstracting them into something like: void Serialize<TWriter>(object value, TWriter writer, ISerializerSession session) where TWriter : IBufferWriter<byte>, then just introducing writers (they could even be structs!) over the actual data types we want to serialize to.
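A minimal sketch of what a writer-based serializer of that shape could look like, assuming only the standard System.Buffers types. Utf8StringSerializer and its length-prefixed layout are hypothetical illustrations, not Akka.NET APIs, and the ISerializerSession parameter from the proposal is left out to keep the example self-contained.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Text;

// Hypothetical serializer, for illustration only; not part of Akka.NET.
public static class Utf8StringSerializer
{
    // Writes a 4-byte little-endian length prefix followed by the UTF-8 payload
    // straight into memory handed out by the writer, with no intermediate byte[].
    // Note: a struct TWriter would need to be passed by ref so that Advance()
    // mutates the caller's instance; a class such as ArrayBufferWriter<byte> is fine as-is.
    public static void Serialize<TWriter>(string value, TWriter writer)
        where TWriter : IBufferWriter<byte>
    {
        int byteCount = Encoding.UTF8.GetByteCount(value);

        // Ask the writer for at least 4 bytes and commit exactly 4.
        Span<byte> header = writer.GetSpan(4);
        BinaryPrimitives.WriteInt32LittleEndian(header, byteCount);
        writer.Advance(4);

        if (byteCount == 0) return;

        // Encode directly into the writer's buffer.
        Span<byte> body = writer.GetSpan(byteCount);
        int written = Encoding.UTF8.GetBytes(value.AsSpan(), body);
        writer.Advance(written);
    }
}
```

The same method works unchanged over ArrayBufferWriter<byte>, a PipeWriter, or a custom writer wrapping a ByteString, which is the point of constraining on IBufferWriter<byte> rather than on a concrete buffer type.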

Horusiath avatar Mar 20 '19 18:03 Horusiath

This way we can grow potential size of the payload while serializing, without paying the cost of copying the memory to bigger buffer.

Ah ok, that's similar to how DotNetty's framing + encoding system works. The frame headers get appended to the outbound stream as a separate set of 4 bytes, rather than modifying the payload they describe. That model seems correct here.

was thinking about abstracting them into something like: void Serialize<TWriter>(object value, TWriter writer, ISerializerSession session) where TWriter: IBufferWriter, then just introduce writers (they could even be structs!) over actual data types which we want to serialize to.

I'd like to see how things play out with the third party dependencies in the ecosystem. It'd be unfortunate if it's necessary for us to come up with our own abstraction, but it wouldn't be the first time we've had to go down that road. Maybe it won't be though - who knows!

Aaronontheweb avatar Mar 20 '19 18:03 Aaronontheweb

I imagine we will see serializers form up around the general style of this API:

public interface IFieldCodec<T>
{
    void WriteField<TBufferWriter>(ref Writer<TBufferWriter> writer, uint fieldIdDelta, Type expectedType, T value) where TBufferWriter : IBufferWriter<byte>;
    T ReadValue(ref Reader reader, Field field);
}

I use custom Writer<T> & Reader types which hold the serializer session, but the idea is the same.

In most real-world cases, TBufferWriter will not be a struct (eg, PipeWriter is an abstract class), but it's a possibility. For message serialization I currently have this:

internal interface IMessageSerializer
{
    void Write<TBufferWriter>(ref TBufferWriter writer, Message message) where TBufferWriter : IBufferWriter<byte>;
        
    /// <returns>
    /// The minimum number of bytes in <paramref name="input"/> before trying again, or 0 if a message was successfully read.
    /// </returns>
    int TryRead(ref ReadOnlySequence<byte> input, out Message message);
}

Ideally we can come up with a standard interface, but it's probably not terrible if we all land on separate interfaces with the same shape so that adaptors can be made. The most critical aspect to this is the core write/read types, TBufferWriter : IBufferWriter<byte> & ReadOnlySequence<byte>. By landing on them we can reduce impedance mismatch.
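As a rough illustration of the read side, here is a sketch of a TryRead with the same return contract (0 on success, otherwise the minimum number of bytes needed) over a ReadOnlySequence<byte>. The LengthPrefixedFrame type and its 4-byte little-endian length prefix are assumptions made for the example, and the payload is copied out with ToArray() purely for brevity; a real codec would parse the slice in place.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;

// Hypothetical framing helper, for illustration only.
public static class LengthPrefixedFrame
{
    // Returns 0 and advances 'input' past the frame when a full frame is available;
    // otherwise returns the minimum number of bytes required before trying again.
    public static int TryRead(ref ReadOnlySequence<byte> input, out byte[] payload)
    {
        payload = Array.Empty<byte>();
        if (input.Length < 4) return 4;

        // The header may straddle two segments, so copy it into a small stack buffer.
        Span<byte> header = stackalloc byte[4];
        input.Slice(0, 4).CopyTo(header);
        int length = BinaryPrimitives.ReadInt32LittleEndian(header);
        if (length < 0) throw new InvalidDataException("Negative frame length.");

        long total = 4L + length;
        if (input.Length < total) return (int)Math.Min(total, int.MaxValue);

        payload = input.Slice(4, length).ToArray(); // copy for simplicity only
        input = input.Slice(total);                 // consume the frame
        return 0;
    }
}
```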

ReubenBond avatar Mar 20 '19 19:03 ReubenBond

The most critical aspect to this is the core write/read types, TBufferWriter : IBufferWriter<byte> & ReadOnlySequence<byte>. By landing on them we can reduce impedance mismatch.

Agree - if JSON.NET, Google.Protobuf, etc all end up using totally different concepts to express that idea then we'll be back at square 1.

Aaronontheweb avatar Mar 20 '19 20:03 Aaronontheweb

Related: https://github.com/akkadotnet/akka.net/pull/6026

Aaronontheweb avatar Aug 08 '22 16:08 Aaronontheweb

Some general-ish notes:

  • Returning IMemoryOwner<byte> rather than ReadOnlyMemory<byte> from ToBinary() would let consumers of the API control final disposal and/or reuse of the segment to maximize usage, while possibly allowing us, long term, to transition to a 'pooled' byte allocator for the lazy implementation.
  • Something for ToBinary that took an IBufferWriter<byte> would also be useful for low-alloc scenarios.
  • The .NET Community Toolkit has some great example implementations worth iterating on in its High Performance buffers section. It's a treasure trove of basic Span/Memory stuff, including:
    • The ArrayPoolBufferWriter could be used to provide Writes where ArrayPools are used for buffers automagically.
    • The MemoryOwner stuff is nice as it gives you IMemoryOwner<T> instances backed by a shared array pool.
      • IMO it would be nice if their MemoryOwner could wrap a non-pooled array for naive usage. I suppose a simple wrapper for that case is easy enough.
    • That said, there are also stream adapters that wrap BufferWriters, and there is some handling for the memory managers that underlie a Memory instance.
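A small sketch of the IMemoryOwner<byte> idea from the first note, using only MemoryPool<byte>.Shared from System.Buffers. PooledPayload and PooledSerializer are hypothetical names, not Akka.NET or Community Toolkit types. One wrinkle the sketch works around: a rented buffer is usually larger than the bytes actually written, so the written length has to travel alongside the owner.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical wrapper pairing a rented buffer with the number of bytes written to it.
public sealed class PooledPayload : IDisposable
{
    private readonly IMemoryOwner<byte> _owner;

    public PooledPayload(IMemoryOwner<byte> owner, int length)
    {
        _owner = owner;
        Length = length;
    }

    public int Length { get; }

    // Only the written portion is exposed; the rented block may be bigger.
    public ReadOnlyMemory<byte> Memory => _owner.Memory.Slice(0, Length);

    // The consumer decides when the buffer goes back to the pool.
    public void Dispose() => _owner.Dispose();
}

// Hypothetical serializer with a pooled ToBinary, for illustration only.
public static class PooledSerializer
{
    public static PooledPayload ToBinary(string value)
    {
        // Rent a buffer big enough for the worst-case UTF-8 encoding.
        int max = Encoding.UTF8.GetMaxByteCount(value.Length);
        IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(max);
        int written = Encoding.UTF8.GetBytes(value.AsSpan(), owner.Memory.Span);
        return new PooledPayload(owner, written);
    }
}
```

A caller would typically hold the result in a using block, hand Memory to the transport, and let Dispose return the buffer once the write completes.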

TL;DR: even if we don't take it as an upstream dependency, it is a good example of patterns that will likely be useful in providing 'useful defaults' for serialization implementations.

What I do know:

  • You can, with enough elbow grease, adapt Newtonsoft.Json to work with ZString. Not perfectly, but even a super-naive implementation using the 'utf16' version rather than shimming the 'utf8' version showed an improvement over even the pooled StringBuilders added in #4929.
  • Protobuf, we may see some improvement but IMO the protobuf code gets very 'branchy'/'nibbly' compared to other Serializers in many cases.
    • IDK if it's a caching/instantiation thing or what, but the comments I make further down (RE overhead vs JSON) surprise even me.
  • Hyperion, honestly I've played with pooling on it, and unless the objects fit a certain tuned size I've found it hard to see a benefit, unless you are consistently sending objects large enough that they hit the LOH.
    • This is still a good thing to optimize for, but throwing that out there if it helps prioritize things.
  • Protobuf, Yeah BufferWriter probably.
    • I'm still rooting for a change-over to MessagePack for core remoting when (if?) prudent, even if we keep Protobuf in other parts of Akka (e.g. cluster messages).
      • This is aggressive but pragmatic; remoting as a whole is the largest 'bottleneck' in every system I've run. Especially the time it takes to pull messages off the transport, resolve the actorRef, fully deserialize, etc.
      • In the hundreds of profile runs done, Protobuf is always a bottleneck.
        • As a general point of reference, if I'm recalling correctly, the time it takes for Protobuf to deserialize our envelope can be as bad as Newtonsoft.Json deserializing the actual payload (or, at best, between the time it takes to do hyperion payload vs JSON payload.)
      • As to how far I've looked into this: the path is hot enough that while I've gotten consistent hash partitioning based on the path string (or ByteString) to work, it didn't provide a notable benefit or detriment to overall throughput in a number of scenarios, including multi-messaging.
  • MessagePack as a whole, IDK the last time anyone touched it but I'm sure the current version will happily accept all of this.

to11mtm avatar Aug 08 '22 22:08 to11mtm