orleans icon indicating copy to clipboard operation
orleans copied to clipboard

Question about copying on serialization

Open bill-poole opened this issue 1 year ago • 8 comments

The docs state that the reason why a defensive copy is made of method arguments (unless explicitly marked immutable) is:

  1. so that they are not mutated while being serialized; and
  2. so that if the target grain is in the same silo, the message doesn't have to be serialized/deserialized - the target grain can just be given a direct reference to the copy.

The docs also say that if a message/parameter is marked immutable, then it cannot be changed by either the sender or receiver at any time after the message/parameter has been passed to the grain method on the sending grain. However, presumably Orleans doesn't retain a reference to any sent message/parameter once the send operation has completed and the Task returned by the grain method on the sending grain has completed? Likewise, I wouldn't have thought Orleans would retain a reference to a message/parameter on the receiving grain after the message/parameter has been passed to the grain method implementation on the receiving grain.

Is this correct? If so, I believe better default behavior would be:

  • take a defensive copy of messages being sent to grains in the same silo (i.e., those that are not serialized/deserialized to/from a channel) that are not marked as immutable; and
  • do not take a defensive copy of messages being sent to grains in different silos.

The issue/risk of message objects being mutated while serialized (i.e., before the Task returned from the grain proxy method is completed) is the same with all async serializers. For example, the JsonSerializer.SerializeAsync method does not (to my knowledge) take a defensive copy of the object to be serialized. It just requires that the given object is not mutated while it is being serialized. I therefore believe this should be sufficient precedent for Orleans to only take defensive copies of messages being sent to target grains in the same silo.

bill-poole avatar Sep 15 '22 05:09 bill-poole

We cannot synchronously determine whether a message is going to be sent to the same silo or another silo: that can require a lookup which is further down the pipeline and is naturally asynchronous.

ReubenBond avatar Sep 15 '22 20:09 ReubenBond

Okay, but the defensive copy isn't needed until the target grain method is invoked, unless the goal is to protect against the message object being mutated before the Task returned from the calling grain's proxy is completed. And the target grain method cannot be invoked until after Orleans has done the lookup to determine which silo contains the target grain.

Therefore, if we assume/require that calling grains do not mutate parameters passed to a target grain method until after the Task returned from that method completes (which is the position taken by the JsonSerializer.SerializeAsync method because it doesn't take a defensive copy), then there is no need for a defensive copy unless the lookup determines the target grain is in the same silo and if so, only after that lookup completes.

That is, I think it is an edge case for a calling grain to mutate a message passed to a target grain method before the Task returned from that method completes. If we take the position that it is the calling grain's responsibility in the normal/default scenario not to mutate message objects passed to a target grain method until the Task returned from that method completes, then the defensive copy can be taken if/after Orleans determines the target grain is in the same silo as the calling grain.

Consider an application that recycles sent messages using an object pool to reduce GC pressure. Those message objects are immutable while they are being sent - i.e., until the Task returned from the target grain method completes. However, after this point we want to recycle the message object. So, we want Orleans to take a defensive copy of the object if (and only if) the target grain is in the same silo. But if the target grain is in a different silo and the message will be serialized over the wire, then the defensive copy is wasteful/inefficient (because the calling grain doesn't mutate the message until after the Task returned from the target grain method is complete).

Orleans allows us to say whether a defensive copy will happen in all circumstances for a given grain method/parameter. But, we really only need/want the defensive copy to be taken if the target grain is in the same silo as the calling grain.

bill-poole avatar Sep 16 '22 12:09 bill-poole

Okay, but the defensive copy isn't needed until the target grain method is invoked, unless the goal is to protect against the message object being mutated before the Task returned from the calling grain's proxy is completed.

Incorrect, the grain might continue execution without awaiting the call (maybe it will await the task later, maybe it will be a fire-and-forget notification). It could also await the call but have some background task or interleaving request execute before the response arrives. If the value of the parameter were allowed to change between the call site and the time of execution, that would be a surprise, especially if that only occurred in some circumstances (when the target just happens to be local, versus remote). At this point, it's worth noting that there is a difference between a regular async method and a grain method. A regular async method will begin execution immediately and will execute synchronously up until the first await point. A grain method does not begin immediately, so the potential argument that this is different from regular calling semantics and so a defensive copy shouldn't be taken doesn't apply: in the regular async case, the target method is guaranteed to see the value of the parameter as it was at the call site (assuming there is no parallelism, only potentially concurrency), but this is not necessarily true for a grain call, since they are temporally disconnected: grains execute on their own time/scheduler, so calls are enqueued and not synchronously invoked.

Orleans chooses safety by default and gives you tools to drop the guard rails when you're sure they're not necessary. I've seen people get it wrong enough times to reaffirm my belief that safety is the right default.

Have you been profiling code and seeing that performance is insufficient because of copying, or is the concern speculative?

ReubenBond avatar Sep 16 '22 18:09 ReubenBond

the grain might continue execution without awaiting the call

That's what I meant when I said "unless the goal is to protect against the message object being mutated before the Task returned from the calling grain's proxy is completed." Okay, so it sounds like that is indeed the goal.

The point I was making was that the JsonSerializer.SerializeAsync method doesn't make a defensive copy of the object passed in case the Task it returns isn't awaited and the object passed is mutated while it is being serialized. i.e., Microsoft chose a different default with JsonSerializer.SerializeAsync than Orleans chose with its serialization/deserialization. But perhaps that's caused Microsoft a lot of support questions/issues?

My concern with the overhead of copying is speculative. But based on my calculations, message serialization/deserialization will consume a substantial amount of the CPU budget, so I need that to be as efficient as possible. The nature of the distributed algorithm is such that it involves a lot of message passing to/from grains so that the work can be distributed over a large number of machines. Note also that the message copying also adds GC pressure, which further consumes CPU budget.

I'd like to avoid the copy overhead if possible because for my scenario it isn't necessary. I would also like to recycle message objects to reduce GC pressure. This would achieve the highest possible performance for my scenario. However, this is not possible with Orleans today.

The CPU overhead from increased GC pressure from not recycling messages will likely be less than the overhead from making defensive copies of messages, so I'll likely opt for marking messages immutable and not recycling them.

But it would be nice to be able to have the option to squeeze out that extra bit of performance. Would you be open to disabling defensive copies for messaging between silos a configuration option that is disabled by default?

bill-poole avatar Sep 16 '22 19:09 bill-poole

The serializer in Orleans 4.x is fairly well optimized. If you're interested, watch this presentation I gave about some of the considerations taken when designing/building it and the accompanying RPC system: https://www.youtube.com/watch?v=kgRag4E6b4c

I would also like to recycle message objects to reduce GC pressure.

I assume you mean Orleans' internal message objects and not application-level message objects. Is that right? Either way, it's feasible to improve this.

The CPU overhead from increased GC pressure from not recycling messages will likely be less than the overhead from making defensive copies of messages,

Why not opt for both recycled and immutable? I believe this is within reach. It's likely something we'd consider after shipping 4.0, though (Nov).

Would you be open to disabling defensive copies for messaging between silos a configuration option that is disabled by default?

That kind of thing would be complex: some logic might rely on having a defensive copy, so I'm generally against that. Doing it on a per-RPC-interface/per-type basis is likely acceptable, though, or even per-assembly (that's a little more dangerous, less visible).

ReubenBond avatar Sep 16 '22 19:09 ReubenBond

At this point, it's worth noting that there is a difference between a regular async method and a grain method. A regular async method will begin execution immediately and will execute synchronously up until the first await point. A grain method does not begin immediately, so the potential argument that this is different from regular calling semantics and so a defensive copy shouldn't be taken doesn't apply: in the regular async case, the target method is guaranteed to see the value of the parameter as it was at the call site (assuming there is no parallelism, only potentially concurrency), but this is not necessarily true for a grain call, since they are temporally disconnected: grains execute on their own time/scheduler, so calls are enqueued and not synchronously invoked.

I assume that the defensive copy is taken synchronously with the call site, otherwise the caller could mutate the message object before/while it is copied. Is that correct?

The serializer in Orleans 4.x is fairly well optimized. If you're interested, watch this presentation I gave about some of the considerations taken when designing/building it and the accompanying RPC system

Thanks, will do! However, even with a highly optimized serializer, the defensive copy is still an unnecessary overhead for my scenario I'd like to avoid.

I just realized that I can recycle the message object once the Task returned from the grain proxy method completes - even if I mark the object as [Immutable] (thus avoiding the defensive copy). That is, marking the message class as immutable was making me think I couldn't alter the message objects at any point after they have been sent - which would mean I couldn't recycle message objects through an object pool.

That being said, that's actually what the Orleans documentation states: "that neither the provider of the value nor the recipient of the value will modify it in the future", which implies ever at any point in the future. Maybe the documentation should be updated to say that immutable message objects can be modified once the target grain method completes, assuming that's correct? Or maybe that isn't actually correct for [OneWay] methods...

I guess my point is that it is difficult to determine from the Orleans documentation if/when it is okay to mutate a message object that has been marked immutable - which is necessary for message objects to be recycled.

If Orleans could be configured to take a defensive copy only when invoking a method of a grain in the same silo, then we wouldn't need to mark our message objects as immutable to avoid the defensive copy; and therefore we wouldn't run any risks that we were breaking something by mutating a message object marked immutable by recycling it. We should be safe because:

  • we don't update a message object until the Task returned from a grain proxy method is in a completed state;
  • message objects are either copied (if sent to a grain in the same silo) or serialized/deserialized (if sent to a grain in a different silo), which makes a copy; and
  • the message object instance passed to the grain method by the sending grain is not used/read by Orleans or the receiving grain at any point after the Task returned from the grain method is in a completed state.

I assume you mean Orleans' internal message objects and not application-level message objects. Is that right? Either way, it's feasible to improve this.

I mean message objects that are passed to grain methods. Our message objects are only constructed at the point they are sent and are converted into instances of other types immediately upon receipt.

Therefore, our Orleans-serializable objects are all very short-lived and temporary. Given the substantial amount of message passing, I expect these temporary Orleans-serializable objects to increase GC pressure materially. Therefore, I'd like to recycle them through an object pool.

Note that I would imagine this thinking would extend to surrogate objects created by Orleans - i.e., it might make sense for Orleans to recycle surrogate objects because I assume they are only ever created/used internally within Orleans and exist only very temporarily.

Note also that I can only really recycle messages on the sending side because Orleans doesn't provide an option to configure a factory method to instantiate message object instances, which means Orleans cannot be configured to draw message objects from the object pool on the receiving side.

Another option would be for me to take complete control of the message serialization/deserialization and define a set of message classes that are thin wrappers around byte arrays that contain serialized messages. i.e., the calling grain would first serialize, then invoke the grain method. The receiving grain method would deserialize the message as its first order of business.

Doing it on a per-RPC-interface/per-type basis is likely acceptable

I think this would be a good feature. I imagine the configuration switch would specify a defensive copy behavior as either "always" or "local grains only", configured for specific target grain types, with the default being "always"?

bill-poole avatar Sep 16 '22 20:09 bill-poole

I assume that the defensive copy is taken synchronously with the call site, otherwise the caller could mutate the message object before/while it is copied. Is that correct?

Correct.

That being said, that's actually what the Orleans documentation states: "that neither the provider of the value nor the recipient of the value will modify it in the future", which implies ever at any point in the future

It's easier to specify the contract this way, otherwise it's more nuanced. It's domain specific, which may work in your case, but the documentation is safe advice for typical users. If you are sure you're not referencing an object after a call completes, then do as you see fit with it.

If Orleans could be configured to take a defensive copy only when invoking a method of a grain in the same silo, then we wouldn't need to mark our message objects as immutable to avoid the defensive copy; and therefore we wouldn't run any risks that we were breaking something by mutating a message object marked immutable by recycling it.

We've been over the complications with this. It would be nice to do the minimum work possible in all cases. In future it might be feasible to implement it as an optimization. For the time being, you have the controls on a per-type and per-method basis that you can use. It's also speculation that this will be problematic.

Note also that I can only really recycle messages on the sending side because Orleans doesn't provide an option to configure a factory method to instantiate message object instances, which means Orleans cannot be configured to draw message objects from the object pool on the receiving side.

You have pretty much full control, don't you? You should be able to implement pooling and avoid allocation. The new serialization framework is undocumented, but it was designed with zero-alloc RPC in mind (see the vid).

ReubenBond avatar Sep 16 '22 22:09 ReubenBond

It's also speculation that this will be problematic.

Some additional context I can provide at this stage - some grains under maximum load will receive about 50,000 calls per second with 5 kB messages being exchanged. So for those grains, that would be 250 MB per second of defensive copies being made per grain.

Note that Protobuf serialization/deserialization time has been measured at about 16 us, which means messages can be serialized/deserialized at 62,500 messages per second (using two vCPUs across two different silos - one for serialization and one for deserialization), which is 312.5 MB/s per grain. And the work being done by each grain is done on one vCPU and takes about 20 us per message, so that's handling 50,000 messages per second.

The grain code doing the work is (and should be) the bottleneck, so that means 50,000 calls per second, which means 250 MB/s serialization/deserialization per grain.

If you are sure you're not referencing an object after a call completes, then do as you see fit with it.

Okay great. Given this, I can mark message objects as immutable to avoid the defensive copy and mutate them after grain method calls complete.

In future it might be feasible to implement it as an optimization.

I think that would be a great addition. Would you like me to open a new issue for this feature request?

You have pretty much full control, don't you? You should be able to implement pooling and avoid allocation.

How can I recycle message objects on the receiving side, since Orleans instantiates them? Object pooling is only effective if new instances are drawn from the pool.

bill-poole avatar Sep 17 '22 05:09 bill-poole

Would you like me to open a new issue for this feature request?

I don't think we need to - it will naturally be considered during perf optimization cycles & issues aren't used to prioritize perf work, profiling is.

How can I recycle message objects on the receiving side, since Orleans instantiates them?

Orleans allows you to choose how your objects are constructed by letting you register an IActivator<T> via [RegisterActivator] and placing the [UseActivator] attribute on the type to tell it that it shouldn't create it via new. The activator is constructed using DI (ActivatorUtilities.GetServiceOrCreateInstance), so you can reference pools/options/etc from there.

ReubenBond avatar Oct 17 '22 18:10 ReubenBond