
Re-entrant grain flow control question

Open bill-poole opened this issue 1 year ago • 8 comments

Does Orleans provide flow control to re-entrant grains, or is it left up to each re-entrant grain to use a "max concurrency" rate limiter and then throw an exception if the number of concurrent requests exceeds the allowable number? Ideally, Orleans would limit/block callers until the target grain is under the configured "max concurrency".

If Orleans provides no such controls/limits, then all re-entrant grains are arguably exposed to denial-of-service (DoS) attacks, unless those grains have implemented custom rate limiting controls (like throwing an exception if the rate limiter is at capacity).

All grains should apply backpressure to calling grains, and it's not clear from the Orleans documentation whether Orleans does this for re-entrant grains. I'm aware that the SiloMessagingOptions.MaxEnqueuedRequestsHardLimit option can be used to provide flow control for non-re-entrant grains, but it doesn't seem like this will work for re-entrant grains. Apologies if I have that wrong.

bill-poole avatar Sep 10 '22 11:09 bill-poole

It's up to the developer to manage concurrency in that case. The grain is still single-threaded, so it won't necessarily starve the system if one reentrant grain receives a large influx of load.

We could consider a max concurrency option per grain. I'm not averse to the idea. It can be implemented cheaply. I wonder if there would be a need for more complexity, such as management method calls which ignore the limit, or variable call costs, etc.

As a workaround for the lack of this feature today, you can also implement one fairly easily using a semaphore or a counter (depending on desired behavior) and by implementing IIncomingGrainCallFilter on the grain class.
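To make the suggested workaround concrete, here is a minimal sketch of a reentrant grain that caps its own concurrency with a SemaphoreSlim and IIncomingGrainCallFilter, rejecting excess calls rather than queuing them. The grain/interface names and the limit of 100 are illustrative assumptions, not from this thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Orleans;
using Orleans.Concurrency;

public interface IMyGrain : IGrainWithStringKey
{
    Task DoWork();
}

[Reentrant]
public class MyGrain : Grain, IMyGrain, IIncomingGrainCallFilter
{
    // Allows up to 100 interleaved calls; beyond that, calls are rejected
    // immediately instead of accumulating in memory.
    private readonly SemaphoreSlim _concurrency = new(initialCount: 100);

    public async Task Invoke(IIncomingGrainCallContext context)
    {
        // A zero timeout means we never wait: overload surfaces as an
        // exception the caller can react to (e.g., by backing off).
        if (!await _concurrency.WaitAsync(TimeSpan.Zero))
        {
            throw new InvalidOperationException("Grain is at capacity; retry later.");
        }

        try
        {
            await context.Invoke();
        }
        finally
        {
            _concurrency.Release();
        }
    }

    public Task DoWork() => Task.CompletedTask;
}
```

A counter with Interlocked.Increment/Decrement would work equally well if no waiting behavior is ever desired.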

Does anyone else want such a feature or have additional requirements/considerations to add?

ReubenBond avatar Sep 10 '22 18:09 ReubenBond

Thanks @ReubenBond. My main concern is the host running out of memory if inbound requests are queued at the receiving grain while waiting for the rate limiter / semaphore. If system A sends messages to system B faster than system B can process them and there is no backpressure felt by system A, then system A will continue to send messages as fast as it can. All the while, these messages are being queued in memory by system B. That will always lead to system B eventually running out of memory.

I believe there should be flow control implemented at the Orleans transport layer so that sending grains feel backpressure from receiving grains. This is done in HTTP/2 and, by extension, gRPC.

The only way to implement this manually within grains would be for the receiving grain to throw an exception and for sending grains to retry with exponential backoff, much like what is done with HTTP/1.1 using HTTP status code 429. I think it would be better (and safer from the standpoint of reducing exposure to DoS attacks) if backpressure were implemented as a standard feature of the Orleans transport.
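The caller side of that pattern can be sketched as a small retry helper: on a capacity exception from the target grain, wait and retry with exponentially growing delays. The exception type, delays, and attempt count here are arbitrary assumptions for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class GrainCallRetry
{
    // Retries a grain call with exponential backoff when the target signals
    // overload by throwing (InvalidOperationException is assumed here).
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> call, int maxAttempts = 5)
    {
        var delay = TimeSpan.FromMilliseconds(100);
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await call();
            }
            catch (InvalidOperationException) when (attempt < maxAttempts)
            {
                await Task.Delay(delay);
                delay *= 2; // back off exponentially, akin to HTTP 429 handling
            }
        }
    }
}
```

In production code the retry would typically also add jitter and a total deadline so that many backed-off callers don't retry in lockstep.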

One possible approach would be to use gRPC (but using native Orleans serialization - Protobuf is not required) as the native Orleans transport.

bill-poole avatar Sep 10 '22 18:09 bill-poole

all grains should apply back pressure to calling grains

Please elaborate on the scenario and the request. It sounds like you have an unbounded number of reentrant grains issuing calls with an unbounded level of concurrency to some other reentrant grains and the request is that those downstream grains should be applying back pressure to upstream callers. Do I have that right?

Edit to add: in particular, I want to know what granularity you feel back pressure should be tracked and applied on.

ReubenBond avatar Sep 10 '22 19:09 ReubenBond

My specific situation relates to Orleans cluster ingress from the Internet. JSON event messages are posted to a cluster of web servers over HTTP 1.1, and those web servers each use an Orleans client to handle each event message by invoking an Orleans grain. Each web server could be constrained with a rate limiter, but the number of web servers/hosts is variable due to autoscaling. i.e., each web server could constrain calls to each target grain to no more than say 1,000 at a time. But as the cluster scales, that 1,000 is multiplied by the number of web servers. This is because the web servers are not feeling any backpressure from the target grains.

Therefore, the web servers need to feel backpressure from the target Orleans grains. I can do this by having the grains throw an exception if they are overloaded and then have the web servers back off exponentially. Each web server is configured with its own local rate limiter. If a single web server exceeds say 1,000 concurrent requests (which could be exceeded if several requests are waiting to retry calling their respective target grains), then the web server returns a 429 response - which is the correct/appropriate way to throttle HTTP 1.1 requests.
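As a sketch of that web-server side, a per-server SemaphoreSlim can cap in-flight grain calls at 1,000 and shed excess load with a 429 instead of queuing. This is an illustrative ASP.NET Core minimal-API fragment; the route, the IEventGrain interface, and the limit are assumptions, not part of the thread.

```csharp
using System;
using System.Threading;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Per-server cap on concurrent in-flight grain calls (illustrative value).
var limiter = new SemaphoreSlim(1000);

app.MapPost("/events", async (HttpContext http) =>
{
    if (!await limiter.WaitAsync(TimeSpan.Zero))
    {
        // Saturated: throttle the HTTP caller rather than queue in memory.
        return Results.StatusCode(StatusCodes.Status429TooManyRequests);
    }

    try
    {
        // The Orleans client call would go here, e.g.:
        // await client.GetGrain<IEventGrain>(key).HandleAsync(message);
        return Results.Accepted();
    }
    finally
    {
        limiter.Release();
    }
});

app.Run();
```

As noted above, this per-server cap still multiplies with autoscaling, which is why backpressure from the grains themselves is the missing piece.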

I believe the need for backpressure to be felt by callers is a universal requirement - irrespective of whether the caller is an Orleans grain or not. It is always possible that a calling grain can/will call a grain faster than that grain can process those requests. There always should be backpressure felt by calling/sending grains/clients so they limit their rate to be commensurate with the receiving grain's maximum processing capacity - irrespective of whether the receiving grain is re-entrant or not.

I want to know what granularity you feel back pressure should be tracked and applied on.

I'm not sure I understand the question, but ideally backpressure would be felt at grain granularity (and message-by-message) so one grain at capacity doesn't stop/slow messages being dispatched to other grains on the same host. I believe this is the way HTTP/2 flow control works if a separate HTTP/2 stream were to be dedicated to communicating with each grain. As I understand it, flow control in HTTP/2 is at the stream level.

Per-grain flow control (like per-stream flow control in HTTP/2) would ensure that senders send messages to target grains at the rate at which those target grains consume those messages.

bill-poole avatar Sep 10 '22 19:09 bill-poole

EDIT: I wrote this before seeing your above comment (I'm on my phone, with no PC in sight).

gRPC essentially limits outstanding requests to 100 per connection and queues on the client when that limit is reached: https://docs.microsoft.com/en-us/aspnet/core/grpc/performance?view=aspnetcore-6.0

For that to work in cases where there's a load balancer in front of an elastically scalable system, the LB would need to dynamically adjust the concurrency limit based on where the individual requests are forwarded. Tracking that kind of state is cheap enough for a small number of clients and backend servers, but it's likely not worthwhile when the number of clients and servers are arbitrarily large (e.g., individual grains as clients and servers) because of the memory requirement. So if we were to do that in Orleans, we'd likely want to track at the process level rather than per grain. There would likely be some coordination cost associated with that, especially since requests can be forwarded between servers as grains move around or activation races occur. We would need to account for that.

A non-mutually-exclusive alternative is to limit the maximum number of concurrent requests which a grain will process (either a reentrant grain or a regular grain with interleaving calls) before enqueuing additional requests. That would allow the existing queue length limit to apply, hence limiting the total number of inbound incomplete requests per grain and hence memory per grain. The back pressure on the request queue results in an exception if there are too many callers sending concurrent requests, while the concurrency limit on processing acts as a limit on the amount of work produced by a given grain concurrently, assuming that calls are being awaited.
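That queuing variant differs from outright rejection: excess calls await a semaphore, so at most N requests interleave and the rest wait their turn (and a separate queue-length limit still bounds memory). A minimal sketch as a grain call filter, with all names and the limit assumed for illustration:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Orleans;

public class ConcurrencyGate : IIncomingGrainCallFilter
{
    private const int MaxConcurrency = 8; // illustrative per-grain limit

    private readonly SemaphoreSlim _gate = new(MaxConcurrency);

    public async Task Invoke(IIncomingGrainCallContext context)
    {
        // Excess calls wait here instead of throwing; an external queue-length
        // limit would still be needed to cap the number of waiters.
        await _gate.WaitAsync();
        try
        {
            await context.Invoke();
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

The earlier rejecting variant and this queuing variant differ only in whether the semaphore is awaited or polled with a zero timeout.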

There are other concerns, though. For example, if requests form a chain/tree rather than being flat (i.e., to process one initial request, one or more subsequent requests must be processed), then those subsequent requests may be queued behind new initial requests, which can lead to poor performance or a deadlock. For this reason, I believe that it's best to use admission control (limiting the number of concurrent initial requests) rather than limiting requests indiscriminately. There are a few entry points for initial requests in Orleans: external callers, reminders, and streams. All of these should have some form of admission control applied. Currently, we only apply admission control at the gateway level, so there's some work to do there.

ReubenBond avatar Sep 10 '22 20:09 ReubenBond

Thanks @ReubenBond for such a detailed reply! I think applying flow control at the host/process level would be fine for most situations - it would be fine for my scenario. Note that while gRPC limits to 100 streams per connection, it's always possible to open more connections if greater concurrency is required.

Whatever flow control mechanism is employed, I believe that it should be handled by the Orleans runtime, invisible to the grain logic; and, it needs to prevent messages from being read into and held in an in-memory queue at the receiver. And it needs to apply to all types of calls/grains (standard and re-entrant/interleaved) from all sources (grains and external/clients).

bill-poole avatar Sep 10 '22 20:09 bill-poole

We've moved this issue to the Backlog. This means that it is not going to be worked on for the coming release. We review items in the backlog at the end of each milestone/release and depending on the team's priority we may reconsider this issue for the following milestone.

msftbot[bot] avatar Sep 15 '22 18:09 msftbot[bot]

@ReubenBond yes but for reminders :)

ElanHasson avatar Sep 17 '22 00:09 ElanHasson