restate: Introduce per-service+partition concurrency limits

We currently have an invoker concurrency limit that protects the restate-server itself. We also need a concurrency limit that protects a target service (or perhaps an endpoint). Supporting a global concurrency limit is more challenging, so I suggest introducing a per-partition+target limit instead. Users can do their capacity planning accordingly, or route requests that need a strict limit through a single key, which pins them to one partition.
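A minimal sketch of how a per-partition+target limit could behave, written in Python for brevity. Everything here is hypothetical: `PartitionTargetLimiter`, `try_acquire`/`release` and `partition_for_key` are illustrative names, not Restate APIs, and the CRC32-modulo routing is only a stand-in for however Restate actually maps keys to partitions. Each (partition, target) pair gets its own in-flight counter, so the same service can reach the limit on every partition independently.

```python
import zlib
from collections import defaultdict


class PartitionTargetLimiter:
    """Hypothetical limiter: at most `limit_per_target` in-flight invocations
    per (partition, target service) pair."""

    def __init__(self, limit_per_target: int) -> None:
        if limit_per_target < 1:
            raise ValueError("limit_per_target must be at least 1")
        self.limit = limit_per_target
        # (partition_id, target) -> number of invocations currently running
        self.in_flight = defaultdict(int)

    def try_acquire(self, partition_id: int, target: str) -> bool:
        """Reserve a slot; False means the invocation should stay queued."""
        key = (partition_id, target)
        if self.in_flight[key] >= self.limit:
            return False
        self.in_flight[key] += 1
        return True

    def release(self, partition_id: int, target: str) -> None:
        """Free a slot when an invocation completes."""
        key = (partition_id, target)
        if self.in_flight.get(key, 0) == 0:
            raise RuntimeError(f"release without acquire for {key}")
        self.in_flight[key] -= 1
        if self.in_flight[key] == 0:
            del self.in_flight[key]  # drop idle entries so the map stays small


def partition_for_key(key: str, num_partitions: int) -> int:
    """Hypothetical routing: deterministically map a key to one partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

An invocation that fails `try_acquire` would remain queued until a slot on its partition is released. Because `partition_for_key` is deterministic, sending all strict requests with the same key keeps them on one partition, where a single counter bounds them.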

Feb 19 '25 21:02 igalshilman

This might be easy to support if we do #2432.

Feb 20 '25 08:02 slinkydeveloper

After an offline conversation, we discussed the following 3 situations:

1. Protecting the runtime from overload/OOM. The current invoker concurrency limit, which is per partition, should be enough for this. We can also employ additional strategies such as this one: https://github.com/restatedev/restate/issues/2761
2. Protecting service deployments/endpoints from the flood of invocations generated by the runtime. For this, we can add a per-service-deployment tunable that is implemented by the invoker and behaves exactly like the invoker concurrency limit, but on a per-deployment basis. This limit would again be per partition, so the effective limit is the configured user value * the number of partitions (we can explore how best to let users configure this value).
3. Granularly defining a concurrency limit for Service handlers, or for virtual object/workflow shared handlers. This is a semantic feature that belongs in the partition processor and ties into the broader thread on concurrency limits.
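The arithmetic behind point 2, as a hedged sketch: since every partition's invoker enforces the deployment limit independently, the worst-case load a deployment sees is the configured value times the number of partitions. `per_partition_limit_for_budget` is a hypothetical helper showing one way a user-facing global budget could be translated into a per-partition value; it is not an existing Restate setting.

```python
def effective_cluster_limit(per_partition_limit: int, num_partitions: int) -> int:
    """Worst-case concurrent invocations a deployment can receive when every
    partition's invoker enforces the same per-partition limit."""
    return per_partition_limit * num_partitions


def per_partition_limit_for_budget(global_budget: int, num_partitions: int) -> int:
    """Hypothetical helper: largest per-partition limit whose effective
    cluster-wide limit stays within `global_budget`."""
    if num_partitions < 1 or global_budget < 1:
        raise ValueError("global_budget and num_partitions must be positive")
    # Floor division keeps the total at or below the budget. If the budget is
    # smaller than the partition count this yields 0, which would block the
    # deployment entirely, so clamp to 1 and accept that the budget is exceeded.
    return max(1, global_budget // num_partitions)
```

For example, a limit of 10 on 24 partitions lets the deployment see up to 240 concurrent invocations, while a budget of 100 maps to 4 per partition (96 in total). One trade-off of this scheme is that a hot partition can hit its own cap while the cluster as a whole is still well below the budget.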

Feb 20 '25 11:02 slinkydeveloper