restate
Introduce per service+partition concurrency limits
We currently have an invoker concurrency limit that is used to defend the restate-server. We also need a concurrency limit to defend a target service (or maybe an endpoint). While supporting a global concurrency limit is a bit more challenging, I suggest introducing a per-partition+target limit. Users can do their capacity planning accordingly, or re-route strict requests to a specific key (hence pinning them to a partition).
This might be easily served if we do #2432
After an offline conversation, we discussed the following 3 situations:
- Protecting the runtime from overload/OOM. For this purpose we can use the current invoker concurrency limit, which is per partition, and this should be enough. We can also employ additional strategies such as this one: https://github.com/restatedev/restate/issues/2761
- Protecting the service deployments/endpoints from the flood of invocations generated by the runtime. For this purpose, we can have a tunable per service deployment that is implemented by the invoker and behaves exactly like the invoker concurrency limit, but on a per-deployment basis. This limit would again be per partition, so the effective limit is the user-configured value * the number of partitions (we can experiment with how best to let the user configure this value).
- Granularly define a concurrency limit for Service handlers, or virtual object/workflow shared handlers. This is a semantic feature that goes in the partition processor and connects to the thread of concurrency limits.
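
To make the second situation concrete, here is a minimal sketch of how a per-partition limiter could combine the existing invoker limit with a per-deployment limit. This is an illustration only, not Restate's invoker code: `PartitionInvokerLimiter`, `deployment_limits`, `try_acquire`, `release`, and `effective_limit` are all hypothetical names, and Python is used purely for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PartitionInvokerLimiter:
    """Hypothetical limiter owned by one partition's invoker."""

    invoker_limit: int                 # existing per-partition invoker limit
    deployment_limits: dict            # hypothetical tunable: deployment -> limit
    _total: int = field(default=0, init=False)
    _in_flight: dict = field(
        default_factory=lambda: defaultdict(int), init=False
    )

    def try_acquire(self, deployment: str) -> bool:
        """Reserve a slot; return False if either limit is exhausted."""
        if self._total >= self.invoker_limit:
            return False  # protects the runtime (situation 1)
        cap = self.deployment_limits.get(deployment)
        if cap is not None and self._in_flight[deployment] >= cap:
            return False  # protects the deployment (situation 2)
        self._total += 1
        self._in_flight[deployment] += 1
        return True

    def release(self, deployment: str) -> None:
        """Give back a slot previously reserved for this deployment."""
        if self._in_flight[deployment] <= 0:
            raise ValueError(f"no in-flight invocation for {deployment!r}")
        self._total -= 1
        self._in_flight[deployment] -= 1


def effective_limit(per_partition_limit: int, num_partitions: int) -> int:
    """Cluster-wide ceiling implied by a per-partition limit."""
    return per_partition_limit * num_partitions
```

Since every partition would hold its own limiter, a deployment configured with a limit of 2 on a cluster with 24 partitions could see up to `effective_limit(2, 24)` = 48 concurrent invocations, which is the number users would need to plan capacity against.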