Cross instances local rate limit filter
Description:
Local rate limiting is more stable and has no additional dependencies. It is basically our first choice for rate limiting.
The only shortcoming of local rate limiting is that the token bucket configuration works independently in each Envoy instance.
This means the number of Envoy replicas will affect the final throughput of the limiter.
That's not friendly for users who don't know the technical details. And the replica count may change dynamically.
But Envoy actually can know its own replica count, via the local cluster.
So, I think it's possible to let all Envoy instances share a token bucket. Every instance would be pre-allocated part of the bucket quota by a specific algorithm (for example, even allocation). And when the membership of the local cluster changes, we re-execute the algorithm.
I'm assuming this doesn't need anyone pinged for triage since wbpcode filed it and is the person I would ping. :)
A possible path to reach this target:
- Extend the cluster manager to expose an additional method that accepts a callback to watch membership changes of the local cluster.
- Create a singleton (managed by the singleton manager of the server context) on demand to watch the membership changes of the local cluster and calculate the token share of the current Envoy instance.
- Use that token share when the rate limit filter refills the token bucket.
If no local cluster is provided, the token share will be 1.0 forever and nothing will change. When all local rate limit filters are disabled or unused, the singleton will be destroyed automatically and the watch callback unregistered.
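A minimal sketch of the even-allocation idea described above (function and parameter names here are illustrative, not actual Envoy APIs):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical even-allocation share: each instance gets 1 / host_count of
// the configured bucket, recomputed whenever local cluster membership changes.
double evenTokenShare(uint32_t local_cluster_host_count) {
  // With no local cluster (or an empty one) the share stays 1.0,
  // i.e. behavior is unchanged.
  if (local_cluster_host_count == 0) {
    return 1.0;
  }
  return 1.0 / static_cast<double>(local_cluster_host_count);
}

// Apply the share when refilling the bucket. Refill at least one token so a
// very large replica count cannot starve an instance completely.
uint64_t scaledRefill(uint64_t configured_tokens, double share) {
  const auto scaled = static_cast<uint64_t>(configured_tokens * share);
  return std::max<uint64_t>(scaled, 1);
}
```

For example, with 4 replicas and a configured refill of 100 tokens, each instance would refill 25 tokens, keeping the fleet-wide total at the configured 100.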
The first question I need to ask is "what is the local cluster and how can it be used to know the number of replicas?". Is the number of replicas the only info that can be retrieved, or could other info be shared between replicas? Probably this info could be useful for other filters or custom wasm filters, right?
The local rate limit filter is intended to protect a single instance of a service, i.e. limit how much one Envoy instance can process. I think a constantly changing limit based on membership count (during HPA or any scale-up/scale-down operations) would be very confusing and hard to reason about. Curious why you don't use global rate limiting if you need such behaviour?
cc @ramaraochavali Global rate limiting introduces additional dependencies (a rate limit server, Redis) and latency, and may not work properly if it's overloaded.
We also use the local rate limit in gateway mode, where it's hard to say the local limit protects only one instance of a service. And we only enable it for users who know about it and require it. So, I believe it won't confuse anyone.
I think a constantly changing limit based on membership count (during HPA or any scale-up/scale-down operations) would be very confusing and hard to reason about.
From the other side, I think it's also confusing for users who want a total limit (like in gateway mode) when the total limit changes because of HPA or any scale-up/scale-down operations. This new feature will provide a new option to let the local rate limit work with a stable total limit across the whole Envoy cluster/service.
what is the local cluster and how can it be used to know the number of replicas?
The local cluster is a special cluster that contains the Envoy instance itself. See the local cluster name in https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#envoy-v3-api-msg-config-bootstrap-v3-clustermanager
I have prepared a PR. You can check it if you are interested.
From the other side, I think it's also confusing for users who want a total limit (like in gateway mode) when the total limit changes because of HPA or any scale-up/scale-down operations.
Are you saying the total limit would be changed during HPA by operators, based on the number of nodes configured for the gateway?
@ramaraochavali I mean that if the local rate limit is used and, for example, 100 tokens per second is configured, the total limit is 100 * the number of Envoy instances.
But the number of Envoy instances will change at runtime because of HPA or similar. So the total limit will also change. But in gateway mode, users will in most cases expect a stable total limit regardless of the number of Envoy instances.
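To make the arithmetic concrete, here is a small illustration of the drift described above versus the proposed behavior (plain arithmetic, no Envoy APIs involved):

```cpp
#include <cstdint>

// Today: every instance refills the full configured amount, so the
// fleet-wide limit grows with the replica count.
uint64_t totalLimitToday(uint64_t per_instance_tokens, uint64_t replicas) {
  return per_instance_tokens * replicas;
}

// With the proposed even-allocation share, each instance refills
// configured_tokens / replicas, so the fleet-wide total stays at the
// configured value regardless of HPA scale-up/scale-down.
uint64_t totalLimitWithShare(uint64_t configured_tokens, uint64_t replicas) {
  const uint64_t per_instance = configured_tokens / replicas; // even share
  return per_instance * replicas;
}
```

With 100 tokens/s configured: 3 replicas give a 300 tokens/s fleet total today and 500 after scaling to 5 replicas, while the shared scheme keeps the total at 100 either way.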
But in gateway mode, users will in most cases expect a stable total limit regardless of the number of Envoy instances.
I see. So when a new node comes up or a node goes down, the current Envoy instance's limit may go up or down, causing a few in-flight requests to fail (because there is another node in the cluster) that otherwise would have passed if membership had not changed. We have always used local rate limiting as a per-instance service protection mechanism, so I'm trying to understand more about the use case.
I would also be very interested. We are currently exploring ways to implement a rate limiting approach that is aware of the number of Envoy instances but reduces both the extra dependencies and the calls made during request processing. Global rate limiting has to call out to the rate limiting service, which in turn calls out to some Redis or memcached (in the reference implementation at least), which has the potential to increase request latency a lot.
Implementing a shared local rate limiting approach would be a good fit here.
As I understand it, a shared token bucket would (/could) also mean that tokens can be used by a different instance in the local cluster? That would mitigate the problem @ramaraochavali mentioned, where scaling during requests could fail requests that would otherwise have passed.
As I understand it, a shared token bucket would (/could) also mean that tokens can be used by a different instance in the local cluster? That would mitigate the problem @ramaraochavali mentioned, where scaling during requests could fail requests that would otherwise have passed.
Nope. Envoy cannot actually share data or messages with other instances. We can only compute a share/percentage based on the membership and apply that share to the token buckets.
Currently, global rate limiting allows individual rate limit thresholds for each value of a specific key. Can local rate limiting achieve this as well?
From the docs of https://github.com/envoyproxy/ratelimit: based on the following policy, different users can have individual rate limit buckets.
domain: ratelimit
descriptors:
  - key: x-user-name
    rate_limit:
      unit: second
      requests_per_unit: 100