local rate limit: add cross local cluster rate limit support
Commit Message: local rate limit: add cross local cluster rate limit support
Additional Description:
Envoy provides many rate-limit-related filters and features. The global rate limit is the most powerful, but it also introduces additional dependencies (a rate limit server, Redis, etc.) and latency (an extra call to the rate limit server).
The local rate limit is more stable, performs better, and has no dependency on an external server. But it works at single-instance or single-connection scope. That means that when the local rate limit is used, we cannot get a stable total limit for an Envoy cluster: the effective total limit is the single-instance limit multiplied by the number of Envoy instances, and the instance count may change as the cluster scales.
This PR adds a new feature: it makes the local rate limit filter aware of the membership of the local cluster (the cluster that contains the current Envoy instance). The local rate limit can then compute its tokens based on the local cluster membership, achieving the goal of sharing the total limit among multiple Envoy instances (an Envoy cluster).
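To illustrate the idea, here is a minimal sketch of how a per-instance token budget could be derived from the local cluster membership. The function name `evenShare` and its shape are illustrative assumptions for this discussion, not Envoy's actual API:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: divide a configured total token budget evenly
// across the current number of instances in the local cluster, so the
// cluster-wide limit stays stable as the cluster scales.
uint64_t evenShare(uint64_t total_tokens_per_fill, uint64_t local_cluster_size) {
  // Guard against an empty or not-yet-initialized membership view by
  // falling back to the full budget.
  if (local_cluster_size == 0) {
    return total_tokens_per_fill;
  }
  // Each instance takes its even share, keeping at least one token so
  // the filter never stalls completely.
  return std::max<uint64_t>(1, total_tokens_per_fill / local_cluster_size);
}
```

For example, with a cluster-wide budget of 100 tokens per fill and 4 Envoy instances, each instance would refill 25 tokens; if the cluster scales to 5 instances, each would refill 20, keeping the total roughly constant.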
See #34230 for more discussion.
Risk Level: low (nothing changes unless the feature is explicitly enabled).
Testing: unit.
Docs Changes: n/a.
Release Notes: n/a.
Platform Specific Features: n/a.
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).
Can we have Tianyu take a pass and then I will send to a maintainer?
Seems @markdroth has been busy recently; re-assigning this to @adisuissa for API review.
/retest
@adisuissa looks like this is ready for review - the main merge is just for review notes
One other idea: would it be possible to redesign this so that each instance owns a reference to the endpointStats() object or the membership_total object of the static local cluster, and fetches the current number of endpoints whenever tokensPerFill is called?
That was my initial implementation during the POC. It is actually simpler, but it makes it hard to plug in different algorithms for calculating the ratio. For example, we may want to take endpoint weight into account in the future. The current design provides a well-defined interface and abstraction for more complex share calculation.
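A rough sketch of the abstraction described above, assuming a provider interface so the even split and a future weighted split can be swapped without touching the filter. All names here (`ShareProvider`, `getTokensShareFactor`, `EvenShareProvider`) are illustrative, not the PR's actual identifiers:

```cpp
#include <cstdint>

// Hypothetical interface: each implementation decides what fraction of
// the configured total tokens this Envoy instance should take.
class ShareProvider {
public:
  virtual ~ShareProvider() = default;
  virtual double getTokensShareFactor() const = 0;
};

// Even split: each instance takes 1 / membership_total of the budget.
// A weighted implementation could instead return this endpoint's weight
// divided by the total weight of the local cluster.
class EvenShareProvider : public ShareProvider {
public:
  explicit EvenShareProvider(uint64_t membership_total)
      : membership_total_(membership_total) {}

  double getTokensShareFactor() const override {
    // Fall back to the full budget if membership is not yet known.
    return membership_total_ == 0 ? 1.0
                                  : 1.0 / static_cast<double>(membership_total_);
  }

private:
  const uint64_t membership_total_;
};
```

The filter would then compute its per-fill tokens as the configured total multiplied by the share factor, leaving the ratio algorithm entirely behind the interface.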
/retest