
Throttle topic administrative functions

Open graphcareful opened this issue 2 years ago • 11 comments

This PR implements KIP-599, Throttle topic administrative functions, covering the CreateTopics, DeleteTopics, and CreatePartitions requests.

Important to note: this KIP doesn't attempt to constrain the global number of partition mutations cluster-wide; it enforces the quota at the client level. We could have deviated from the KIP in this respect, but there is already ongoing work and planning within the multi-tenancy effort to enforce such limits, so we thought it was best to implement the KIP as it originally stands.

Throttling works via a modified token-bucket algorithm in which the tokens represent the number of partition mutations made by one of the three aforementioned requests. The algorithm is modified to accept a configured burst, which allows a burst of several mutations with a small or medium number of partitions within the same request to proceed. Subsequent requests will observe an exceeded quota and will return a newly implemented error code, THROTTLING_QUOTA_EXCEEDED, to the client (if the client is new enough) along with the total throttle time in ms in the response.
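For illustration, the following is a minimal sketch of such a burst-aware token bucket. The class name, its members, and the record() helper are hypothetical and not taken from this PR; only the accounting described above (tokens = partition mutations, an extra burst allowance, and a throttle delay proportional to the deficit) is modeled.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

class mutation_throttle {
    using clock = std::chrono::steady_clock;

public:
    // rate: sustained partition mutations per second; burst: extra capacity
    // that lets a single request with a small-to-medium partition count pass
    // immediately even if it momentarily exceeds the steady-state rate.
    mutation_throttle(double rate, double burst)
      : _rate(rate)
      , _capacity(rate + burst)
      , _tokens(rate + burst)
      , _last(clock::now()) {}

    // Take `mutations` tokens (one per partition mutation). Returns the delay
    // the client should be throttled for; zero means no throttling is needed.
    std::chrono::milliseconds record(double mutations) {
        refill();
        _tokens -= mutations;
        if (_tokens >= 0) {
            return std::chrono::milliseconds{0};
        }
        // The deficit must be paid back at the configured sustained rate.
        const double delay_sec = -_tokens / _rate;
        return std::chrono::milliseconds{
          static_cast<int64_t>(delay_sec * 1000.0)};
    }

private:
    void refill() {
        auto now = clock::now();
        std::chrono::duration<double> elapsed = now - _last;
        _last = now;
        _tokens = std::min(_capacity, _tokens + elapsed.count() * _rate);
    }

    double _rate;
    double _capacity;
    double _tokens;
    clock::time_point _last;
};
```

For example, with rate = 10 mutations/s and burst = 40, a first request creating 30 partitions passes with zero delay, while an immediate follow-up asking for another 30 overdraws the bucket and comes back with a non-zero throttle delay.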

Fixes: #3694

Changes in force-push

  • Implemented Ben's feedback, mostly addressing C++ issues

Changes in force-push

  • Removed unnecessary cluster error code
  • Fixed a bug in the handling of throttling_quota_exceeded errors, per the KIP's explanation. With these changes, the 3 supported requests now respond with throttling_quota_exceeded only when the request version is new enough. In cases where the client would not understand this error code, the request is handled with the throttle applied instead (see the sketch below).
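As a rough illustration of that version gating, here is a hedged sketch; the type and function names are hypothetical, the error code's wire value is omitted, and the minimum request version is left as a parameter since the exact version bumps are the ones KIP-599 designates.

```cpp
#include <chrono>
#include <cstdint>

// Numeric wire value intentionally omitted; only the two outcomes matter here.
enum class error_code : int16_t { none, throttling_quota_exceeded };

struct quota_outcome {
    error_code ec{error_code::none};
    std::chrono::milliseconds throttle_time{0};
};

// Decide how an exceeded quota is surfaced for one of the three requests.
// `min_version_with_error` is the first request version that can carry
// throttling_quota_exceeded, per KIP-599's version bumps.
quota_outcome surface_quota(
  int16_t request_version,
  int16_t min_version_with_error,
  std::chrono::milliseconds delay) {
    quota_outcome out;
    out.throttle_time = delay;
    if (delay.count() > 0 && request_version >= min_version_with_error) {
        // New enough client: reject with the retryable error code.
        out.ec = error_code::throttling_quota_exceeded;
    }
    // Otherwise the request is still handled, and the broker applies the
    // throttle by delaying the response / reporting the throttle time.
    return out;
}
```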

Changes in force-push && force-push

  • Rebased dev

Changes in force-push

  • Modified how throttling_quota_exceeded is interpreted. Previously, all partition mutations were accounted for per request, and the entire allotment of mutations was allowed to proceed.
  • Now each mutation within a single create_topics, delete_topics or create_partitions request is evaluated one at a time, and any mutation that exceeds the quota is rejected.
  • With this modified behavior, if a single request contains X partition mutations, some number < X may be allowed to proceed while the remaining ones are returned to the client with a retryable error (sketched below).
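A hedged sketch of that per-mutation evaluation follows; quota_tracker and try_take are hypothetical stand-ins for the PR's actual token bucket, and only the control flow (check each topic's mutations one at a time, reject the ones that no longer fit the quota) reflects the description above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct quota_tracker {
    double tokens;
    // Consume `n` tokens if the quota currently allows that many partition
    // mutations; otherwise leave the state untouched and report failure.
    bool try_take(double n) {
        if (tokens >= n) {
            tokens -= n;
            return true;
        }
        return false;
    }
};

struct creatable_topic {
    std::string name;
    int32_t partition_count;
};

struct per_topic_result {
    std::string name;
    bool throttled; // would map to throttling_quota_exceeded in the response
};

std::vector<per_topic_result> evaluate_request(
  quota_tracker& quota, const std::vector<creatable_topic>& topics) {
    std::vector<per_topic_result> results;
    results.reserve(topics.size());
    for (const auto& t : topics) {
        // Evaluate one mutation at a time: earlier topics may consume the
        // remaining quota, causing later topics in the same request to be
        // rejected with a retryable error.
        bool ok = quota.try_take(static_cast<double>(t.partition_count));
        results.push_back({t.name, !ok});
    }
    return results;
}
```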

Changes in force-push

  • Fixed a bug with flex protocol parsing of types like []string

Release Notes

  • Adds support for throttling of topic administration functions (create_topics, create_partitions, and delete_topics requests) via a new config property, partition_mutation_rate. When this rate is exceeded per user/shard, the client is throttled accordingly.

graphcareful • Apr 18 '22

Missing release notes section in PR description (should describe the feature and specify the new config property name)

jcsp • Apr 19 '22

Important to note: this KIP doesn't attempt to constrain the global number of partition mutations cluster-wide; it enforces the quota at the client level.

The KIP enforces on a per-node+per-client basis, right? Whereas this PR is enforcing on a per-shard+per-client basis, which is quite a bit looser. From a superficial reading, I think this only protects us from aggressive clients if the client is using a single connection: if they opened a bunch of connections (that might hit different shards), the per-client limits are less meaningful.

Per-shard enforcement of quotas is fine if the resource we're protecting is also per-shard (cpu time, memory), but in this instance if we're trying to protect our central controller log, I'm not sure this is going to give us as much confidence as we would like. Maybe the enforcement should be done on the controller leader instead? Or perhaps even in both places: a "friendly" check at the kafka level that does this sleep-based throttling, plus a "hard" check on the controller leader that would actually send the quota_exceeded errors if the controller log is being written to too fast.

jcsp • Apr 19 '22

Important to note: this KIP doesn't attempt to constrain the global number of partition mutations cluster-wide; it enforces the quota at the client level.

The KIP enforces on a per-node+per-client basis, right? Whereas this PR is enforcing on a per-shard+per-client basis, which is quite a bit looser. From a superficial reading, I think this only protects us from aggressive clients if the client is using a single connection: if they opened a bunch of connections (that might hit different shards), the per-client limits are less meaningful.

Per-shard enforcement of quotas is fine if the resource we're protecting is also per-shard (cpu time, memory), but in this instance if we're trying to protect our central controller log, I'm not sure this is going to give us as much confidence as we would like. Maybe the enforcement should be done on the controller leader instead? Or perhaps even in both places: a "friendly" check at the kafka level that does this sleep-based throttling, plus a "hard" check on the controller leader that would actually send the quota_exceeded errors if the controller log is being written to too fast.

Good points @jcsp. True that a client could jump around to other shards to end up exceeding the quota, but they could do that now anyway to sidestep the byte-level (and soon-to-be-implemented time-based) quotas. Agree with your points though.

Doing this enforcement on the leader was considered, however it would be impossible to return the estimated throttle response time without heavily modifying the API, which is one reason I went this route. Maybe @dotnwat can chime in here, but I think the idea was to have the multi-tenancy effort provide hard controller partition-creation rate quotas, and KIP-599 would be our effort at providing some throttling until we have a better solution implemented.

graphcareful • Apr 19 '22

~~Going to draft this as I'm realizing it will really depend on KIP-482 to implement correctly. This is due to the unfortunate fact that the version bumps KIP-599 requires for the requests that may return the new throttling_quota_exceeded error are greater than the version at which the flexibleVersion was introduced.~~

~~We cannot support any request that has a version greater than the flexibleVersion. Simply attempting to would probably crash, as even the request handler's parsing differs between versions older and newer than the flexibleVersion.~~

This work has been completed and exists in dev as of this update

graphcareful • Apr 21 '22

Rebased dev

graphcareful • Aug 18 '22

Doing this enforcement on the leader was considered, however it would be impossible to return the estimated throttle response time without heavily modifying the API, which is one reason I went this route.

Why is it impossible? If enforcement is done at the controller leader, then it has some idea about the rate of requests.

dotnwat • Aug 24 '22

Doing this enforcement on the leader was considered, however it would be impossible to return the estimated throttle response time without heavily modifying the API, which is one reason I went this route.

Why is it impossible? If enforcement is done at the controller leader, then it has some idea about the rate of requests.

That comment is pretty old so I don't even remember what I was referring to, but doesn't it already work this way? Since create_topics, delete_topics, and create_partitions are all executed on the controller leader, the current running totals are representative of the cluster-wide partition mutation rate.

I think that gets thrown out the window, though, when the controller changes, since that running total is in-memory data stored only on a single node.

graphcareful • Aug 25 '22

@graphcareful @dotnwat @jcsp @mmaslankaprv @ZeDRoman how does this relate to https://github.com/redpanda-data/core-internal/issues/37?

~Overlapping? Identical? Conflicting? How much coordination do we need here?~ Switching to Slack for discussion.

mmedenjak • Sep 07 '22

@dotnwat in the KIP-599 documentation, it's called controller_mutations_rate*. FWIW I'm up for changing it; I'm not married to the name I've chosen.

graphcareful • Sep 20 '22

@dotnwat in the KIP-599 documentation, it's called controller_mutations_rate*. FWIW I'm up for changing it; I'm not married to the name I've chosen.

@graphcareful me neither. Can you sync with @ZeDRoman about config names? I think that name is fine, but maybe it ends up being a name that differentiates control at the controller vs control at the edge?

dotnwat • Sep 21 '22

Restarted CI

dotnwat • Sep 21 '22

In the previous force-push, I addressed concerns about the lifetime of references on the stack within particular lambdas, and I moved the token bucket algorithm into the kafka/ tree so as not to confuse it with the generic one Roman is working on that will exist in utils/.

graphcareful • Sep 28 '22