
Supporting request rate quotas - KIP-124

Open graphcareful opened this issue 2 years ago • 19 comments

KIP-124 is an accepted proposal to add request rate quotas to Kafka. It works by adding a new throttle_time_ms field to most responses. This field is 0 unless a previous request was throttled. Thresholds for throttling are configured within the redpanda config file. If requests exceed a configured byte or time budget within the configurable window, throttling occurs.

The behavior of throttling requests based on request size has already been implemented. This PR adds logic to track the total duration of the most recent request; if that time exceeds the maximum window size (and no delay was already induced by the average-size limit), the new time-based throttle delay is applied. A rough sketch of the decision follows.
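
As an illustration only (the names and the one-window cap are assumptions, not the actual quota_manager interface):

```cpp
#include <algorithm>
#include <chrono>

// Sketch only: decide a time-based throttle delay for the most recent
// request. Names and the one-window cap are illustrative assumptions.
std::chrono::milliseconds throttle_for_duration(
  std::chrono::milliseconds request_duration,
  std::chrono::milliseconds window_width,
  std::chrono::milliseconds size_based_delay) {
    // If the byte-size quota already produced a delay, keep that one and do
    // not stack a second, time-based delay on top of it.
    if (size_based_delay.count() > 0) {
        return size_based_delay;
    }
    // Otherwise, if the request consumed more than one full window of time,
    // ask the client to back off by the overshoot, capped at one window.
    if (request_duration > window_width) {
        return std::min(request_duration - window_width, window_width);
    }
    return std::chrono::milliseconds{0};
}
```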

Changes in force-push

  • Adding logic to actually throttle if time quotas have been exceeded

Changes in force-push

  • Fixed use after free bug introduced in previous force push

Changes in force-push

  • Added new commit with unit test
  • Fixed bug with incorrect calculation of delay

Changes in force-push

  • Rebased dev to fix a conflict with kafka/server/tests/CMakeLists.txt where the same line was modified when adding a new test source file

Changes in force-push

  • Changed the throttle API to return std::chrono::milliseconds so callers no longer need to wrap the value in a cast to ms
  • Beefed up the unit test to cover edge cases and a randomized data set

Changes in force-push

  • Use TMP to fill the throttle_response_ms field instead of having to manually assign the member variable everywhere.

Changes in force-push

  • Take the larger of the two quotas (bandwidth / time) if both have been exceeded
  • Fix error with application of the time interval; it was not applied after the response was completed

Changes in force-push

  • Change how request rates are calculated. The rate_tracker class is now used to compute rates the way the KIP describes; previously only one logical time window was ever used. (A rough sketch of the idea appears after this list.)
  • Modified the end of the time measurement interval for a request to be after all request stages have finished processing. Previously the measurement was taken after the first stage was completed.
  • The point above means quota_manager is now referenced from connection_context's background loop. To avoid crashes during shutdown (referencing an already-deallocated service), the shutdown order of kafka_server was changed.
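
For reference, roughly the shape of a multi-window rate calculation in the spirit of the KIP; this is a sketch with made-up names, not the actual rate_tracker:

```cpp
#include <chrono>
#include <numeric>
#include <vector>

// Sketch of a multi-window rate tracker: observations land in the current
// window, expired windows are recycled, and the rate is total work divided
// by the span the samples approximately cover. Names are illustrative.
class windowed_rate_sketch {
public:
    using clock = std::chrono::steady_clock;

    windowed_rate_sketch(
      size_t windows, std::chrono::milliseconds width, clock::time_point start)
      : _samples(windows, 0.0)
      , _width(width)
      , _window_start(start) {}

    void record(double v, clock::time_point now) {
        maybe_advance(now);
        _samples[_current] += v;
    }

    double measure(clock::time_point now) {
        maybe_advance(now);
        double total = std::accumulate(_samples.begin(), _samples.end(), 0.0);
        // Approximate the covered span as all prior windows plus the elapsed
        // part of the current one, so a nearly-empty new window does not
        // inflate the measured rate.
        auto covered = (_samples.size() - 1) * _width + (now - _window_start);
        return total / std::chrono::duration<double>(covered).count();
    }

private:
    void maybe_advance(clock::time_point now) {
        while (now - _window_start >= _width) {
            _current = (_current + 1) % _samples.size();
            _samples[_current] = 0.0;
            _window_start += _width;
        }
    }

    std::vector<double> _samples;
    std::chrono::milliseconds _width;
    clock::time_point _window_start;
    size_t _current = 0;
};
```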

Changes in force-push

  • Rebase dev, conflict in redpanda/application.cc

Changes in force-push

  • Fixed a bug introduced in a previous change that modified the shutdown order.
  • Implemented Noah's feedback to use concepts instead of SFINAE with classical templates

Changes in force-push

  • Used an optional property as an enable/disable switch for the feature
  • Added a unit test

Changes in force-push

  • Rebase dev
  • Make throttling disabled by default
  • Cap the time-based throttle at one window width (default 1s)

Changes in force-push

  • Fixed bug where throttle would always be applied when feature is enabled
  • Introduced ducktape test

graphcareful avatar Mar 11 '22 15:03 graphcareful

Any feedback on writing a useful non-flaky test would be appreciated

graphcareful avatar Mar 11 '22 15:03 graphcareful

Any feedback on writing a useful non-flaky test would be appreciated

Same question has come up recently for other things: manual_clock is probably the best way to write a comprehensive test for a rate limiting component, I recently used it here https://github.com/redpanda-data/redpanda/pull/3982 and there are a couple tests in the seastar tree that use it too. It does require pulling out the rate limiting into some component that is testable independently.
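
For illustration, a minimal sketch of that pattern, assuming the limiting logic is pulled into a component templated on its clock (the type and names here are made up):

```cpp
#include <seastar/core/manual_clock.hh>

#include <cassert>
#include <chrono>

// Sketch only: a rate-limiting component templated on its clock can be
// driven deterministically with seastar::manual_clock, with no real sleeps.
template<typename Clock>
struct clocked_rate_sketch {
    typename Clock::time_point start = Clock::now();
    double observed = 0;

    void record(double v) { observed += v; }

    double rate() const {
        auto elapsed = std::chrono::duration<double>(Clock::now() - start);
        return observed / elapsed.count();
    }
};

void manual_clock_example() {
    clocked_rate_sketch<seastar::manual_clock> tracker;
    tracker.record(100);
    // Advance virtual time by exactly one second; the test never waits.
    seastar::manual_clock::advance(std::chrono::seconds(1));
    assert(tracker.rate() > 99.0 && tracker.rate() < 101.0);
}
```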

On top of that, I think a ducktape integration test is achievable as long as the ratios between throttled and non-throttled are chosen to create big differences that should be resistant to background noise. For example, compare the same produce workload run with a 5 req/s limit and then with a 20 req/s limit, each for 500 messages: the runtime of the 5 req/s run should be around 4x that of the other, so the test can have a success condition of something like >3.5x to allow for a little bit of noise.

jcsp avatar Mar 11 '22 15:03 jcsp

I am +1 on using manual_clock in unit tests.

mmaslankaprv avatar Mar 15 '22 13:03 mmaslankaprv

A side note on why I didn't use manual_clock in the test: the APIs for the rate class take a timestamp as a parameter, so I can pass any time desired.
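
Something like the following, sketched with a stand-in type rather than the real rate class:

```cpp
#include <chrono>

// Sketch only: when the API takes the observation time as a parameter, a
// unit test can fabricate timestamps and stay fully deterministic without
// touching any clock, manual or otherwise.
struct simple_rate_sketch {
    using time_point = std::chrono::steady_clock::time_point;

    explicit simple_rate_sketch(time_point start)
      : _start(start) {}

    void record(double v, time_point) { _total += v; }

    double measure(time_point now) const {
        return _total / std::chrono::duration<double>(now - _start).count();
    }

private:
    time_point _start;
    double _total = 0;
};

void fabricated_time_test() {
    auto t0 = std::chrono::steady_clock::time_point{};
    simple_rate_sketch r(t0);
    r.record(10.0, t0);
    r.record(10.0, t0 + std::chrono::milliseconds(500));
    // 20 units over 2 seconds of pretend time -> expect 10 units/s.
    double rate = r.measure(t0 + std::chrono::seconds(2));
    (void)rate; // a real test would assert on this value
}
```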

graphcareful avatar Mar 15 '22 14:03 graphcareful

  • how does throttling work for produce? it seems like we aren't doing that, and upstream kafka has some special cases.

Looks like I forgot to include that since it wasn't in the KIP; the KIP only talks about how the protocol was modified, and Produce/Fetch support that field from version 0. As for a special case, I see this: "Produce/Fetch requests will return the total throttling time reflecting both bandwidth and utilization based throttling in the response"; maybe this could be what you saw? Could you link me to the source? Reading more, I found an additional edge case for fetch: "Fetch requests used for replication will not be throttled based on request times since it is possible to configure replica.fetch.wait.max.ms and use the existing replication byte rate quotas to limit replication rate." I did a quick grep for replica_fetch and didn't find it in the codebase; do we honor this setting?

  • can we structure things so it is more difficult to forget to set this field?

Yup, using some neat C++ TMP I was able to do this; nice advice. A sketch of the idea is below.
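
For illustration, one way to do that with a C++20 concept; the member name and layout are assumptions, not the generated redpanda response types:

```cpp
#include <chrono>

// Sketch only: detect response types that carry a throttle_time_ms member
// and fill it through one helper in the common dispatch path, so individual
// handlers never assign the field by hand (and can't forget it).
template<typename T>
concept has_throttle_field = requires(T t, std::chrono::milliseconds d) {
    { t.throttle_time_ms = d };
};

template<has_throttle_field ResponseT>
void apply_throttle(ResponseT& resp, std::chrono::milliseconds delay) {
    resp.throttle_time_ms = delay;
}

// Responses without the field become a compile-time no-op rather than an
// easy-to-miss omission at each call site.
template<typename ResponseT>
void apply_throttle(ResponseT&, std::chrono::milliseconds) {}
```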

  • have there been significant changes to the quota calculation and strategies since kip-124 (e.g. https://cwiki.apache.org/confluence/display/KAFKA/KIP-219+-+Improve+quota+communication and https://cwiki.apache.org/confluence/display/KAFKA/KIP-257+-+Configurable+Quota+Management might be relevant, along with changes to the algorithms in upstream kafka that I've noticed but I don't think necessarily have KIPs associated)?

Interesting, maybe it would be useful to implement them in separate PRs?

  • is dynamic config enabled for the relevant settings so we could deal with errors in the calculations if they came up without rebooting?

From what I can tell, yes. All of the settings that the rate class uses to determine throttle behavior are configured with the needs_restart::no flag in configuration.cc.
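
As an illustration of the behavior that implies (placeholder names, not the actual config bindings):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Sketch only: a setting marked needs_restart::no is re-read at the point of
// use, so flipping it via the admin API takes effect on the next request and
// can serve as a mitigation if throttling misbehaves.
struct throttle_settings {
    std::atomic<bool> enabled{false};
    std::atomic<int64_t> target_rate{0}; // requests/sec; <= 0 means unlimited
};

std::chrono::milliseconds
maybe_throttle(const throttle_settings& cfg, double measured_rate) {
    const auto target = static_cast<double>(cfg.target_rate.load());
    if (!cfg.enabled.load() || target <= 0 || measured_rate <= target) {
        return std::chrono::milliseconds{0}; // disabled or under quota
    }
    // Delay proportional to the overshoot, over a one-second window.
    auto delay = static_cast<int64_t>((measured_rate - target) / target * 1000.0);
    return std::chrono::milliseconds{delay};
}
```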

graphcareful avatar Mar 16 '22 19:03 graphcareful

Looks like I forgot to include that since it wasn't in the KIP; the KIP only talks about how the protocol was modified, and Produce/Fetch support that field from version 0

This is pretty common when looking at KIPs. Things evolve after the initial work of the KIP, or it's not comprehensive. For some KIPs the functionality can hide behind the protocol so we have a lot of design flexibility, and sometimes it's the opposite. In this case it's a little of both: clients and server cooperate w.r.t. the throttle time. So I think in the case of throttling we need plenty of experimental results, but should also be able to jump ahead by learning from the evolution of the implementation in upstream kafka.

The entry point for all the kafka handlers is KafkaApis.scala

Interesting, maybe it would be useful to implement them in separate PRs?

Yes, but again, it might not really be separate functionality. It may just be all part of the same functionality but spread out over time through KIPs. I think this PR could probably stand on its own but then immediately jump into bringing the implementation up to parity with kafka.

From what I can tell, yes. All of the settings that the rate class uses to determine throttle behavior are configured with the needs_restart::no flag in configuration.cc

Let's make sure to have a test that shows we can enable/disable throttling as a mitigation in case something gets stuck.

dotnwat avatar Mar 16 '22 20:03 dotnwat

@dotnwat Ok taking a look at the source code now

graphcareful avatar Mar 16 '22 21:03 graphcareful

@dotnwat Ok taking a look at the source code now

Cool. I think this PR is looking pretty good. Let's wait to see what you think after checking out the upstream implementation, and then decide whether to merge now and follow up, or wait and combine.

dotnwat avatar Mar 17 '22 05:03 dotnwat

Ok so after taking a look at what Kafka does here are some of my findings:

KIP-219 would involve creating a new way to manage 'muted' channels. KIP-257 would involve creating a new method to share quotas across <user, client-id> entities, and also makes quotas dynamic so they may be changed by the system itself.

I'd say both of these KIPs feel separate enough to implement in a new PR. However, I did see some other special casing that leads me to believe we should implement it in this PR to maintain consistent behavior (w.r.t. throttling) with Kafka. Maybe these exist in other KIPs? In either case they are:

Edited at a later time:

  1. If both bandwidth and request rate throttles are violated, apply the higher of the two (for both produce and fetch)
  2. Furthermore, when the above occurs and bandwidth is the higher of the two, add the value to a Produce or Fetch quota manager. It seems Kafka manages a few different types of quotas.
  3. Request quotas aren't enforced for produce requests when acks == 0
  4. ~~Noticed this, which is part of KIP-124 but missing from this PR: "To ensure that these exempt requests cannot be used by clients to launch a DoS attack, these requests will be throttled on quota violation if ClusterAction authorization fails. SaslHandshake request will not be throttled when used for authentication, but will be throttled on quota violation if used at any other time." However, it looks like this would apply when auth was handled differently within Kafka, because these requests are noops now. Still looking into whether Kafka throttles auth or not.~~ Since KIP-124 is pretty old, a lot of this no longer applies; redpanda pretty much already applies throttling to all auth.
  5. ~~There's a notion of a forwarded request; I believe this may be for transaction support, or maybe replication? In either case, throttling is never accounted for forwarded requests.~~ Kafka uses its own Fetch API to replicate, which is why there is a distinction between fetch requests that come from a normal client and those that come from another broker. We already have rate limiting on replication via raft_learner_recover_rate.

graphcareful avatar Mar 18 '22 16:03 graphcareful

To address the comments above, I believe points 1-3 are big enough to place in another PR. That work involves making quota_manager request-aware and tracking rates separately for produce and fetch.

graphcareful avatar Mar 21 '22 18:03 graphcareful

Looks like the only CI failures are due to sleep_aborted exceptions:

<BadLogLines nodes=docker_n_10(1) example="ERROR 2022-03-22 22:55:17,400 [shard 0] rpc - server.cc:114 - kafka rpc protocol - Error[applying protocol] remote address: 172.18.0.19:56514 - seastar::sleep_aborted (Sleep is aborted)">

which occurs because shutdown happens while a request is being throttled. I will investigate to see whether this throttling was applied correctly.

graphcareful avatar Mar 23 '22 05:03 graphcareful

Added a commit that adds a configuration parameter to tune the feature and enable/disable it

graphcareful avatar Mar 23 '22 21:03 graphcareful

Would it be possible to write a test for this feature?

mmaslankaprv avatar Apr 11 '22 06:04 mmaslankaprv

Would it be possible to write a test for this feature?

I've added a new unit test; however, it would be nice to have a ducktape test, although I'm not sure we could concretely control and calculate expected delay times. Ideas?

graphcareful avatar Apr 11 '22 18:04 graphcareful

I've added a new unit test; however, it would be nice to have a ducktape test, although I'm not sure we could concretely control and calculate expected delay times. Ideas?

We don't have great control, but if you restrict the test to only run in release mode (debug mode is a crapshoot), define a pretty broad range of acceptance, and have a runtime that's long enough to span short blips in performance, I think there's a decent chance of getting something robust. ConnectionRateLimitTest seems to be doing okay, and it has a similar challenge. It helps to define some really contrasting parameters, like setting it to 10 req/s then 1000 req/s and requiring that the throughput differs by at least 100x.

jcsp avatar Apr 11 '22 21:04 jcsp

is this PR sequenced after the flex protocol work or should it get another review now?

dotnwat avatar Jun 01 '22 07:06 dotnwat

is this PR sequenced after the flex protocol work or should it get another review now?

It actually doesn't depend on flex at all. I was thinking about writing a ducktape test using the enhancements to ducktape made by Aaron.

graphcareful avatar Jun 01 '22 18:06 graphcareful

Updated the PR with a ducktape test that demonstrates the mechanism kicking in when enabled and not applying backpressure when disabled

graphcareful avatar Jul 11 '22 22:07 graphcareful