
Rate limiting per client groups

Open ZeDRoman opened this issue 2 years ago • 2 comments

This PR adds rate limiting per client group. Clients can now be united into groups, and all clients in a group share a common rate quota. A group name is an arbitrary string, and a client belongs to a group if its client_id starts with the group name. Each group gets its own rate limiter; previously we created a limiter for every client.
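To make the grouping rule concrete, here is a minimal Python sketch of prefix-based group lookup with one shared limiter per group. It is not the code in this PR: `TokenBucket`, `QuotaManager`, `group_for`, and the token-bucket policy itself are hypothetical names and choices made for illustration, and the real implementation may account for traffic differently.

```python
from typing import Dict, List, Optional


class TokenBucket:
    """Illustrative token bucket; `rate` is in bytes per second."""

    def __init__(self, rate: float, now: float = 0.0) -> None:
        self.rate = rate
        self.tokens = rate  # start with one second of budget
        self.last = now

    def record(self, size: int, now: float) -> float:
        """Account for `size` bytes and return the throttle delay in seconds."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.rate, self.tokens + elapsed * self.rate)
        self.last = now
        self.tokens -= size
        # A negative balance means the quota was exceeded; wait until it recovers.
        return max(0.0, -self.tokens / self.rate)


class QuotaManager:
    """Hypothetical manager: one limiter per group, or per ungrouped client."""

    def __init__(self, rate: float, groups: List[str]) -> None:
        self.rate = rate
        self.groups = groups
        self.limiters: Dict[str, TokenBucket] = {}

    def group_for(self, client_id: str) -> Optional[str]:
        # A client belongs to a group when its client_id starts with the group name.
        for name in self.groups:
            if client_id.startswith(name):
                return name
        return None

    def record(self, client_id: str, size: int, now: float) -> float:
        # Grouped clients share the group's limiter; others fall back to a per-client one.
        key = self.group_for(client_id) or client_id
        limiter = self.limiters.setdefault(key, TokenBucket(self.rate, now))
        return limiter.record(size, now)
```

With `groups=["analytics"]`, the clients `analytics-1` and `analytics-2` draw from the same bucket, while a client named `other` still gets its own limiter.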

I have tested it manually in ducktape, but I couldn't find a way to implement a stable ducktape test: because each shard has a separate quota, we can't calculate the expected delay for requests that exceed the quota.

Potential problem: Every shard has its own quota
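A rough illustration of why this matters, assuming each connection is handled by a single shard and each shard keeps independent limiter state (both are assumptions, and `aggregate_limit` is a hypothetical helper, not part of the PR): the rate a group can reach in aggregate depends on how its connections happen to be spread across shards.

```python
from typing import List


def aggregate_limit(per_shard_quota: float, connection_shards: List[int]) -> float:
    """Upper bound on a group's total rate when every shard enforces the quota on its own.

    `connection_shards` lists the shard that handles each of the group's connections.
    """
    return per_shard_quota * len(set(connection_shards))
```

Three connections that all land on shard 0 share one quota, whereas three connections spread over shards 0, 1, and 2 could reach up to three times the configured rate. Since a test cannot easily control that placement, the expected throttle delay is hard to predict.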

Backports Required

  • [x] none - not a bug fix
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [ ] v22.3.x
  • [ ] v22.2.x
  • [ ] v22.1.x

Release Notes

Features

Clients can be united into a group so that they share a common rate quota.

ZeDRoman avatar Nov 21 '22 12:11 ZeDRoman

Do we need some test?

VadimPlh avatar Nov 21 '22 14:11 VadimPlh

I see that groups are defined as a prefix of the client ID string. I was wondering if there is a better way to do this. I know that with the flex additions to the Kafka protocol, metadata can be shipped along with any Kafka struct; these are called tags.

There is support for tags within the request header too, if we could ensure that clients send the group ID within the tags metadata struct then we could avoid parsing the client id for a group id altogether.

https://github.com/redpanda-data/redpanda/blob/dev/src/v/kafka/server/protocol_utils.cc#L89

The only downside is that all clients must be making requests with API versions that are new enough to support flex.
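For reference, a small Python sketch of how a tagged-fields section is decoded in flexible request versions: an unsigned varint field count, then for each field an unsigned varint tag, an unsigned varint size, and that many bytes of data. The `GROUP_TAG` number and the idea of carrying the group name as raw UTF-8 bytes are purely hypothetical; no such tag is defined by the protocol or by this PR.

```python
from typing import Dict, Tuple

GROUP_TAG = 0  # hypothetical tag number for a client group id


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned varint starting at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def read_tagged_fields(buf: bytes, pos: int = 0) -> Tuple[Dict[int, bytes], int]:
    """Decode a tagged-fields section; return ({tag: raw bytes}, next position)."""
    count, pos = read_uvarint(buf, pos)
    fields: Dict[int, bytes] = {}
    for _ in range(count):
        tag, pos = read_uvarint(buf, pos)
        size, pos = read_uvarint(buf, pos)
        fields[tag] = buf[pos:pos + size]
        pos += size
    return fields, pos
```

A broker could then look up `fields.get(GROUP_TAG)` instead of matching a prefix of the client ID, falling back to the prefix rule when the tag is absent.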

graphcareful avatar Nov 21 '22 20:11 graphcareful

discuss: Is the client group throughput (TP) limiting going to be applied to the response/fetch traffic as well?

dlex avatar Dec 22 '22 20:12 dlex

/ci-repeat 10 skip-units dt-repeat=100 tests/rptest/tests/cluster_quota_test.py::ClusterRateQuotaTest

dotnwat avatar Dec 23 '22 00:12 dotnwat

/ci-repeat 10 skip-units dt-repeat=100 tests/rptest/tests/cluster_quota_test.py::ClusterRateQuotaTest

ZeDRoman avatar Dec 23 '22 16:12 ZeDRoman

Failure was k8s

dotnwat avatar Dec 24 '22 23:12 dotnwat