kazoo icon indicating copy to clipboard operation
kazoo copied to clipboard

feat(core) implement client request rate limiting

Open ceache opened this issue 2 years ago • 9 comments

Fixes #664,

Why is this needed?

This provides a top level control to limit the load a given client can generate on an ensemble in the form of number of requests. In situation with large number of watches, the number of request emitted by the client is otherwise unbounded.

Proposed Changes

  • Introduce a "semaphore_impl" attribute on the various handlers.
  • Allow a new max_async_requests argument to the client constructor.
  • Change the client to bound the number of outstanding async requests with a semaphore limited to max_async_requests.

Does this PR introduce any breaking change?

No, it is backward compatible with a default of no limit.

ceache avatar Nov 12 '22 22:11 ceache

@python-zk/maintainers This is a first draft. Let me know if you like the direction and I'll add tests, etc.

Full disclosure, we have had a version of this patch in production for a while (with both gevent and threading handlers). I have not tested the eventlet handler directly.

ceache avatar Nov 12 '22 23:11 ceache

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (55f27b2) 96.76% compared to head (df7c611) 96.75%. Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #683      +/-   ##
==========================================
- Coverage   96.76%   96.75%   -0.02%     
==========================================
  Files          27       27              
  Lines        3557     3570      +13     
==========================================
+ Hits         3442     3454      +12     
- Misses        115      116       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Nov 12 '22 23:11 codecov-commenter

I think it is a great idea. Just wondering: from a user experience perspective, will there be any way to know the rate limit has been hit?

StephenSorriaux avatar Nov 13 '22 17:11 StephenSorriaux

No, at least not in this current form.

Sync requests will block, as if the zk server was slow.

This was designed for async requests so that code, possibly in different threads, could simply queue them up knowing that the client would respect the set rate limit. Think of chains of get_children_async | get_async when walking a hierarchy.

We could add logging, it would not be difficult, but I would not want it to be intrusive (i.e. debug?)

I feel like throttling for rate limit is not an "issue", it is a feature.

Does thay make sense?

On Sun, Nov 13, 2022, 12:53 Stephen Sorriaux @.***> wrote:

I think it is a great idea. Just wondering: from a user experience perspective, will there be any way to know the rate limit has been hit?

— Reply to this email directly, view it on GitHub https://github.com/python-zk/kazoo/pull/683#issuecomment-1312785656, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIFTHQZ6D6LMQI7SJEFGBLWIETI5ANCNFSM6AAAAAAR6Q7REY . You are receiving this because you authored the thread.Message ID: @.***>

ceache avatar Nov 13 '22 18:11 ceache

Yes, that makes a lot of sense, thank you.

I agree with you, rate limit is a feature, but sometimes it can be hard to "understand"/be aware of it if there is no logs telling you "hey, rate limit is currently being triggered" or "hey, rate limit has been triggered N times in the past M seconds" (even if, I agree, there is already a info log at client startup). You're right this should not be intrusive, I know some projects display a "rate limit has been triggered" message once in a while (every 30s or so for instance), but I believe a debug message can work too if you think it is better/useful.

Writing this message, it makes me realize that this lib currently does not provide any metrics (if it would, a number of times rate limit has been triggered would be useful). Maybe it is something we could add to the backlog, but that is not the point of your current proposition that, again, I think is great.

StephenSorriaux avatar Nov 15 '22 23:11 StephenSorriaux

I'll add a log message for now, and move to add some tests.

About metrics, i was talking to opentelemetry folks at KubeCon about exactly that. it looks like it would be possible to use such a framework. I mean, they instrument SQLite3 and requests modules for example

IMHO, of we can pull this off, this would be an ideal way to extract this kind of information, as well as request counts, error counts, connection length and a million other things i have always wanted to know about my zookeeper client but never knew how to ask 😜

On a more serious note, if you like the idea, i can create an issue for this and try to gather more Intel?

On Tue, Nov 15, 2022, 18:41 Stephen Sorriaux @.***> wrote:

Yes, that makes a lot of sense, thank you.

I agree with you, rate limit is a feature, but sometimes it can be hard to "understand"/be aware of it if there is no logs telling you "hey, rate limit is currently being triggered" or "hey, rate limit has been triggered N times in the past M seconds" (even if, I agree, there is already a info log at client startup). You're right this should not be intrusive, I know some projects display a "rate limit has been triggered" message once in a while (every 30s or so for instance), but I believe a debug message can work too if you think it is better/useful.

Writing this message, it makes me realize that this lib currently does not provide any metrics (if it would, a number of times rate limit has been triggered would be useful). Maybe it is something we could add to the backlog, but that is not the point of your current proposition that, again, I think is great.

— Reply to this email directly, view it on GitHub https://github.com/python-zk/kazoo/pull/683#issuecomment-1316031405, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIFTHTGT7CYMUWYAPK2NWLWIQNS3ANCNFSM6AAAAAAR6Q7REY . You are receiving this because you authored the thread.Message ID: @.***>

ceache avatar Nov 16 '22 00:11 ceache

I'll add a log message for now, and move to add some tests. About metrics, i was talking to opentelemetry folks at KubeCon about exactly that. it looks like it would be possible to use such a framework. I mean, they instrument SQLite3 and requests modules for example IMHO, of we can pull this off, this would be an ideal way to extract this kind of information, as well as request counts, error counts, connection length and a million other things i have always wanted to know about my zookeeper client but never knew how to ask stuck_out_tongue_winking_eye On a more serious note, if you like the idea, i can create an issue for this and try to gather more Intel?

Sorry for my late reply @ceache, busy weeks :(

To be honest, I really like the idea and it would be great if you can get some intel about it, especially on how we should make those metrics available to our users: should we provide some interface so that they can plug whatever client they want? Should we actually provide some integration ourselves (like prometheus, etc.)? I totally feel you on this subject, I also love to get tons of metrics about what's in production!

StephenSorriaux avatar Nov 28 '22 22:11 StephenSorriaux

Hey Folks, any update on the progress of this PR ? Let me know if there is any help needed to complete this feature.

Buffer0x7cd avatar Aug 18 '23 22:08 Buffer0x7cd

@python-zk/maintainers I think I have implemented all what we discussed. All the more "future looking" ideas discussed here (prometheus, OTL,) would be addressed in future PRs

ceache avatar Feb 08 '24 22:02 ceache