kazoo
kazoo copied to clipboard
Emit warning if ping socket select takes longer
Why is this needed?
We experienced Zookeeper session timeouts within our applications with Kazoo because of GIL contention blocking the Kazoo handler thread and its ping loop.
Proposed Changes
- Emitting a warning log message to indicate a longer sleep time of the handler/thread.
Does this PR introduce any breaking change?
No
Thank you for this PR.
I understand the need. I am wondering if this warning should (can?) be integrated in the select()
method instead so that those kind of contentions can also be detected when trying to _read()
or _write()
. Any thoughts?
For the PR itself, timing this and logging a warning is probably a good idea, although @StephenSorriaux has a good question.
Although I'm curious, how do you know it's the GIL blocking you? Is this under Py2 or Py3? (the GIL scheduler changed slightly to reduce the odds of this kind of thing in Py3).
I just wonder if you're instead hitting something like the problem described in https://github.com/python-zk/kazoo/pull/633...
Again though, probably still a good idea to calculate/log the warning, regardless of the root cause.
I am wondering as well if this is not also addressed by #633
Could you maybe test with that patch and see if you see improvements in your setup?
Cheers,
On Sun, Dec 13, 2020, 15:25 Jeff Widman [email protected] wrote:
For the PR itself, timing this and logging a warning is probably a good idea, although @StephenSorriaux https://github.com/StephenSorriaux has a good question.
Although I'm curious, how do you know it's the GIL blocking you? Is this under Py2 or Py3? (the GIL scheduler changed slightly to reduce the odds of this kind of thing in Py3).
I just wonder if you're instead hitting something like the problem described in #633 https://github.com/python-zk/kazoo/pull/633...
Again though, probably still a good idea to calculate/log the warning, regardless of the root cause.
— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/python-zk/kazoo/pull/632#issuecomment-744063413, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIFTHX423KM4T5JFNAD2L3SUUPMDANCNFSM4USRNYMA .
Nudge @jarus
Sorry for my delayed reply.
Although I'm curious, how do you know it's the GIL blocking you? Is this under Py2 or Py3? (the GIL scheduler changed slightly to reduce the odds of this kind of thing in Py3).
I know that most people blame the GIL for their application failures. Once I assumed something similar, I actually started to investigate a bit in that area and developed some tooling for measuring the GIL contention. If you are interested, you could take a look at my recent EuroPython talk about this topic: https://www.youtube.com/watch?v=HtbLNgXmLrw
I just wonder if you're instead hitting something like the problem described in #633... I am wondering as well if this is not also addressed by #633
Yes, the new implementation looks promising. Until now, I sadly hadn't the chance to test it in our application.
How should we proceed? If you think the warning is still helpful, I would rebase and adjust the change.