kazoo icon indicating copy to clipboard operation
kazoo copied to clipboard

Emit warning if ping socket select takes longer

Open jarus opened this issue 4 years ago • 5 comments

Why is this needed?

We experienced Zookeeper session timeouts within our applications with Kazoo because of GIL contention blocking the Kazoo handler thread and its ping loop.

Proposed Changes

  • Emitting a warning log message to indicate a longer sleep time of the handler/thread.

Does this PR introduce any breaking change?

No

jarus avatar Dec 08 '20 21:12 jarus

Thank you for this PR.

I understand the need. I am wondering if this warning should (can?) be integrated in the select() method instead so that those kind of contentions can also be detected when trying to _read() or _write(). Any thoughts?

StephenSorriaux avatar Dec 13 '20 17:12 StephenSorriaux

For the PR itself, timing this and logging a warning is probably a good idea, although @StephenSorriaux has a good question.

Although I'm curious, how do you know it's the GIL blocking you? Is this under Py2 or Py3? (the GIL scheduler changed slightly to reduce the odds of this kind of thing in Py3).

I just wonder if you're instead hitting something like the problem described in https://github.com/python-zk/kazoo/pull/633...

Again though, probably still a good idea to calculate/log the warning, regardless of the root cause.

jeffwidman avatar Dec 13 '20 20:12 jeffwidman

I am wondering as well if this is not also addressed by #633

Could you maybe test with that patch and see if you see improvements in your setup?

Cheers,

On Sun, Dec 13, 2020, 15:25 Jeff Widman [email protected] wrote:

For the PR itself, timing this and logging a warning is probably a good idea, although @StephenSorriaux https://github.com/StephenSorriaux has a good question.

Although I'm curious, how do you know it's the GIL blocking you? Is this under Py2 or Py3? (the GIL scheduler changed slightly to reduce the odds of this kind of thing in Py3).

I just wonder if you're instead hitting something like the problem described in #633 https://github.com/python-zk/kazoo/pull/633...

Again though, probably still a good idea to calculate/log the warning, regardless of the root cause.

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/python-zk/kazoo/pull/632#issuecomment-744063413, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIFTHX423KM4T5JFNAD2L3SUUPMDANCNFSM4USRNYMA .

ceache avatar Dec 13 '20 20:12 ceache

Nudge @jarus

jeffwidman avatar Jan 21 '21 05:01 jeffwidman

Sorry for my delayed reply.

Although I'm curious, how do you know it's the GIL blocking you? Is this under Py2 or Py3? (the GIL scheduler changed slightly to reduce the odds of this kind of thing in Py3).

I know that most people blame the GIL for their application failures. Once I assumed something similar, I actually started to investigate a bit in that area and developed some tooling for measuring the GIL contention. If you are interested, you could take a look at my recent EuroPython talk about this topic: https://www.youtube.com/watch?v=HtbLNgXmLrw

I just wonder if you're instead hitting something like the problem described in #633... I am wondering as well if this is not also addressed by #633

Yes, the new implementation looks promising. Until now, I sadly hadn't the chance to test it in our application.

How should we proceed? If you think the warning is still helpful, I would rebase and adjust the change.

jarus avatar Jan 27 '21 21:01 jarus