pdns icon indicating copy to clipboard operation
pdns copied to clipboard

dnsdist: manually toggling a downstream server to down/up confuses lazy health checking

Open hhoffstaette opened this issue 1 year ago • 7 comments

Short description

After temporarily setting a downstream server (configured with lazy health check) to down and then up/auto(true) again seems to confuse the lazy health check thread, which continues to send health check pings to said server for no reason. Yes, the server is available and responds. :)

Environment

  • Operating system: Linux
  • Software version: dnsdist 1.9.0
  • Software source: Gentoo (compiled from source)

Steps to reproduce

  1. add a working server with lazy health checking
  2. manually set the server to "down"
  3. manually set the server "up" or "auto(true)" again

Expected behaviour

Only a single health check (more precisely the required number to 'rise' as configured)

Actual behaviour

The server gets not just a few but a never-ending series of health checks

hhoffstaette avatar Feb 27 '24 12:02 hhoffstaette

The "fix" is to remove/re-add the server that was taken down temporarily.

hhoffstaette avatar Feb 27 '24 12:02 hhoffstaette

I'll look into this in the next couple days, this is likely related to https://github.com/PowerDNS/pdns/discussions/13811

rgacogne avatar Feb 27 '24 13:02 rgacogne

manually set the server "up" or "auto(true)" again

Oh, are you calling setUp() or setAuto(true)? These switch the backend to the regular health-check mode, you need to do setLazyAuto(true) if you want to keep the lazy health-check behaviour.

rgacogne avatar Feb 27 '24 15:02 rgacogne

Oh, are you calling setUp() or setAuto(true)?

I tried both, obviously. :clown_face: Using setLazyAuto works without creating a stream of pings, thanks! But I got to say, this is..very unexpected?

hhoffstaette avatar Feb 27 '24 15:02 hhoffstaette

Yeah, at the very least I need to document it more clearly!

rgacogne avatar Feb 27 '24 15:02 rgacogne

Can we not get rid of these special modes? I don't understand what disabling/enabling a server has to do with implicitly changing the configuration. setUp/Down should be independent of any other configuration and certainly not silently change it; how the down/up state transition of the specific server is handled should depend only on its configuration, not the command. That way you could get rid of the setAuto variants. That being said, I might well be missing/misunderstanding something and I don't want to complain too much.

Since this is not a bug after all feel free to close it. Thanks!

hhoffstaette avatar Feb 27 '24 16:02 hhoffstaette

Well it's a bit more complicated than that, I think. The point of setAuto() and setLazyAuto() is actually to be able to change the configuration of a backend, so they do need to exist. but I do agree that setUp() and setDown() might be confusing: they do set the state of the backend (UP or DOWN) AND disable health-checking, because if you want to manually set the state of a backend you do not want it to change under your feet. Now I see how you might expect that the backend remembers the kind of health-check that was in use before calling setUp() or setDown(), so that calling setAuto() later restores the previous mode, but that's not how it is currently implemented. If we want to implement this behaviour we would need two things: storing the health-check mode (lazy or not) independently AND a new directive that restore the previous state (resumeHealthChecks() perhaps?) since some setups might already be relying on the current behaviour of setAuto() to change the health-check mode (since 1.8.0). I agree it would have been better to keep setAuto() for that purpose and add a new call to set the mode to non-lazy, but unfortunately doing so now would be a breaking change.

rgacogne avatar Feb 27 '24 16:02 rgacogne