pdns dnsdist: manually toggling a downstream server to down/up confuses lazy health checking

[x] This is not a support question, I have read about opensource and will send support questions to the IRC channel, Github Discussions or the mailing list.
[x] I have read and understood the 'out in the open' support policy
Program: dnsdist
Issue type: Bug report

Short description

After temporarily setting a downstream server (configured with lazy health check) to down and then up/auto(true) again seems to confuse the lazy health check thread, which continues to send health check pings to said server for no reason. Yes, the server is available and responds. :)

Environment

Operating system: Linux
Software version: dnsdist 1.9.0
Software source: Gentoo (compiled from source)

Steps to reproduce

add a working server with lazy health checking
manually set the server to "down"
manually set the server "up" or "auto(true)" again

Expected behaviour

Only a single health check (more precisely the required number to 'rise' as configured)

Actual behaviour

The server gets not just a few but a never-ending series of health checks

Feb 27 '24 12:02 hhoffstaette

The "fix" is to remove/re-add the server that was taken down temporarily.

Feb 27 '24 12:02 hhoffstaette

I'll look into this in the next couple days, this is likely related to https://github.com/PowerDNS/pdns/discussions/13811

Feb 27 '24 13:02 rgacogne

manually set the server "up" or "auto(true)" again

Oh, are you calling setUp() or setAuto(true)? These switch the backend to the regular health-check mode, you need to do setLazyAuto(true) if you want to keep the lazy health-check behaviour.

Feb 27 '24 15:02 rgacogne

Oh, are you calling setUp() or setAuto(true)?

I tried both, obviously. :clown_face: Using setLazyAuto works without creating a stream of pings, thanks! But I got to say, this is..very unexpected?

Feb 27 '24 15:02 hhoffstaette

Yeah, at the very least I need to document it more clearly!

Feb 27 '24 15:02 rgacogne

Can we not get rid of these special modes? I don't understand what disabling/enabling a server has to do with implicitly changing the configuration. setUp/Down should be independent of any other configuration and certainly not silently change it; how the down/up state transition of the specific server is handled should depend only on its configuration, not the command. That way you could get rid of the setAuto variants. That being said, I might well be missing/misunderstanding something and I don't want to complain too much.

Since this is not a bug after all feel free to close it. Thanks!

Feb 27 '24 16:02 hhoffstaette

Well it's a bit more complicated than that, I think. The point of setAuto() and setLazyAuto() is actually to be able to change the configuration of a backend, so they do need to exist. but I do agree that setUp() and setDown() might be confusing: they do set the state of the backend (UP or DOWN) AND disable health-checking, because if you want to manually set the state of a backend you do not want it to change under your feet. Now I see how you might expect that the backend remembers the kind of health-check that was in use before calling setUp() or setDown(), so that calling setAuto() later restores the previous mode, but that's not how it is currently implemented. If we want to implement this behaviour we would need two things: storing the health-check mode (lazy or not) independently AND a new directive that restore the previous state (resumeHealthChecks() perhaps?) since some setups might already be relying on the current behaviour of setAuto() to change the health-check mode (since 1.8.0). I agree it would have been better to keep setAuto() for that purpose and add a new call to set the mode to non-lazy, but unfortunately doing so now would be a breaking change.

Feb 27 '24 16:02 rgacogne

pdns pdns copied to clipboard

dnsdist: manually toggling a downstream server to down/up confuses lazy health checking

Short description

Environment

Steps to reproduce

Expected behaviour

Actual behaviour

pdns
pdns copied to clipboard