pdns
pdns copied to clipboard
dnsdist: manually toggling a downstream server to down/up confuses lazy health checking
-
[x] This is not a support question, I have read about opensource and will send support questions to the IRC channel, Github Discussions or the mailing list.
-
[x] I have read and understood the 'out in the open' support policy
-
Program: dnsdist
-
Issue type: Bug report
Short description
After temporarily setting a downstream server (configured with lazy health check) to down and then up/auto(true) again seems to confuse the lazy health check thread, which continues to send health check pings to said server for no reason. Yes, the server is available and responds. :)
Environment
- Operating system: Linux
- Software version: dnsdist 1.9.0
- Software source: Gentoo (compiled from source)
Steps to reproduce
- add a working server with lazy health checking
- manually set the server to "down"
- manually set the server "up" or "auto(true)" again
Expected behaviour
Only a single health check (more precisely the required number to 'rise' as configured)
Actual behaviour
The server gets not just a few but a never-ending series of health checks
The "fix" is to remove/re-add the server that was taken down temporarily.
I'll look into this in the next couple days, this is likely related to https://github.com/PowerDNS/pdns/discussions/13811
manually set the server "up" or "auto(true)" again
Oh, are you calling setUp() or setAuto(true)? These switch the backend to the regular health-check mode, you need to do setLazyAuto(true) if you want to keep the lazy health-check behaviour.
Oh, are you calling
setUp()orsetAuto(true)?
I tried both, obviously. :clown_face:
Using setLazyAuto works without creating a stream of pings, thanks!
But I got to say, this is..very unexpected?
Yeah, at the very least I need to document it more clearly!
Can we not get rid of these special modes? I don't understand what disabling/enabling a server has to do with implicitly changing the configuration. setUp/Down should be independent of any other configuration and certainly not silently change it; how the down/up state transition of the specific server is handled should depend only on its configuration, not the command. That way you could get rid of the setAuto variants. That being said, I might well be missing/misunderstanding something and I don't want to complain too much.
Since this is not a bug after all feel free to close it. Thanks!
Well it's a bit more complicated than that, I think. The point of setAuto() and setLazyAuto() is actually to be able to change the configuration of a backend, so they do need to exist. but I do agree that setUp() and setDown() might be confusing: they do set the state of the backend (UP or DOWN) AND disable health-checking, because if you want to manually set the state of a backend you do not want it to change under your feet. Now I see how you might expect that the backend remembers the kind of health-check that was in use before calling setUp() or setDown(), so that calling setAuto() later restores the previous mode, but that's not how it is currently implemented. If we want to implement this behaviour we would need two things: storing the health-check mode (lazy or not) independently AND a new directive that restore the previous state (resumeHealthChecks() perhaps?) since some setups might already be relying on the current behaviour of setAuto() to change the health-check mode (since 1.8.0). I agree it would have been better to keep setAuto() for that purpose and add a new call to set the mode to non-lazy, but unfortunately doing so now would be a breaking change.