sbd icon indicating copy to clipboard operation
sbd copied to clipboard

Fix: sbd-inquisitor: SBD_SYNC_RESOURCE_STARTUP should consider SBD_PACEMAKER

Open isaacpittman-hitachi opened this issue 3 years ago • 8 comments

Currently, if SBD_SYNC_RESOURCE_STARTUP=false and SBD_PACEMAKER=false, sbd prints a warning every time it runs: "Should think about enabling SBD_SYNC_RESOURCE_STARTUP." But this warning can never be addressed, since SBD_SYNC_RESOURCE_STARTUP=true doesn't work with SBD_PACEMAKER=false.

Additionally, if SBD_PACEMAKER=false but SBD_SYNC_RESOURCE_STARTUP=true, the SBD_SYNC_RESOURCE_STARTUP=true will have no effect, since SBD_PACEMAKER controls whether sbd sends the ping to pacemaker. (If pacemaker is also configured with SBD_SYNC_RESOURCE_STARTUP=true, it will hang forever waiting for the ping from sbd that never comes.)

Handle both of these cases by checking the values of SBD_SYNC_RESOURCE_STARTUP and SBD_PACEMAKER:

  • SBD_SYNC_RESOURCE_STARTUP=true, SBD_PACEMAKER=true: No change. Startup syncs as expected.

  • SBD_SYNC_RESOURCE_STARTUP=false, SBD_PACEMAKER=true: No change. Sbd warns about enabling SBD_SYNC_RESOURCE_STARTUP as expected.

  • SBD_SYNC_RESOURCE_STARTUP=true, SBD_PACEMAKER=false: Currently, pacemaker would hang. With the fix, sbd aborts with an error.

  • SBD_SYNC_RESOURCE_STARTUP=false, SBD_PACEMAKER=false: Currently, sbd warns about enabling SBD_SYNC_RESOURCE_STARTUP. With the fix, the behavior is the same as it was before SBD_SYNC_RESOURCE_STARTUP was introduced: no warning.

isaacpittman-hitachi avatar Jan 12 '22 15:01 isaacpittman-hitachi

Can one of the admins verify this patch?

knet-ci-bot avatar Jan 12 '22 15:01 knet-ci-bot

test this please

wenningerk avatar Jan 12 '22 15:01 wenningerk

Not sure if what you tried is a valid configuration. With SBD_PACEMAKER==false you probably should disable watchdog-fencing on the pacemaker side and then pacemaker shouldn't hang anymore. Disabling the warning in case SBD_PACEMAKER==false is probably a good idea. The other thing might be handled better on the pacemaker-side which should probably exit if SBD_PACEMAKER==false and watchdog-fencing is enabled. Current behavior is at least save in a way that it prevents split-brain by not starting resources.

wenningerk avatar Jan 12 '22 15:01 wenningerk

Thank you for the feedback! You're probably right that I tested an invalid configuration.

I've removed the check that broke the unit tests but kept the disabled warning in case SBD_PACEMAKER==false.

isaacpittman-hitachi avatar Jan 12 '22 16:01 isaacpittman-hitachi

test this please

wenningerk avatar Jan 12 '22 16:01 wenningerk

Hmm ... Guess I take back what I said yesterday. This requires a bit more thinking. I think we should make a difference if resource-startup-syncing got set coming in as default or explicitly. Some thoughts but this is probably not final ...

If both resource-startup-syncing = true and pacemaker-check = false are set explicitly this is definitely a false-configuration and I guess exiting sbd would be the correct thing to do. If resource-startup-syncing is going for the default we might enforce defaulting to false if pacemaker-checks are explicitly set to false. Critical is that pacemaker-checks might be disabled via cmdline and that is something pacemaker couldn't check for. If we detect critical inconsistency here we might exit as well. Downside of an automatism as described above is that we'd need a similar implementation in pacemaker.

In pacemaker we should probably at least catch the case that pacemaker-checks are disabled and watchdog-fencing is enabled. We can't easily catch that earlier in sbd-startup unfortunately as with pacemaker-checks disabled we have no easy way to check for enabled watchdog-fencing. And if we already need to know how pacemaker-checks are switched on/off we can introduce the automatism for resource-startup-syncing as well.

We on top should try not to break high-level-tool-compatibility. For pcs and sbd with just disk-fencing (no pacemaker awareness using 3 disks as otherwise the 1 disk shared disk would become a spof) this might be broken already - have to check.

Your initial suggestion is probably a good first step in the right direction at least. Maybe the error-output when it bails out should be more explicit as to mention that it is an inconsistent configuration. I'd like to think over a final goal still before we change anything at this point.

wenningerk avatar Jan 13 '22 14:01 wenningerk

Thank you for the in depth feedback! I was hoping to at least get the suppressed warning checked in, since the frequent warnings make it harder to spot other issues in the logs. If that's alright, I could file another PR for the suppressed warning issue and keep this one open to address compatibility in general between resource-startup-syncing and pacemaker-check.

isaacpittman-hitachi avatar Jan 13 '22 15:01 isaacpittman-hitachi

Sorry for not reacting for so long. Got involved in other things and wanted to think it through properly - this time ;-) - before actual changes to the code-base of sbd or pacemaker. Should be able to pick up on the issue sometimes early this week.

wenningerk avatar Jan 30 '22 17:01 wenningerk