middleware
middleware copied to clipboard
NAS-118285 / 22.12 / add more reasons to failover.disabled.reasons
2 issues have been found internally testing SCALE HA.
- the webUI dashboard is showing a "healthy" HA icon because failover.disabled.reasons is returning an empty array, however, it's ignoring the fact the other controller could be currently running a failover event. This means if the user tries to initiate a failover while the other node is currently processing one, it will be ignored by design.
- for reasons yet unknown, we're seeing HA systems not running the journal synchronization thread which means db transactions between the nodes are not being synchronized. This is, obviously, a pretty significant failure condition (yet we're not seeing it en masse, only on HA VMs used for integration tests) so I've added that check for both controllers.
Jira URL: https://ixsystems.atlassian.net/browse/NAS-118285
Waiting on associated webUI PR to be merged before merging this: https://github.com/truenas/webui/pull/7124