NAS-118285 / 22.12 / add more reasons to failover.disabled.reasons

Open yocalebo opened this issue 3 years ago • 2 comments

Two issues were found during internal testing of SCALE HA.

  1. The webUI dashboard shows a "healthy" HA icon because failover.disabled.reasons returns an empty array. However, that result ignores the fact that the other controller could currently be running a failover event. This means that if the user tries to initiate a failover while the other node is still processing one, the request is ignored by design.
  2. For reasons not yet known, we're seeing HA systems that are not running the journal synchronization thread, which means db transactions are not being synchronized between the nodes. This is obviously a significant failure condition (though we're not seeing it en masse, only on HA VMs used for integration tests), so I've added that check for both controllers.
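
To illustrate the two checks described above, here is a minimal sketch of how extra reasons might be collected for both controllers. It is not the code in this PR: the `ControllerState` class, the `disabled_reasons` function, and the reason strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControllerState:
    """Hypothetical snapshot of one HA controller (not a middleware type)."""
    failover_in_progress: bool = False
    journal_thread_running: bool = True


def disabled_reasons(local: ControllerState, remote: ControllerState) -> List[str]:
    """Return reasons why failover should be reported as disabled.

    The reason strings below are illustrative placeholders, not the
    identifiers actually returned by failover.disabled.reasons.
    """
    reasons = []
    # Issue 1: either controller may already be processing a failover event.
    if local.failover_in_progress:
        reasons.append("LOCAL_FAILOVER_ONGOING")
    if remote.failover_in_progress:
        reasons.append("REMOTE_FAILOVER_ONGOING")
    # Issue 2: without the journal thread, db transactions are not synchronized.
    if not local.journal_thread_running:
        reasons.append("LOCAL_JOURNAL_THREAD_NOT_RUNNING")
    if not remote.journal_thread_running:
        reasons.append("REMOTE_JOURNAL_THREAD_NOT_RUNNING")
    return reasons
```

With this shape, an empty list still means "healthy", but a failover running on the other node or a stopped journal thread on either node now produces a non-empty result that the dashboard could act on.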

yocalebo avatar Sep 23 '22 14:09 yocalebo

Jira URL: https://ixsystems.atlassian.net/browse/NAS-118285

bugclerk avatar Sep 23 '22 14:09 bugclerk

Waiting on associated webUI PR to be merged before merging this: https://github.com/truenas/webui/pull/7124

yocalebo avatar Sep 23 '22 17:09 yocalebo