pacemaker
pacemaker copied to clipboard
Fix: tools: crm_mon --daemonize should update when disconnected
With crm_mon --daemonize
, the output will continue to show the last known status after the cluster is stopped on the local node. It should go back to something like "Cluster is not available"
.
Likewise, messages like "Reconnecting..."
should not go to the daemonized output. The output file (or external handler) should receive only the cluster status. So we print those messages only if output_format
is mon_output_console
(or if we're in one-shot mode, where applicable). Normally those messages don't get printed to the daemonized output anyway because we don't flush the buffer, but theoretically they could get printed if the buffer fills up.
It's questionable whether we want to print "Cluster is not available"
when we're in interactive mode or only in daemonized mode. For now, we'll do daemonized-only. Interactive (console) mode already gets the "Connection to cluster daemons terminated"
message, which is quickly replaced by "Reconnecting..."
.
External agents are notified via traps rather than via output, so we can ignore them.
Closes T15
The first commit may not be comprehensive -- for example, "waiting for start" type messages may still appear in daemonized text output during startup. It's an improvement and should take care of most situations after we're already up and running.
pushed from the wrong local branch...
Updated. This gets the job done for daemonized mode without switching on output formats and such, and without negatively impacting console or one-shot mode.
It does not print a string representation of the pacemakerd
status (e.g., "Pacemaker daemons shutting down") if the pacemakerd
state is noteworthy when run in one-shot mode. That would require either merging pcmk_status.c:pacemakerd_status()
and crm_mon.c:pacemakerd_status()
into one function that can fulfill both use cases' requirements, or adding a fair amount of code to pcmk_status.c:pacemakerd_status()
and its callers.
I think it would be a good idea to merge them into library functions at some point -- and likewise for other similar static functions. I started working on it and named it pcmk__pacemakerd_status()
... and then it failed to compile because there's already a function with that name in pcmk_cluster_queries.c
. That function and its event callback do some of the same things but were built with a different use case in mind. Getting everything merged cleanly shouldn't be a huge undertaking, but it's not trivial and so I don't want to let it hold up this PR.
I also put a couple of formatter functions for a new message in crm_mon.c
and registered them directly. It would make intuitive sense to have crm_mon_register_messages()
register them. That function currently is defined in crm_mon_curses.c
and registers formatters specifically for curses. If we want, we can change that function name to "crm_mon_register_curses_messages" and then define a "crm_mon_register_messages" in crm_mon.c
.
Depending on how much effort it is, before having this merged, I may try to make the console and daemonized mode share more code with one-shot mode since we're doing that work in #2902. That would entail, at minimum, using pcmk__pacemakerd_status()
. It could also involve sharing a cib_connect()
and fencing_connect()
implementation and maybe incorporating pcmk__status()
... I haven't given it any detailed thought since starting on #2902.
Updated minimally to address review. Most of the massive Compare output is from rebasing on current main and resolving conflicts. There's still room for improvement in both the output and the implementation, but this gets the job done.
I'll be filing another PR with some other refactors as a start
The CI failures were RPM issues on three hosts. Two of them were OpenSUSE hosts that timed out when trying to access the libqb100 RPM. The other was CentOS 9 s390x: " - nothing provides libqb(s390-64) = 2.0.6-1.el9.next needed by libqb-devel-2.0.6-1.el9.next.s390x"
Totally replaced commits.
Still working out some schema issues (edit: fixed in #2919)
Updated to address review, which was more involved than expected (see https://github.com/ClusterLabs/pacemaker/pull/2868#discussion_r1014536512). This is already way too big, so if it gets any bigger, it'll be split up.
- Updated
cib__signon_query()
to returnpcmk_rc_no_input
if(rc == pcmk_rc_ok) && (*cib_object == NULL)
. - Added a commit at the beginning to fix the pcmkd state display for XML format (should use the short display instead of friendly).
- Added two commits to include
pacemakerd
status in thecluster-summary
message for native CIB connections. It's excluded by default for file and remote CIB connections (if you include it, it will always say "Unknown"). It's also specifically excluded whenpcmk_simulate
callscluster-summary
.
Not addressed yet:
- Newline before "Connection to the cluster lost". Need to figure out whether to leave that as-it-was, add the newline, or modify the curses error/info functions to start at the beginning of the line.
Updated to:
- remove the newline from the "Connection to cluster lost" transient message
- move the
pacemakerd
state into the stack section:
<stack type="corosync" pacemakerd-state="running"/>
<stack type="corosync" pacemakerd-state="shutting_down"/>
* Stack: corosync (Pacemaker is running)
* Stack: corosync (Pacemaker daemons are shutting down)
- Remove the pacemakerd-health messages from
crm_mon.c
(thecrm-mon-disconnected
message covers this already for the non-one-shot case)