pacemaker
pacemaker copied to clipboard
WIP: Make remote reads asynchronous
I've been working through the comments in T855 and everything's been pretty easy up to the lrmd_tls_dispatch patch. When running the RemoteMigrate test in cts-lab, I'm seeing this:
Sep 03 12:01:41.043 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_tls_dispatch_async@lrmd_client.c:490) trace: TLS dispatch triggered
Sep 03 12:01:41.043 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_tls_dispatch_async@lrmd_client.c:498) info: BEFORE REMOTE READY
Sep 03 12:01:41.043 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_tls_dispatch_async@lrmd_client.c:500) info: AFTER REMOTE READY
Sep 03 12:01:41.043 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_tls_dispatch_async@lrmd_client.c:503) info: READY SAID OK
Sep 03 12:01:41.043 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_tls_dispatch_async@lrmd_client.c:519) info: BEFORE READ AVAIL
Sep 03 12:01:41.043 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_tls_dispatch_async@lrmd_client.c:521) info: AFTER READ AVAIL
Sep 03 12:01:41.043 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_tls_dispatch_async@lrmd_client.c:529) info: READ SAID AGAIN
Sep 03 12:01:41.045 rhel9-ctslab-1 pacemaker-schedulerd[60447] ([email protected]:1747) info: rhel9-ctslab-1 is active
Sep 03 12:01:41.045 rhel9-ctslab-1 pacemaker-schedulerd[60447] ([email protected]:1911) info: rhel9-ctslab-1 is online
Sep 03 12:01:41.045 rhel9-ctslab-1 pacemaker-schedulerd[60447] ([email protected]:1747) info: rhel9-ctslab-2 is active
Sep 03 12:01:41.045 rhel9-ctslab-1 pacemaker-schedulerd[60447] ([email protected]:1911) info: rhel9-ctslab-2 is online
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning remote-rhel9-ctslab-3
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning Fencing
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning FencingFail
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning rsc_rhel9-ctslab-1
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning rsc_rhel9-ctslab-2
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning rsc_rhel9-ctslab-3
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning migrator
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning ping-1:0
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning ping-1:1
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning ping-1:2
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning ping-1:3
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__primitive_assign@pcmk_sched_primitive.c:542) info: Resource ping-1:3 cannot run anywhere
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning stateful-1:0
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning stateful-1:1
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning stateful-1:2
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__primitive_assign@pcmk_sched_primitive.c:542) info: Resource stateful-1:2 cannot run anywhere
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (set_instance_role@pcmk_sched_promotable.c:1065) info: Choosing stateful-1:0 (Promoted) on rhel9-ctslab-1 for promotion
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__set_instance_roles@pcmk_sched_promotable.c:1097) info: promotable-1: Promoted 1 instances of a possible 1
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning r192.168.122.204
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning petulant
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning r192.168.122.205
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__unassign_resource@pcmk_sched_resource.c:548) info: Unassigning lsb-dummy
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (recurring_op_for_active@pcmk_sched_recurring.c:339) info: Start 1m-interval monitor for ping-1:2 on remote-rhel9-ctslab-3
Sep 03 12:01:41.048 rhel9-ctslab-1 pacemaker-schedulerd[60447] (recurring_op_for_active@pcmk_sched_recurring.c:339) info: Start 20s-interval monitor for remote-rhel9-ctslab-3 on rhel9-cts
lab-1
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave Fencing (Started rhel9-ctslab-1)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave FencingFail (Started rhel9-ctslab-2)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave rsc_rhel9-ctslab-1 (Started rhel9-ctslab-1)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave rsc_rhel9-ctslab-2 (Started rhel9-ctslab-2)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave rsc_rhel9-ctslab-3 (Started rhel9-ctslab-1)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave migrator (Started rhel9-ctslab-2)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave ping-1:0 (Started rhel9-ctslab-1)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave ping-1:1 (Started rhel9-ctslab-2)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (log_list_item@output_log.c:232) notice: Actions: Start ping-1:2 ( remote-rhel9-ct
slab-3 )
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1123) info: Leave ping-1:3 (Stopped)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave stateful-1:0 (Promoted rhel9-ctslab-1)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave stateful-1:1 (Unpromoted rhel9-ctslab-2)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1123) info: Leave stateful-1:2 (Stopped)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave r192.168.122.204 (Started rhel9-ctslab-1)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave r192.168.122.205 (Started rhel9-ctslab-1)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave lsb-dummy (Started rhel9-ctslab-2)
Sep 03 12:01:41.049 rhel9-ctslab-1 pacemaker-schedulerd[60447] (rsc_action_default@pcmk_output.c:1206) info: Leave remote-rhel9-ctslab-3 (Started rhel9-ctslab-1)
Sep 03 12:01:41.050 rhel9-ctslab-1 pacemaker-schedulerd[60447] (pcmk__log_transition_summary@pcmk_graph_producer.c:965) warning: Calculated transition 10 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-1264.bz2
Sep 03 12:01:41.062 rhel9-ctslab-1 pacemaker-controld [60448] (do_state_transition@controld_fsa.c:612) info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_reply
Sep 03 12:01:41.063 rhel9-ctslab-1 pacemaker-controld [60448] (do_te_invoke@controld_transition.c:171) info: Processing graph 10 (ref=pe_calc-60448_pacemaker-controld-1725379301-117) derived from /var/lib/pacemaker/pengine/pe-warn-1264.bz2
Sep 03 12:01:41.063 rhel9-ctslab-1 pacemaker-controld [60448] (execute_rsc_action@controld_te_actions.c:404) notice: Initiating monitor operation rsc_rhel9-ctslab-1_monitor_0 locally on remote-rhel9-ctslab-3 | action 14
Sep 03 12:01:41.063 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_send_command@lrmd_client.c:980) trace: Sending lrmd_rsc_info op to executor
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] (lrmd_create_op@lrmd_client.c:640) trace: Created executor lrmd_rsc_info command with call options 00000000 (0)
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] ([email protected]:270) error: pcmk__remote_message_xml: Triggered fatal assertion at remote.c:564 : remote->buffer[sizeof(struct remote_header_v0) + header->payload_uncompressed - 1] == 0
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] (pcmk__log_xmllib_err@xml_display.c:39) error: XML Error: Entity: line 1: parser error : Document is empty
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] (pcmk__log_xmllib_err@xml_display.c:39) error: XML Error:
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] (pcmk__log_xmllib_err@xml_display.c:39) error: XML Error: ^
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] (pcmk__log_xmllib_err@xml_display.c:39) error: XML Error: Entity: line 1: parser error : Document is empty
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] (pcmk__log_xmllib_err@xml_display.c:39) error: XML Error:
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] (pcmk__log_xmllib_err@xml_display.c:39) error: XML Error: ^
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] ([email protected]:572) error: Couldn't parse: ''
Sep 03 12:01:41.064 rhel9-ctslab-1 pacemaker-controld [60448] ([email protected]:270) error: localized_remote_header: Triggered fatal assertion at remote.c:105 : endian == ENDIAN_LOCAL
And then the errors go on from there. It looks like we notice there's going to be more to the message, but then never get back around to reading the next chunk and then somewhere, an empty message is being read. I would have expected to see "BEFORE REMOTE MSG" in the log if it were coming from lrmd_tls_dispatch_async. I could use a pointer on how I might further debug this.