community.sap_install

sap_ha_pacemaker_cluster: During graceful shutdown/reboot the node gets fenced if SAPHana or SAPDatabase resources are defined

Open rob0d opened this issue 7 months ago • 3 comments

Ansible Role

sap_ha_pacemaker_cluster

OS Family

N/A

Ansible Controller - Python version

Irrelevant

Ansible-core version

Irrelevant

Bug Description

Hi @berndfinger / @ja9fuchs / @marcelmamula,

I'm not sure whether this should be a bug or an enhancement. However, since the system experiences a fencing event, I filed it as a bug report.

I have experienced it only on a HANA cluster, but according to Red Hat this affects all clusters that use the SAPDatabase, SAPInstance and SAPHana resource agents.

Symptom: When a graceful shutdown or reboot is initiated on a node that manages SAPDatabase resources and has the systemd-based SAP startup framework enabled, the SAPDatabase resource fails during the stop operation and the node gets fenced. The same happens with SAPHana and, apparently, SAPInstance resources.

Details: These Red Hat knowledge base articles describe the problem:

  • https://access.redhat.com/solutions/7029705
  • https://access.redhat.com/solutions/7066262

Solution:

The solution is fairly simple (copied from the two Red Hat KBs):

For SAPHana resources:

On each cluster node, create the directory /etc/systemd/system/pacemaker.service.d/ and place a drop-in file for Pacemaker with the following content (RH2 is the example SID, 02 the instance number):

# cat /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
[Unit]
Description=Pacemaker needs the SAP HANA instance service
Wants=SAPRH2_02.service
After=SAPRH2_02.service
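The unit name in this drop-in follows the pattern SAP<SID>_<instance number>.service, as in both KB examples (SAPRH2_02.service) and the SUSE man page (SAPS07_00.service). A minimal Python sketch, assuming that naming convention and a simplified SID check (three uppercase alphanumerics starting with a letter), that renders the SAPHana variant of the file; the function names are hypothetical and not part of the role:

```python
import re


def sap_instance_unit(sid: str, instance_nr: str) -> str:
    """Return the systemd unit name of an SAP instance, e.g. SAPRH2_02.service.

    Assumption: unit names follow SAP<SID>_<NR>.service as in the examples above.
    """
    # Simplified SID check: 3 chars, uppercase letters/digits, first is a letter.
    if not re.fullmatch(r"[A-Z][A-Z0-9]{2}", sid):
        raise ValueError(f"invalid SID: {sid!r}")
    # Instance numbers are written with two digits, e.g. 00 or 02.
    if not re.fullmatch(r"[0-9]{2}", instance_nr):
        raise ValueError(f"invalid instance number: {instance_nr!r}")
    return f"SAP{sid}_{instance_nr}.service"


def hana_dropin(sid: str, instance_nr: str) -> str:
    """Render the SAPHana variant of 00-pacemaker.conf (hypothetical helper)."""
    unit = sap_instance_unit(sid, instance_nr)
    return (
        "[Unit]\n"
        "Description=Pacemaker needs the SAP HANA instance service\n"
        f"Wants={unit}\n"
        f"After={unit}\n"
    )


print(hana_dropin("RH2", "02"), end="")
```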

For SAPDatabase/SAPInstance resources:

# cat /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
[Unit]
Description=Pacemaker needs the SAP Host Agent service
Wants=SAPRH2_02.service saphostagent.service
After=SAPRH2_02.service saphostagent.service
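Why the After= line helps at shutdown: systemd stops units that have an ordering dependency in the inverse of their start order, so with this drop-in pacemaker.service (and with it the cluster's stop operations) should be stopped before SAPRH2_02.service and saphostagent.service go away. The sketch below models that with Python's graphlib; it is a simplified linear model (real systemd runs unordered jobs in parallel) using the example unit names from the KB text:

```python
from graphlib import TopologicalSorter

# Ordering from the SAPDatabase/SAPInstance drop-in above:
# each key lists the units it is ordered After=.
after = {
    "pacemaker.service": {"SAPRH2_02.service", "saphostagent.service"},
}

# Start order: units a key is ordered After= come first.
start_order = list(TopologicalSorter(after).static_order())
# Stop order: systemd applies the inverse of the start-up order.
stop_order = list(reversed(start_order))

print("start:", start_order)
print("stop: ", stop_order)
```

Without the drop-in there is no ordering between pacemaker.service and the SAP services, so nothing guarantees the SAP services are still running when the cluster tries to stop its resources.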

I'd like to suggest that we add this to the post steps in sap_ha_pacemaker_cluster. However, I'm not sure what to do on SUSE with crmsh, or whether an equivalent configuration would have to be applied there.

Bug reproduction

Reboot a node.

Community participation

Happy to help with this bug fix, but may need help (e.g. first time contributing to open-source using git)

rob0d, Apr 29 '25 13:04

@rob0d I have removed the Bug label, as this is not a bug.

The steps you mention are an optional setup, used only when troubleshooting issues, and they are not part of the recommended cluster configuration. SUSE also documents them, but they are not part of our documents and blogs.

We mention it in the SAPHanaSR_basic_cluster(7) man page:

       * show pacemaker service drop-in file

       In case systemd-style init is used for the HANA database, it might be desired to have the SAP instance service stopping after
       pacemaker at system shutdown. A drop-in file might help. Example SID is S07, instance number is 00.

         # cat /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
         [Unit]
         Description=pacemaker needs SAP instance service
         Documentation=man:SAPHanaSR_basic_cluster(7)
         Wants=SAPS07_00.service
         After=SAPS07_00.service

I would be OK with adding this as an optional post task, controlled by a new variable that defaults to false (d(false)).
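An opt-in post task like this can be sketched as a small planning step: do nothing unless the new variable is explicitly true, otherwise return the drop-in path and content. In this Python sketch the variable name sap_ha_pacemaker_cluster_systemd_dropin and the function plan_dropin are hypothetical; only the file path and the [Unit] directives come from this thread:

```python
from pathlib import PurePosixPath
from typing import Optional

# Path taken from the KB/man page examples in this issue.
DROPIN_PATH = PurePosixPath("/etc/systemd/system/pacemaker.service.d/00-pacemaker.conf")


def plan_dropin(role_vars: dict, units: list) -> Optional[tuple]:
    """Return (path, content) for the drop-in, or None when the task is disabled.

    Assumption: the hypothetical variable defaults to False, so the task is opt-in.
    """
    if not role_vars.get("sap_ha_pacemaker_cluster_systemd_dropin", False):
        return None
    if not units:
        raise ValueError("at least one SAP unit is required")
    joined = " ".join(units)
    content = (
        "[Unit]\n"
        "Description=Pacemaker needs the SAP instance service\n"
        f"Wants={joined}\n"
        f"After={joined}\n"
    )
    return DROPIN_PATH, content


print(plan_dropin({}, ["SAPRH2_02.service"]))  # disabled by default
```

Writing the file alone is not enough: systemd only picks up new or changed drop-ins after `systemctl daemon-reload`.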

marcelmamula, Apr 29 '25 14:04

Update for SUSE:

In essence:

  • SAP HANA HA: fine to set up, and it can be implemented as an optional post task, since the whole role already deals with systemd.
  • SAP ASCS/ERS: it can be done for shutdown, but it is a NO GO for startup.
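Because After= orders both startup and shutdown (stop order is simply the inverse), a drop-in like the ones above cannot be limited to shutdown alone. One conservative reading of this update is to offer the post task only for HANA clusters. A minimal Python sketch of that gate; the cluster-type labels are hypothetical and may not match the role's actual variable values:

```python
# Hypothetical cluster-type labels; the role's real values may differ.
HANA_TYPES = {"hana_scaleup"}
ASCS_ERS_TYPES = {"nwas_ascs_ers"}


def dropin_allowed(cluster_type: str) -> bool:
    """Decide whether the pacemaker drop-in may be applied (conservative sketch)."""
    if cluster_type in HANA_TYPES:
        # SAP HANA HA: acceptable as an optional post task.
        return True
    if cluster_type in ASCS_ERS_TYPES:
        # After= would also order startup, which is a no-go for ASCS/ERS.
        return False
    raise ValueError(f"unknown cluster type: {cluster_type!r}")


print(dropin_allowed("hana_scaleup"), dropin_allowed("nwas_ascs_ers"))
```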

marcelmamula, Jul 29 '25 07:07