community.sap_install

sap_hana_preconfigure: corruption of boot

Open sean-freeman opened this issue 6 months ago • 4 comments

Ansible Role

sap_hana_preconfigure

OS Family

RHEL

Ansible Controller - Python version

Python 3.13.3

Ansible-core version

ansible [core 2.16.13]

Bug Description

The internal logic of sap_hana_preconfigure for the minimum kernel patch level miscalculates when OS Images come directly from an ISO (e.g. CCSP certified) or are hardened.

The Ansible Task "Create a list of minimum required package versions to be installed" compares the installed packages against the Ansible Role's internal variables, which list minimum patch levels (e.g. for the kernel).

If the OS Image runs a later kernel patch level, but this does not show up as an installed package named kernel, the Ansible Role will:

  • run an Ansible Task to force install the minimum kernel patch, effectively a kernel downgrade
  • run the next Ansible Task which will update all OS Packages, including the kernel, and increase beyond the original patch level
  • reboot, after which GRUB incorrectly tries to boot the wrong kernel

This happens because the Ansible Task's internal logic relies on the command rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel. The following OS Images are examples where this query fails:

[root@rhel86 ~]# uname -r
4.18.0-372.141.1.el8_6.x86_64

[root@rhel86 ~]# rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel
package kernel is not installed

[root@rhel88 ~]# uname -r
4.18.0-477.94.1.el8_8.x86_64

[root@rhel88 ~]# rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel
package kernel is not installed

[root@rhel90 ~]# uname -r
5.14.0-70.126.1.el9_0.x86_64

[root@rhel90 ~]# rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel
package kernel is not installed

[root@rhel-9-2 ~]# uname -r
5.14.0-284.110.1.el9_2.x86_64

[root@rhel-9-2 ~]# rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel
package kernel is not installed

[root@rhel94 ~]# uname -r
5.14.0-427.61.1.el9_4.x86_64

[root@rhel94 ~]# rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel
package kernel is not installed
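
In each of the cases above, rpm prints a human-readable message on stdout instead of a NAME-VERSION-RELEASE string, and exits with a non-zero status. The following is a minimal bash sketch, not the role's code, of a wrapper that returns output only when the query succeeds. The function name query_pkg_nvr and the RPM_CMD override (used here only so the behaviour can be exercised without a real rpm database) are assumptions made for illustration.

```shell
# Hypothetical helper; not part of the sap_hana_preconfigure role.
# Prints NAME-VERSION-RELEASE lines for package $1.
# Prints nothing and returns 1 when the query command reports a failure,
# so a message such as "package kernel is not installed" never reaches
# any later version comparison.
query_pkg_nvr() {
  local out
  if out="$("${RPM_CMD:-rpm}" -q --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' "$1" 2>/dev/null)"; then
    printf '%s\n' "$out"
  else
    return 1
  fi
}
```

A caller could then branch on the return status, for example falling back to the running kernel from uname -r when query_pkg_nvr kernel fails.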

Looking more closely at what the Ansible Role executes on the system, using RHEL 9.2 as an example:

[root@rhel-9-2 ~]# uname -r
5.14.0-284.110.1.el9_2.x86_64

[root@rhel-9-2 ~]# rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel
package kernel is not installed

[root@rhel-9-2 ~]# (echo "1 kernel-5.14.0-284.25.1.el9_2";rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel |
             awk '{printf ("2 %s\n", $0)}') |
             awk '{gsub ("\\.el", ".0.0"); print}' |
             sort -k 2 -k 1 -V
1 kernel-5.14.0-284.25.1.0.09_2
2 package kernel is not installed

[root@rhel-9-2 ~]# (echo "1 kernel-5.14.0-284.25.1.el9_2";rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel |
             awk '{printf ("2 %s\n", $0)}') |
             awk '{gsub ("\\.el", ".0.0"); print}' |
             sort -k 2 -k 1 -V |
             awk '{gsub ("\\.0\\.0", ".el"); col1=$1; col2=$2; _nf=NF}
               $1==2{latestpkg=$2}
               END {
                      if (_nf>2) {
                         printf ("kernel-5.14.0-284.25.1.el9_2\n")
                      } else {
                         if (col1==1) {
                            printf ("kernel-5.14.0-284.25.1.el9_2\n")
                         }
                      }
                   }'
kernel-5.14.0-284.25.1.el9_2
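
The pipeline above can be replayed with canned rpm output to see which branch is taken. The sketch below wraps the same awk and sort steps in a function; the name decide_min_pkg and passing the rpm output as a string argument are assumptions made for illustration, and the minimum version is passed in as a parameter instead of being hard-coded. The line "2 package kernel is not installed" has more than two fields, so the _nf>2 branch fires and the minimum package is printed, exactly as it would be for a package that is genuinely missing.

```shell
# Hypothetical wrapper around the pipeline shown in the issue.
# $1 = minimum NAME-VERSION-RELEASE, $2 = canned output of the rpm query.
# Prints the minimum package when the logic decides it must be installed,
# and prints nothing otherwise.
decide_min_pkg() {
  local min="$1" rpm_out="$2"
  (echo "1 ${min}"; printf '%s\n' "$rpm_out" | awk '{printf ("2 %s\n", $0)}') |
    awk '{gsub ("\\.el", ".0.0"); print}' |
    sort -k 2 -k 1 -V |
    awk -v min="$min" '{gsub ("\\.0\\.0", ".el"); col1=$1; col2=$2; _nf=NF}
      $1==2{latestpkg=$2}
      END {
        if (_nf>2) {
          # Last line is not a plain "N package" pair, e.g. an rpm error message.
          print min
        } else if (col1==1) {
          # The minimum sorted last, so it is newer than what is installed.
          print min
        }
      }'
}
```

Calling decide_min_pkg kernel-5.14.0-284.25.1.el9_2 "package kernel is not installed" prints the minimum package, while passing kernel-5.14.0-284.110.1.el9_2 as the second argument prints nothing.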

In summary:

Boot 1 of OS Image for RHEL for SAP Solutions 9.2

  • 5.14.0-284.110.1.el9_2

Run Ansible Task 1 "Create a list of minimum required package versions to be installed" using variable

  • 5.14.0-284.25.1.el9_2

Run Ansible Task 2 "Ensure that the system is updated to the latest patchlevel"

  • 5.14.0-284.117.1.el9_2

Below is the abbreviated stdout from Ansible and the matching GRUB entries:

Stage 1

TASK [community.sap_install.sap_hana_preconfigure : Create a list of minimum required package versions to be installed] ***
ok: [rhel-9-2] => (item=['kernel', '5.14.0-284.25.1.el9_2']) =>
    pkg:
    - kernel
    - 5.14.0-284.25.1.el9_2
    rc: 0

TASK [community.sap_install.sap_hana_preconfigure : Display the content of the minimum package list variable] *********
ok: [rhel-9-2] =>
    __sap_hana_preconfigure_register_minpkglist:
        results:
        -   ansible_loop_var: pkg
            stdout: kernel-5.14.0-284.25.1.el9_2
        skipped: false

TASK [community.sap_install.sap_hana_preconfigure : Install minimum packages if required] *****************************
    results:
    - 'Installed: kernel-core-5.14.0-284.25.1.el9_2.x86_64'
    - 'Installed: kernel-modules-5.14.0-284.25.1.el9_2.x86_64'
    - 'Installed: kernel-5.14.0-284.25.1.el9_2.x86_64'
    - 'Installed: kernel-modules-core-5.14.0-284.25.1.el9_2.x86_64'

[root@rhel-9-2 ~]# grubby --info=ALL

index=0
kernel="/boot/vmlinuz-5.14.0-284.110.1.el9_2.x86_64"
args="ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0"
root="UUID=d370e124-ea83-46ea-a7ef-67f12dd8bb3c"
initrd="/boot/initramfs-5.14.0-284.110.1.el9_2.x86_64.img"
title="Red Hat Enterprise Linux (5.14.0-284.110.1.el9_2.x86_64) 9.2 (Plow)"
id="9a8aa6d3d32c63426d70ef1043ac48ec-5.14.0-284.110.1.el9_2.x86_64"

index=1
kernel="/boot/vmlinuz-5.14.0-284.25.1.el9_2.x86_64"
args="ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0"
root="UUID=d370e124-ea83-46ea-a7ef-67f12dd8bb3c"
initrd="/boot/initramfs-5.14.0-284.25.1.el9_2.x86_64.img"
title="Red Hat Enterprise Linux (5.14.0-284.25.1.el9_2.x86_64) 9.2 (Plow)"
id="b05a6b63e39f418ab21979742c470d27-5.14.0-284.25.1.el9_2.x86_64"

Stage 2

TASK [community.sap_install.sap_hana_preconfigure : Ensure that the system is updated to the latest patchlevel] *******
    results:
    - 'Installed: kernel-modules-5.14.0-284.117.1.el9_2.x86_64'
    - 'Installed: python3-jinja2-2.11.3-5.el9_2.noarch'
    - 'Installed: kernel-modules-core-5.14.0-284.117.1.el9_2.x86_64'
    - 'Installed: libsoup-2.72.0-8.el9_2.4.x86_64'
    - 'Installed: python3-perf-5.14.0-284.117.1.el9_2.x86_64'
    - 'Installed: libgcrypt-1.10.0-10.el9_2.1.x86_64'
    - 'Installed: webkit2gtk3-jsc-2.48.1-3.el9_2.x86_64'
    - 'Installed: kernel-5.14.0-284.117.1.el9_2.x86_64'
    - 'Installed: kernel-core-5.14.0-284.117.1.el9_2.x86_64'
    - 'Removed: python3-jinja2-2.11.3-4.el9_2.1.noarch'
    - 'Removed: webkit2gtk3-jsc-2.46.6-2.el9_2.x86_64'
    - 'Removed: libsoup-2.72.0-8.el9_2.3.x86_64'
    - 'Removed: python3-perf-5.14.0-284.110.1.el9_2.x86_64'
    - 'Removed: libgcrypt-1.10.0-10.el9_2.x86_64'

[root@rhel-9-2 ~]# grubby --info=ALL

index=0
kernel="/boot/vmlinuz-5.14.0-284.110.1.el9_2.x86_64"
args="ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0"
root="UUID=d370e124-ea83-46ea-a7ef-67f12dd8bb3c"
initrd="/boot/initramfs-5.14.0-284.110.1.el9_2.x86_64.img"
title="Red Hat Enterprise Linux (5.14.0-284.110.1.el9_2.x86_64) 9.2 (Plow)"
id="9a8aa6d3d32c63426d70ef1043ac48ec-5.14.0-284.110.1.el9_2.x86_64"

index=1
kernel="/boot/vmlinuz-5.14.0-284.117.1.el9_2.x86_64"
args="ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0"
root="UUID=d370e124-ea83-46ea-a7ef-67f12dd8bb3c"
initrd="/boot/initramfs-5.14.0-284.117.1.el9_2.x86_64.img"
title="Red Hat Enterprise Linux (5.14.0-284.117.1.el9_2.x86_64) 9.2 (Plow)"
id="b05a6b63e39f418ab21979742c470d27-5.14.0-284.117.1.el9_2.x86_64"

index=2
kernel="/boot/vmlinuz-5.14.0-284.25.1.el9_2.x86_64"
args="ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0 $tuned_params"
root="UUID=d370e124-ea83-46ea-a7ef-67f12dd8bb3c"
initrd="/boot/initramfs-5.14.0-284.25.1.el9_2.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (5.14.0-284.25.1.el9_2.x86_64) 9.2 (Plow)"
id="b05a6b63e39f418ab21979742c470d27-5.14.0-284.25.1.el9_2.x86_64"

index=3
kernel="/boot/vmlinuz-0-rescue-b05a6b63e39f418ab21979742c470d27"
args="ro console=tty0 console=ttyS0,115200n8 no_timer_check net.ifnames=0"
root="UUID=d370e124-ea83-46ea-a7ef-67f12dd8bb3c"
initrd="/boot/initramfs-0-rescue-b05a6b63e39f418ab21979742c470d27.img"
title="Red Hat Enterprise Linux (0-rescue-b05a6b63e39f418ab21979742c470d27) 9.2 (Plow)"
id="b05a6b63e39f418ab21979742c470d27-0-rescue"

Bug reproduction

Install from an ISO or from a Cloud IaaS provider image. RHEL 9.2 is a good test case for this issue.

Community participation

Unfortunately, I am not in a position to help with the bug fix.

sean-freeman, Jun 03 '25 14:06

Do we want to support RHEL systems on which no package named kernel is installed, or can we use the current behavior of the role to detect such systems? If yes, is anyone aware of documentation (and can provide a link) which confirms that RHEL systems without the package kernel are fully supported in SAP environments?

berndfinger, Jun 03 '25 14:06

Is there a reason why rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel is preferred over checking the currently running kernel with uname -r?

sean-freeman, Jun 03 '25 15:06

Is there a reason why rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" kernel is preferred to the currently running kernel using uname -r ?

I think there was no specific reason, but this code turned out to fulfill the requirements and was tested extensively. Before changing this code, which triggers additional testing, we need to find out whether changing it is necessary. There is also an alternative solution available for SLES (which I would prefer), but again, let's first be sure that changing the code is really necessary. The worst case would be that we change the code and are then informed that a RHEL for SAP system without the package kernel is unsupported.

berndfinger, Jun 03 '25 15:06

Well, there are two parts to this issue:

  1. The current logic is good enough, but an additional check should make very sure that we are not accidentally downgrading the kernel. This could be as simple as comparing against uname -r or the Ansible Facts ansible_kernel value, and emitting an error when the code suggests installing the minimum kernel version while the running kernel is already higher. This avoids the accidental kernel downgrade.
  2. Perhaps the update * logic should be altered so that we do not upgrade the kernel? Something like a dry run of update *, parsing the list of packages to update, and removing the kernel from that list?

sean-freeman, Jun 03 '25 16:06
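
The two parts above can be sketched in bash as follows. This is a rough illustration under stated assumptions, not a proposed patch: all function names are hypothetical, sort -V only approximates RPM's own version comparison, and the filter assumes a plain list of package names, one per line.

```shell
# Hypothetical helpers; names are illustrative and not part of the role.

# Drop a trailing architecture suffix such as ".x86_64" from `uname -r` output.
strip_arch() {
  printf '%s\n' "$1" | sed -E 's/\.(x86_64|aarch64|ppc64le|s390x)$//'
}

# Return 0 when version $1 sorts strictly below version $2 under `sort -V`.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Part 1: fail instead of installing a minimum kernel older than the running one.
# $1 = minimum version-release, $2 = running kernel as printed by `uname -r`.
guard_kernel_downgrade() {
  local min="$1" running
  running="$(strip_arch "$2")"
  if version_lt "$min" "$running"; then
    echo "ERROR: running kernel ${running} is newer than minimum ${min}" >&2
    return 1
  fi
}

# Part 2: read package names on stdin and print all of them except
# "kernel" and "kernel-*" packages.
filter_non_kernel() {
  grep -Ev '^kernel(-|$)' || true
}
```

On a target host the guard might be called as guard_kernel_downgrade 5.14.0-284.25.1.el9_2 "$(uname -r)". For the second part, dnf's --exclude option may be a simpler route than filtering a dry-run list by hand.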