
genesis boot failing on Intel E810 NIC cards using xCAT 2.16.4 version

Open abhishek-sa1 opened this issue 1 year ago • 5 comments


From issue https://github.com/xcat2/xcat-core/issues/7109, I understood that a custom genesis image is needed to genesis boot nodes with Intel E810 NICs.

I built a genesis-base RPM by cloning the xCAT 2.16.4 code and installed it, following the steps in https://github.com/xcat2/xcat-core/tree/2.16.4/xCAT-genesis-builder

Genesis boot still failed for Intel E810 NICs with this image.

We are using the x86_64 architecture with RHEL 8.6, and I found that the section of code below is not executed for x86_64: [screenshot]

I changed ppc64 to x86_64 locally and built a new RPM; genesis boot worked after this change.

I can see that instmods ice is added to the installkernel file when the RPM is built, as shown below: [screenshot]

Apart from Intel E810, other NICs boot with the modprobe commands listed in https://github.com/xcat2/xcat-core/blob/2.16.4/xCAT-genesis-builder/xcat-cmdline.sh.

I found the two files below related to ice. modprobe was still required to boot the genesis image. [screenshot] [screenshot]

Can you help me understand why modprobe is mandatory for Intel E810 NICs but not for other NICs? How are drivers loaded during genesis boot?

I also observed a genesis boot failure with NVIDIA ConnectX-6 NICs. Does xCAT support genesis boot on NVIDIA ConnectX-6 NICs?

abhishek-sa1, Aug 10 '23 07:08

@gurevichmark can you help me with this?

abhishek-sa1, Aug 10 '23 16:08

@abhishek-sa1 This is more of a kernel module loading issue than a genesis or xCAT issue.

To fix the issue, please check the following:

  • Which linux kernel version are you using for building the genesis image?
  • Is the driver for the card compiled into that kernel or is it a separate module file (.ko in /lib/modules/KERNEL_VERSION/ )?
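
The two checks above can be sketched as a small shell helper. This is a minimal sketch, not part of xCAT: the function name driver_kind is hypothetical, and it assumes the usual layout where modules.builtin lists built-in drivers and loadable modules are .ko files (possibly compressed) under /lib/modules/KERNEL_VERSION/.

```shell
#!/bin/sh
# Hypothetical helper: report whether a driver is built into a kernel
# or shipped as a separate loadable module.
# Usage: driver_kind MODULE [KERNEL_VERSION] [MODULES_ROOT]
driver_kind() {
    mod=$1
    kver=${2:-$(uname -r)}
    root=${3:-/lib/modules}
    dir="$root/$kver"
    # modules.builtin lists drivers compiled into the kernel image, one path per line.
    if [ -f "$dir/modules.builtin" ] && grep -q "/$mod\.ko\$" "$dir/modules.builtin"; then
        echo builtin
    # Loadable modules are .ko files, possibly compressed (.ko.xz, .ko.gz, .ko.zst).
    elif find "$dir" -name "$mod.ko*" 2>/dev/null | grep -q .; then
        echo module
    else
        echo missing
    fi
}

# Check the E810 driver against the running kernel.
driver_kind ice
```

Run it on the build host with the kernel version used for the genesis image as the second argument. A result of module means the driver has to be loaded explicitly or through configuration.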

When the kernel loads on the target machine, it initializes all the hardware for which drivers are built in. For hardware without built-in drivers, the kernel needs to be told which drivers to load. This is usually done with a manual insmod (or, better, modprobe, which handles inter-module dependencies), or through config files that list the module names to load, in:

  • /etc/modprobe.d/*.conf
  • /etc/modules-load.d/*.conf

Adding such files to the genesis tree and then rebuilding the image may solve your problem, since the modules are already in the genesis tree, judging from what you have shown above.
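
As a rough sketch of that suggestion, the helper below writes a modules-load.d entry into a directory tree. The function name add_module_load_conf is hypothetical, and the genesis tree path and the mknb rebuild step in the comments are assumptions about a typical xCAT install. Whether the genesis environment actually reads /etc/modules-load.d at boot is also not confirmed here, so treat this as something to try rather than a known fix.

```shell
#!/bin/sh
# Hypothetical helper: add a modules-load.d entry for MODULE under the tree rooted at TREE.
# Usage: add_module_load_conf TREE MODULE
add_module_load_conf() {
    tree=$1
    mod=$2
    mkdir -p "$tree/etc/modules-load.d" || return 1
    # Format: one module name per line; lines starting with '#' are comments.
    printf '# load %s at boot\n%s\n' "$mod" "$mod" > "$tree/etc/modules-load.d/$mod.conf"
}

# Example (the path is an assumption; adjust it to where your genesis tree lives):
# add_module_load_conf /opt/xcat/share/xcat/netboot/genesis/x86_64/fs ice
# mknb x86_64   # assumed rebuild step for the genesis image
```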

Note: Please paste text instead of screenshots: it makes the task of selecting and checking so much easier.

samveen, Aug 11 '23 10:08

@samveen I am using kernel version 4.18.0-372.9.1.el8.x86_64 and xCAT is running on RHEL 8.6 OS with x86_64 architecture.

abhishek-sa1, Aug 14 '23 15:08

dracut-install: ERROR: installing 'dhclient'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.PSkuHN/initramfs -a mkswap df ifenslave ssh-keygen scp clear dhclient lldpad
dracut-install: ERROR: installing 'rngd'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.PSkuHN/initramfs -a rngd

I noticed the errors above while building the RPM on x86_64 with RHEL 8.6.

abhishek-sa1, Aug 14 '23 15:08

@abhishek-sa1 A quick search on Google turned up https://bugzilla.redhat.com/show_bug.cgi?id=2054092

Would you also review the availability of the rngd command on the build system?
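
One way to run that check for every command named in the failing dracut-install lines is sketched below. The function name missing_cmds is hypothetical; it only tests whether each command is found on PATH, which is an assumption about how dracut-install locates binaries.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any listed commands that are not found on PATH.
missing_cmds() {
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || echo "$c"
    done
}

# Commands taken from the failing dracut-install lines in the build log.
missing_cmds mkswap df ifenslave ssh-keygen scp clear dhclient lldpad rngd
```

Any name it prints needs its package installed on the build system before rebuilding. On RHEL 8, dhclient typically comes from the dhcp-client package and rngd from rng-tools.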

samveen, Aug 15 '23 21:08