Unable to provision RHEL 8 via Infiniband network using xCAT

Open abhishek-sa1 opened this issue 2 years ago • 4 comments

I am trying to provision RHEL 8 over an Infiniband network using xCAT. Provisioning gets stuck after loading the OS profile, as shown in the screenshot below:

image

The screenshots below confirm that the server has loaded the RHEL 8 OS image: image image

Please find the xCAT network configuration below: image

abhishek-sa1 avatar Sep 20 '22 18:09 abhishek-sa1

The xCAT core team does our internal x86 testing using an ethernet provisioning network rather than Infiniband. The problem you are having might be due to the RHEL 8 initrd not including the required Infiniband kernel modules during the diskful install.

Two options to consider:

  1. Install via ethernet instead of Infiniband.
  2. This document contains some useful information for installing RHEL 8 over Infiniband: https://hpc.lenovo.com/users/documentation/el8ibinstall.html

For your specific situation, I think only these three steps from the link above should be needed:

  • Put Mellanox OFED Driver update media in place
  • Net config fixup postscript
  • xCAT configuration

besawn avatar Sep 20 '22 21:09 besawn

Hi @besawn ,

We want to install the OS via Infiniband only. I have tried the solution provided, but I am still stuck at the same error. image

The error below is from SLES 15.3 provisioning: image

abhishek-sa1 avatar Sep 28 '22 05:09 abhishek-sa1

As an FYI, we have been testing and using confluent for Infiniband deployment, but here is our documentation from when we supported doing it with xCAT: https://hpc.lenovo.com/users/documentation/el8ibinstall.html

jjohnson42 avatar Sep 29 '22 12:09 jjohnson42

@abhishek-sa1 Two options for debugging the issue:

  • Fire up the ipmi console, connect to the node with the stuck install, switch to the 4th console (Ctrl-b 4 - tmux is used to multiplex install consoles in anaconda) to start a local shell, and check the network connectivity of the net-install environment on the node.
  • Unpack the initrd used by the stuck install (/tftpboot/xcat/osimage/rhels8.5.0-x86_64-install-compute/initrd.img) and confirm that the Infiniband drivers and config inside it are correct.
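
As a rough illustration of the second check, here is a minimal Python sketch that walks an already unpacked initrd tree and reports which Infiniband kernel modules are absent. It is not part of xCAT. The module list (`ib_core`, `ib_ipoib`, `mlx5_core`, `mlx5_ib`) is an assumption for a Mellanox IPoIB setup, and the directory `/tmp/initrd-unpacked` is a hypothetical location for the unpacked contents; adjust both to your hardware and environment.

```python
import os

# Assumed module set for a Mellanox IPoIB install; adjust for your hardware.
EXPECTED_MODULES = ("ib_core", "ib_ipoib", "mlx5_core", "mlx5_ib")


def module_name(filename):
    """Return the module name for a file like 'mlx5_core.ko.xz', else None."""
    base, sep, rest = filename.partition(".ko")
    # Accept plain and compressed kernel module files only.
    if sep and rest in ("", ".xz", ".gz", ".zst"):
        # Treat '-' and '_' as the same character in module names.
        return base.replace("-", "_")
    return None


def missing_modules(unpacked_root, expected=EXPECTED_MODULES):
    """Return the sorted expected modules not found under unpacked_root."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(unpacked_root):
        for filename in filenames:
            name = module_name(filename)
            if name is not None:
                found.add(name)
    return sorted(set(expected) - found)


if __name__ == "__main__":
    # Hypothetical directory holding the unpacked initrd contents.
    root = "/tmp/initrd-unpacked"
    missing = missing_modules(root)
    if missing:
        print("Modules not found in initrd:", ", ".join(missing))
    else:
        print("All expected Infiniband modules are present.")
```

A module showing up as missing would be consistent with the earlier suggestion that the RHEL 8 initrd lacks the required Infiniband drivers. The sketch only checks that module files exist; it does not verify the network config inside the initrd.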

samveen avatar Sep 30 '22 10:09 samveen