xcat-core
Unable to provision RHEL 8 via Infiniband network using xCAT
I am trying to provision RHEL 8 over an Infiniband network using xCAT. Provisioning gets stuck after the OS profile loads, as shown in the screenshot below:
The screenshot below confirms that the server has loaded the RHEL 8 OS image:
Please find the xCAT network configuration below:
The xCAT core team does our internal x86 testing using an ethernet provisioning network rather than Infiniband. The problem you are having might be due to the RHEL 8 initrd not including the required Infiniband kernel modules during the diskful install.
Two options to consider:
1. Install via ethernet instead of Infiniband.
2. This document contains some useful information for installing RHEL 8 over Infiniband: https://hpc.lenovo.com/users/documentation/el8ibinstall.html
For your specific situation, I think only these three steps from the link above should be needed:
- Put Mellanox OFED Driver update media in place
- Net config fixup postscript
- xCAT configuration
Hi @besawn,
We want to install the OS via Infiniband only. I have tried the solution provided, but I am still stuck at the same error.
The error below is from SLES 15.3 provisioning:
As an FYI, we have been testing and using confluent for Infiniband deployment, but here is our documentation for doing it in xCAT, from when we supported that approach: https://hpc.lenovo.com/users/documentation/el8ibinstall.html
@abhishek-sa1 Two options for debugging the issue:
- Fire up the ipmi console, connect to the node with the stuck install, change the console to the 4th console (Ctrl-b 4 - tmux is used to multiplex install consoles in anaconda) to start up a local shell, and check network connectivity of the net-install environment on the node.
- Unpack the initrd used by the problem console, and confirm that the infiniband drivers and config inside the initrd are correct in /tftpboot/xcat/osimage/rhels8.5.0-x86_64-install-compute/initrd.img
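For the second option, once the initrd has been unpacked into a scratch directory, a small script can report which Infiniband kernel modules are absent. This is only a rough sketch and is not part of xCAT. The module list in `ASSUMED_IB_MODULES` is an assumption for a Mellanox ConnectX-4 or newer adapter running IPoIB (older cards would need `mlx4_core` / `mlx4_ib` instead), and the helper names are made up for illustration. If the drivers are supplied through driver update media rather than baked into the initrd, a missing module here does not by itself mean the install will fail.

```python
import os

# Assumption: modules needed for IPoIB on a Mellanox ConnectX-4 or newer card.
# Adjust this tuple for your hardware.
ASSUMED_IB_MODULES = ("ib_core", "ib_ipoib", "mlx5_core", "mlx5_ib")

# Kernel modules may be stored uncompressed or compressed.
MODULE_SUFFIXES = (".ko", ".ko.xz", ".ko.gz", ".ko.zst")


def module_name(path):
    """Return the kernel module name for a module file path, else None."""
    name = os.path.basename(path)
    for suffix in MODULE_SUFFIXES:
        if name.endswith(suffix):
            # The kernel treats '-' and '_' as equivalent in module names.
            return name[: -len(suffix)].replace("-", "_")
    return None


def missing_modules(paths, required=ASSUMED_IB_MODULES):
    """Return the required modules that do not appear among the given paths."""
    present = {module_name(p) for p in paths}
    return [m for m in required if m not in present]


def list_files(root):
    """Yield every file path under an unpacked initrd directory."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

For example, with the initrd unpacked into a hypothetical directory `/tmp/initrd-unpacked`, `missing_modules(list_files("/tmp/initrd-unpacked"))` returns the names from the assumed list that were not found. An empty list means all of them are present.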