xcat-core
Help Request: PXE freezes during node deployment
I'm trying to use xCAT to deploy RHEL 8.9 onto a compute node, but the node stops partway through the network boot, at this point:
Configuring (net0 ac:1f:6b:bc:db:ec)...... ok
net0: 192.168.32.12/255.255.240.0 gw 192.168.47.245
net0: fe80::ae1f:6bff:febc:dbec/64
Next server: 192.168.47.245
Filename: http://192.168.47.245:80/tftpboot/xcat/xnba/nets/192.168.32.0_20.uefi
http://192.168.47.245:80/tftpboot/xcat/xnba/nets/192.168.32.0_20.uefi... ok
192.168.32.0_20.uefi : 304 bytes [script]
http://192.168.47.245:80/tftpboot/xcat/genesis.kernel.x86_64... ok
http://192.168.47.245:80/tftpboot/xcat/genesis.fs.x86_64.gz... ok
- The PXE process starts with the NIC getting an IP address.
- The node retrieves genesis.kernel and genesis.fs.
- At this point, the node freezes, and does not produce any more output.
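As a sanity check on the addressing in the boot output, here is a minimal Python sketch using only the standard `ipaddress` module. It derives the network from the address and netmask iPXE reported, checks that the gateway / next server sits on that network, and rebuilds the boot script name. The `<network>_<prefix>.uefi` naming pattern is an assumption inferred from the filename in the output above, not taken from the xCAT documentation.

```python
import ipaddress

# Values copied from the iPXE output above.
node = ipaddress.ip_interface("192.168.32.12/255.255.240.0")
gateway = ipaddress.ip_address("192.168.47.245")

# Network derived from the leased address and its netmask.
network = node.network
print(network)

# Whether the gateway / next server is on the same network as the node.
print(gateway in network)

# Assumed naming pattern (inferred from the observed filename only).
script_name = f"{network.network_address}_{network.prefixlen}.uefi"
print(script_name)
```

If the rebuilt name matches the script the node actually fetched, the DHCP lease and the boot script at least agree on which network the node is booting from.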
During this process, I see this on the xCAT head node:
[root@xcat_adm ~]# xcatprobe osdeploy -n cn01
The install NIC in current server is ib0 [INFO]
All nodes to be deployed are valid [ OK ]
-------------------------------------------------------------
Start capturing every message during OS provision process....
-------------------------------------------------------------
[cn01] 13:56:23 Receive DHCPDISCOVER via ens2f0
[cn01] 13:56:24 Send DHCPOFFER on 192.168.32.72 back to ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:56:26 DHCPREQUEST for 192.168.32.72 (192.168.47.245) from ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:56:26 Send DHCPACK on 192.168.32.72 back to ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:56:26 Via TFTP download xcat/xnba.efi
[cn01] 13:56:27 Via TFTP download xcat/xnba.efi
[cn01] 13:56:30 Receive DHCPDISCOVER via ens2f0
[cn01] 13:56:31 Send DHCPOFFER on 192.168.32.12 back to ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:56:31 DHCPREQUEST for 192.168.32.12 (192.168.47.245) from ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:56:31 Send DHCPACK on 192.168.32.12 back to ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:56:39 Via HTTP get /tftpboot/xcat/xnba/nets/192.168.32.0_20.uefi
[cn01] 13:56:39 Via HTTP get /tftpboot/xcat/genesis.kernel.x86_64
[cn01] 13:56:39 Via HTTP get /tftpboot/xcat/genesis.fs.x86_64.gz
[cn01] 13:57:23 Receive DHCPDISCOVER via ens2f0
[cn01] 13:57:24 Send DHCPOFFER on 192.168.32.28 back to ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:57:24 DHCPREQUEST for 192.168.32.28 (192.168.47.245) from ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:57:24 Send DHCPACK on 192.168.32.28 back to ac:1f:6b:bc:db:ec via ens2f0
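To make the capture easier to read, the following Python sketch pulls out the addresses offered to the node's MAC and the delay between the last genesis download and the next DHCPDISCOVER. The helper names (`offered_addresses`, `seconds_between`) are hypothetical, and the regular expression assumes the log format is exactly as pasted above.

```python
import re
from datetime import datetime

# Relevant lines copied from the xcatprobe capture above.
LOG = """\
[cn01] 13:56:24 Send DHCPOFFER on 192.168.32.72 back to ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:56:32 Send DHCPOFFER on 192.168.32.12 back to ac:1f:6b:bc:db:ec via ens2f0
[cn01] 13:56:39 Via HTTP get /tftpboot/xcat/genesis.fs.x86_64.gz
[cn01] 13:57:23 Receive DHCPDISCOVER via ens2f0
[cn01] 13:57:24 Send DHCPOFFER on 192.168.32.28 back to ac:1f:6b:bc:db:ec via ens2f0
""".replace("13:56:32", "13:56:31")  # keep the timestamp as it appears in the capture

OFFER = re.compile(r"\[\S+\] (\d\d:\d\d:\d\d) Send DHCPOFFER on (\S+) back to (\S+)")


def offered_addresses(log: str, mac: str) -> list[str]:
    """Return every address offered to `mac`, in log order."""
    return [m.group(2) for m in OFFER.finditer(log) if m.group(3) == mac]


def seconds_between(start: str, end: str) -> int:
    """Seconds from `start` to `end`, both HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


offers = offered_addresses(LOG, "ac:1f:6b:bc:db:ec")
gap = seconds_between("13:56:39", "13:57:23")
print(offers)
print(gap)
```

The capture contains three separate DHCP exchanges for the same MAC, each with a different offered address, and the third one begins some time after the genesis files were fetched.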
I still have a lot to learn about xCAT, so I'll be extremely grateful for any and all help that's offered.
Additional information:
[root@xcat_adm ~]# lsdef -t node cn01
Object name: cn01
arch=x86_64
bmc=192.168.36.48
cons=ipmi
consoleenabled=1
currchain=boot
currstate=install rhels8.9.0-x86_64-compute
getmac=ipmi
hostnames=cn01
ip=192.168.84.248
mac=ac:1f:6b:bc:db:ec
mgt=ipmi
netboot=xnba
nicips.ib0=192.168.84.248
nicips.ipmi=192.168.36.48
nicips.eno1=192.168.36.248
nicnetworks.eno1=ipmi-net
nicnetworks.ib0=ib-net
nictypes.eno1=Ethernet
nictypes.ib0=InfiniBand
os=rhels8.9.0
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
profile=compute
provmethod=rhels8.9.0-x86_64-install-compute
serialport=1
serialspeed=115200
status=powering-on
statustime=08-01-2024 13:54:21
[root@xcat_adm ~]# lsdef -t osimage rhels8.9.0-x86_64-install-compute
Object name: rhels8.9.0-x86_64-install-compute
imagetype=linux
osarch=x86_64
osdistroname=rhels8.9.0-x86_64
osname=Linux
osvers=rhels896.0
partitionfile=s:/install/custom/partitionfile/rhels8.9.0-x86_64-install-compute_partitions.sh
pkgdir=/install/rhels8.9.0/x86_64
pkglist=/install/custom/pkglist/rhel8-pkglist-compute.pkglist
postscripts=custom/rhel-8.9-postscript-compute.sh
profile=compute
provmethod=install
template=/install/custom/template/rhels8.9.0-x86_64-install-compute.tmpl
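For reference, this Python sketch compares the addresses in the cn01 node definition against the network the node actually booted on (192.168.32.0/20, from the PXE output). It only reports which addresses fall inside that network. Whether this has anything to do with the freeze is not established here.

```python
import ipaddress

# Network the node obtained its lease on during PXE (from the boot output).
boot_net = ipaddress.ip_network("192.168.32.0/20")

# Address attributes copied from `lsdef -t node cn01`.
node_attrs = {
    "ip": "192.168.84.248",
    "bmc": "192.168.36.48",
    "nicips.ib0": "192.168.84.248",
    "nicips.ipmi": "192.168.36.48",
    "nicips.eno1": "192.168.36.248",
}

# Map each attribute to True/False: is its address inside the boot network?
in_boot_net = {
    name: ipaddress.ip_address(addr) in boot_net
    for name, addr in node_attrs.items()
}

for name, inside in in_boot_net.items():
    print(f"{name:12s} {node_attrs[name]:16s} in {boot_net}: {inside}")
```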