xcat-core
Diskless node deployment with different VLANs. TFTP file not found
Hi everyone. We are trying to do a diskless deployment with xCAT 2.16. We have several racks, each on its own subnet. Our custom RHEL8 images are ready and worked in our test environment with a single VLAN. When we replicate that setup with multiple subnets, the nodes do not boot. We have tried modifying /etc/dhcp/dhcpd.conf manually: nodes get their IP, but TFTP fails.
How can we fix this? We are trying to upgrade from RHEL7 Perceus to RHEL8 xCAT.
Are you using service nodes or only a management node?
Can you provide this output from your management node:
lsdef -t network -l
tabdump site | grep dhcpinterfaces
ip route
Thanks for the quick response.
Our intention is to deploy all diskless servers with different custom images, using netboot and mounting shared folders.
Here you have more information about our deployment:

We don't really need dynamic range because we want static IPs. It wasn't working so we modified dhcpd.conf and added the host by hand:
/etc/dhcp/dhcpd.conf
/var/log/messages
Any suggestions or help would be very much appreciated, we are a bit blocked.
Thank you for your time.
One more detail: when we specify 172.17.0.1, the switch forwards to the main server, in this case 172.17.31.1. We have tried both IPs just in case, but it does not seem to be a connectivity problem.
I would recommend not modifying the dhcpd.conf by hand.
We don't really need dynamic range because we want static IPs.
You don't need a dynamic range to boot a node with a diskless image, but you do need to have the MAC address/IP address association configured for that node. During normal operation, xCAT adds the entry for the host to the DHCP configuration when you run makedhcp cons0201.
There is a discrepancy between your network table configuration and the actual network interface configuration on your management node. Specifically, the netmask for the management_21_network is 255.255.255.0 and your bond0 netmask is 255.255.0.0. You should correct your network table entries so they match your actual management node network interfaces.
I would suggest trying to get things working with a flat network configuration first. If you can get that to work, you can go back and implement a more sophisticated network scheme.
To correct your network table entries, you can use makenetworks, chdef, or tabedit networks. Whenever you make changes to your network table entries, you need to re-run makedhcp -n to add the updates to the DHCP server configuration on the management node.
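To see why the two netmasks disagree, it can help to compute the network address each one implies. The following is only a minimal sketch: the helper name network_of is made up for illustration, and using 172.17.31.1 as the management node address on bond0 is an assumption based on the IP mentioned earlier in this thread.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name, not part of xCAT):
# network_of IP NETMASK -> prints the dotted-quad network address.
network_of() {
    # Split both dotted quads on "." into eight positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1 $2
    IFS=$old_ifs
    # AND each IP octet ($1..$4) with the matching netmask octet ($5..$8).
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Assumption: 172.17.31.1 is the management node address on bond0.
mn_ip=172.17.31.1
network_of "$mn_ip" 255.255.0.0     # bond0 netmask      -> 172.17.0.0
network_of "$mn_ip" 255.255.255.0   # networks table mask -> 172.17.31.0
```

The two results differ, which means the network definition in xCAT and the interface on the management node describe different subnets. Compare these values against the net and mask attributes shown by lsdef -t network -l.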
If you are confident that the mac and ip attributes are correct for cons0201, I would suggest you try the following:
Start by saving your existing dhcpd.conf in case you need to preserve any of your manual changes.
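One possible way to take that backup is sketched below. The helper name backup_file and the .bak.<timestamp> suffix are arbitrary choices for illustration, and the path is the one mentioned earlier in this thread; adjust it if your file lives elsewhere.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): copy a file to FILE.bak.<timestamp>,
# preserving mode and timestamps, and print the name of the copy.
backup_file() {
    src=$1
    dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Path as given in this thread; the copy is only attempted if the file exists.
conf=/etc/dhcp/dhcpd.conf
if [ -f "$conf" ]; then
    backup_file "$conf"
fi
```

Keeping a timestamped copy makes it easy to diff your manual host entries against whatever makedhcp generates afterwards.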
# Re-generate a fresh DHCP configuration
makedhcp -n
# Add your node mac address / IP information to your DHCP configuration
makedhcp cons0201
# Check that the configuration matches what you expect
makedhcp -q cons0201
# Try to boot the node with the diskless image
nodeset cons0201 osimage=rhels8.4.0-x86_64-cons-compute
rpower cons0201 boot
# Once rpower has started the boot, you can debug further by watching the boot process in the console
rcons cons0201
If you are still experiencing problems, try xcatprobe to see if it detects any issues.
xcatprobe xcatmn will check for configuration problems on the management node.
xcatprobe osimagecheck will check for issues with your osimages.
xcatprobe osdeploy -n cons0201 will allow you to monitor the node install process while a node is booting to look for problems.