xcat-core
makedns fails with unable to find an IP when using nics table.
I am running a cluster that only uses DNS on the headnode for name resolution and not /etc/hosts. Following the documentation for defining additional nics using the nics table, I configured IP addresses for the infiniband network.
[root@nvme01 ~]# lsdef nvme02 --nics
Object name: nvme02
nicaliases.ib0=nvme02-ib0
nichostnamesuffixes.ib0=-ib0
nicips.ib0=192.168.100.2
nicnetworks.ib0=infiniband
nictypes.ib0=infiniband
When I try to run makedns nvme02 to update the DNS configuration I get the following error.
Error: [nvme01]: Unable to find an IP for nvme02-ib0 in hosts table or via system lookup (i.e. /etc/hosts)
Since the cluster isn't using /etc/hosts, I have not added the alias to /etc/hosts. I have checked the hosts table and there is nothing additional configured for the alias there.
[root@nvme01 ~]# tabdump hosts
#node,ip,hostnames,otherinterfaces,comments,disable
"nvme02","192.168.0.2",,,,
From reading the documentation I got the impression that otherinterfaces was deprecated in favour of the nics table, but this doesn't seem to work with the makedns script. Am I mistaken in believing that makedns should work with entries in the nics table?
This is on the below OS and xCAT versions
[root@nvme01 ~]# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core)
[root@nvme01 ~]# lsxcatd -a
Version 2.14.5 (git commit fc0fb3fca198aa298a114f6124749275e7d81f8c, built Thu Dec 6 22:20:43 EST 2018)
hi @bybai , would you pls take a look at this issue? thx
hi @paddyoneill, as a workaround, you can add nvme02-ib0 to the hosts table like this:
#tabdump hosts
"nvme02-ib0","192.168.100.2",,,,
then try makedns nvme02
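The mapping behind this workaround (one hosts-table row per NIC hostname) can be sketched in a few lines of Python. This is an illustrative sketch only, not part of xCAT: parse_nics and hosts_rows are hypothetical helper names, the input is the lsdef --nics text quoted earlier in the thread, and the fallback suffix of "-<nic>" when nichostnamesuffixes is unset is an assumption based on the makehosts output shown later in the thread. Multi-valued nicips entries are not handled.

```python
def parse_nics(lsdef_text):
    """Parse `lsdef <node> --nics` output into (node, {attr: {nic: value}})."""
    node = None
    attrs = {}
    for line in lsdef_text.splitlines():
        line = line.strip()
        if line.startswith("Object name:"):
            node = line.split(":", 1)[1].strip()
        elif "=" in line and "." in line.split("=", 1)[0]:
            key, value = line.split("=", 1)
            attr, nic = key.split(".", 1)
            attrs.setdefault(attr, {})[nic] = value
    return node, attrs


def hosts_rows(node, attrs):
    """Build rows in the tabdump layout: node,ip,hostnames,otherinterfaces,comments,disable."""
    rows = []
    for nic, ip in sorted(attrs.get("nicips", {}).items()):
        # Assumption: fall back to "-<nic>" when no explicit suffix is defined.
        suffix = attrs.get("nichostnamesuffixes", {}).get(nic, "-" + nic)
        rows.append(f'"{node}{suffix}","{ip}",,,,')
    return rows


# Sample input copied from the lsdef output in the original report.
SAMPLE = """Object name: nvme02
nicaliases.ib0=nvme02-ib0
nichostnamesuffixes.ib0=-ib0
nicips.ib0=192.168.100.2
nicnetworks.ib0=infiniband
nictypes.ib0=infiniband
"""

node, attrs = parse_nics(SAMPLE)
rows = hosts_rows(node, attrs)
for row in rows:
    print(row)
```

The printed rows only show what the hosts table would need to contain; loading them into xCAT is left to the usual table tools.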
hi @paddyoneill, I think you hit 2 problems.
- The nics definition of node nvme02 was confused. You can find my example below.
- Before you run makedns nvme02, you should run makehosts nvme02 first.
Here is my example:
]# lsdef bybc0605 --nics
Object name: bybc0605
nicips.ib0=10.20.100.9
nicips.ib1=10.11.100.9
nicnetworks.ib0=mgtnetwork
nicnetworks.ib1=mgtnetwork
nictypes.ib0=Infiniband
nictypes.ib1=Infiniband
]# makehosts bybc0605
]# cat /etc/hosts|grep bybc0605
10.5.106.5 bybc0605 bybc0605.cluster.com
10.20.100.9 bybc0605-ib0 bybc0605-ib0.cluster.com
10.11.100.9 bybc0605-ib1 bybc0605-ib1.cluster.com
]# makedns bybc0605
Handling bybc0605-ib0 in /etc/hosts.
Handling bybc0605 in /etc/hosts.
Handling bybc0605-ib1 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.
DNS setup is completed
]# nslookup bybc0605-ib0
Server: 10.5.106.2
Address: 10.5.106.2#53
Name: bybc0605-ib0.cluster.com
Address: 10.20.100.9
hi @bybai, note that @paddyoneill "only uses DNS on the headnode for name resolution and not /etc/hosts." If you change the hosts line in /etc/nsswitch.conf to hosts: dns, makedns will fail even if the entries exist in /etc/hosts. Your test completed successfully because you are still using /etc/hosts to resolve the hostname.
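To tell which of the two situations a given management node is in, one can inspect the hosts line of /etc/nsswitch.conf. The following is a minimal Python sketch under the assumption that the file follows the usual "database: source [action] source" layout; hosts_sources and uses_etc_hosts are hypothetical helper names, and the two sample strings are illustrative rather than taken from any machine in this thread.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the lookup sources listed on the 'hosts:' line, in order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            spec = line[len("hosts:"):]
            # Remove bracketed action items such as [NOTFOUND=return].
            spec = re.sub(r"\[[^\]]*\]", " ", spec)
            return spec.split()
    return []


def uses_etc_hosts(nsswitch_text):
    """True when the 'files' source (i.e. /etc/hosts) is consulted for host lookups."""
    return "files" in hosts_sources(nsswitch_text)


# Illustrative configurations: DNS only, and a files-then-DNS setup.
dns_only = "passwd: files\nhosts: dns\n"
files_first = "hosts: files dns myhostname\n"

print(hosts_sources(dns_only), uses_etc_hosts(dns_only))
print(hosts_sources(files_first), uses_etc_hosts(files_first))
```

On a real system the text would come from reading /etc/nsswitch.conf. A result without "files" corresponds to the DNS-only setup described in the original report.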
Ignore the closing and opening, my mistake.
Thanks @bybai for the suggestions so far, I have changed the nics table to be similar to the example you provided.
As @immarvin mentioned, since this environment is set up to only use DNS and not the /etc/hosts file, makedns still fails to resolve the nvme02-ib0 hostname even after running makehosts first.
did you try this:
as a workaround, you can add nvme02-ib0 to the hosts table like this:
#tabdump hosts
"nvme02-ib0","192.168.100.2",,,,
then try makedns nvme02
Hi @paddyoneill and @immarvin,
If you want to use the hosts table but not the /etc/hosts file, -ib0 should be configured in otherinterfaces, but that does not work well. The following example can serve as a workaround:
- Create a new node named bybc0605-ib0; this node exists only for DNS.
]# nslookup nvme02-ib0
Server: 10.5.106.2
Address: 10.5.106.2#53
** server can't find nvme02-ib0: NXDOMAIN
]# chdef nvme02-ib0 ip=10.60.100.9 groups=all
1 object definitions have been created or modified.
New object definitions 'nvme02-ib0' have been created.
[root@bybc0602 ~]# lsdef nvme02-ib0
Object name: nvme02-ib0
groups=all
ip=10.60.100.9
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
]# makedns nvme02-ib0
Handling nvme02-ib0 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.
DNS setup is completed
]# nslookup nvme02-ib0
Server: 10.5.106.2
Address: 10.5.106.2#53
Name: nvme02-ib0.cluster.com
Address: 10.60.100.9
hi @bybai, the proposed workaround works, but it also means that each node needs a separate definition for each interface it has, so it would become cumbersome to manage at scale.
I will try the otherinterfaces option to see if it works and let you know.
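One way to reduce the manual effort of the per-interface workaround is to generate the chdef commands from a single mapping. The Python sketch below only prints commands in the form used earlier in the thread (chdef <name> ip=<ip> groups=all); it does not run them. The function name chdef_commands is hypothetical, the "<node>-<nic>" naming is an assumption that matches the hostnames in this thread, and nvme03 with its address is an invented example entry.

```python
def chdef_commands(cluster, groups="all"):
    """Build one chdef command per (node, nic) pair.

    cluster maps node name -> {nic name -> IP address}.
    """
    commands = []
    for node in sorted(cluster):
        for nic, ip in sorted(cluster[node].items()):
            name = f"{node}-{nic}"  # assumed naming: <node>-<nic>
            commands.append(f"chdef {name} ip={ip} groups={groups}")
    return commands


# nvme02 is taken from the report; nvme03 is a made-up second node.
cluster = {
    "nvme02": {"ib0": "192.168.100.2"},
    "nvme03": {"ib0": "192.168.100.3"},
}

cmds = chdef_commands(cluster)
for cmd in cmds:
    print(cmd)
```

After reviewing the printed commands and running them on the management node, makedns would still need to be run for each new definition, as in the example above.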
@paddyoneill, thanks for your feedback. Since the workaround works, this is not a blocking issue; we will plan it for 2.15.
Hi guys !! I would like to know if this problem has been solved because it is still present in xcat 2.16.4
Hi, please confirm whether the problem is solved, because I am still facing the same problem with this xCAT version:
lsxcatd -a
Version 2.16.4 (git commit bb7a4bbbc8bde7e6613558d8d039fe43d49d2079, built Mon Jun 13 08:53:10 EDT 2022)
This is a Management Node
dbengine=SQLite