xcat-core
Install of RHEL 8.2.0 fails due to missing environment variables MASTER_IP and MASTER in xcatinstall
the xcatinstall postscript is not filling in the vars MASTER_IP and MASTER when provisioning RHEL8 which starts with post.xcat.ng instead of post.xcat (compared to RHEL7) MASTER_IP is e.g. required for the logger inside msgutil_r MASTER is required for the updateflag.awk script
- Error noticed in xcat.log "the network between the node and $MASTER_IP is not ready, please check" in which the message has no MASTER_IP
I added this code to check whether the value is available and could be set. It is:
if [ "$MASTER_IP" = "" ]; then
    buf="#ENV:MASTER_IP#"
    if [ "$buf" != "" ]; then
        msg="Setting MASTER_IP from xCAT env ENV:MASTER_IP"
    else
        buf="172.16.16.1"
        msg="Setting MASTER_IP to $buf"
    fi
    export MASTER_IP=$buf
    echo "xcatinstallpost $msg" >> /root/post.xcat.log
fi
RETRY=0
while true; do
    # check whether the network access between the MN/CN and the node is ready
    if ping -c 1 "$MASTER_IP" >/dev/null; then
        echo "xcatinstallpost [$RETRY/90] Ping response from $MASTER_IP" >> /root/post.xcat.log
        break
    else
        echo "xcatinstallpost [$RETRY/90] No ping response from $MASTER_IP yet" >> /root/post.xcat.log
    fi
    RETRY=$((RETRY + 1))
    if [ "$RETRY" -eq 90 ]; then
        # timeout, complain and exit
        msgutil_r "$MASTER_IP" "error" "the network between the node and $MASTER_IP is not ready, please check[retry=$RETRY]..." "/var/log/xcat/xcat.log" "$log_label"
        exit 1
    fi
    # sleep some time before the next check
    sleep 2
done
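A caveat about this fallback: `buf` is only empty if xCAT substitutes `#ENV:MASTER_IP#` with an empty value. If the placeholder is left unsubstituted, `buf` holds the literal string, which is non-empty, so the branch with the hard-coded address never runs. A minimal sketch of a stricter check, assuming placeholders always have the `#...#` form seen here; `resolve_value` is a hypothetical helper, not part of xCAT:

```shell
#!/usr/bin/env bash
# resolve_value: hypothetical helper (not part of xCAT). Prints the first
# argument that is non-empty and is not an unsubstituted template placeholder
# such as "#ENV:MASTER_IP#" or "#XCATVAR:XCATMASTER#".
resolve_value() {
    local v
    for v in "$@"; do
        case $v in
            '' | '#'*'#') continue ;;  # empty, or placeholder the template left as-is
        esac
        printf '%s\n' "$v"
        return 0
    done
    return 1  # nothing usable was found
}

# Usage: keep an existing MASTER_IP, else the template value, else the
# site.master value from this thread (172.16.16.1):
#   MASTER_IP=$(resolve_value "$MASTER_IP" "#ENV:MASTER_IP#" "172.16.16.1")
#   export MASTER_IP
```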
Running with the above code (and more) added produces this output:
[root@netsres46 ~]# cat /root/post.xcat.log
...
xcatinstallpost Setting NODE to netsres46 from xCAT env TABLE:nodelist:THISNODE:node
- Error 2: the final installstatus update fails because MASTER is empty:
flag update failed
or
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
updateflag.awk: Retrying flag update
This code added to xcatinstallpost shows that the value for MASTER is missing as well:
if [ "$MASTER" = "" ]; then
    buf="#XCATVAR:XCATMASTER#"
    if [ "$buf" != "" ]; then
        msg="Setting MASTER from xCAT env XCATVAR:XCATMASTER"
    else
        buf="172.16.16.1"
        msg="Setting MASTER to $buf"
    fi
    export MASTER=$buf
    echo "xcatinstallpost $msg" >> /root/post.xcat.log
fi
Running with the above code (and more) added produces this output:
...
xcatinstallpost Setting MASTER from xCAT env XCATVAR:XCATMASTER <<< Error 2 prevention
...
Did you define master in the site table?
Can you run xcatprobe xcatmn -i <provision network interface>?
Yes, master is defined:
[root@netsres-xcat ~]# tabdump site | grep -i master
"master","172.16.16.1",,
Output from xcatprobe:
[root@netsres-xcat ~]# xcatprobe xcatmn -i eth1
[mn]: Checking all xCAT daemons are running... [ OK ]
[mn]: Checking xcatd can receive command request... [ OK ]
[mn]: Checking 'site' table is configured... [ OK ]
[mn]: Checking provision network is configured... [ OK ]
[mn]: Checking 'passwd' table is configured... [ OK ]
[mn]: Checking important directories(installdir,tftpdir) are configured... [ OK ]
[mn]: Checking SELinux is disabled... [ OK ]
[mn]: Checking HTTP service is configured... [ OK ]
[mn]: Checking TFTP service is configured... [ OK ]
[mn]: Checking DNS service is configured... [ OK ]
[mn]: Checking DHCP service is configured... [ OK ]
[mn]: Checking NTP service is configured... [ OK ]
[mn]: Checking rsyslog service is configured... [ OK ]
[mn]: Checking firewall is disabled... [ OK ]
[mn]: Checking minimum disk space for xCAT ['/var' needs 1GB;'/install' needs 10GB;'/tmp' needs 1GB]... [ OK ]
[mn]: Checking Linux ulimits configuration... [ OK ]
[mn]: Checking network kernel parameter configuration... [ OK ]
[mn]: Checking xCAT daemon attributes configuration... [ OK ]
[mn]: Checking xCAT log is stored in /var/log/xcat/cluster.log... [WARN]
[mn]: Failed to store MN logs to /var/log/xcat/cluster.log
[mn]: Checking xCAT management node IP: <172.16.16.1> is configured to static... [ OK ]
[mn]: Checking dhcpd.leases file is less than 100M... [ OK ]
=================================== SUMMARY ====================================
[MN]: Checking on MN... [ OK ]
Checking xCAT log is stored in /var/log/xcat/cluster.log... [WARN]
Failed to store MN logs to /var/log/xcat/cluster.log
I would like to add that every previous RHEL distro still installs fine; only RHEL 8.2.0 fails. I noticed that mypostscript.tmpl does not appear to be used when I run
rinstall <node> osimage=rhels8.2.0-x86_64-install-netsres
(precreatemypostscripts is not enabled). During the post tasks of a RHEL 7.7 install, mypostscript contains a section starting with
AUDITNOSYSLOG='0'
export AUDITNOSYSLOG
XCATCONFDIR='/etc/xcat'
export XCATCONFDIR
TFTPDIR='/tftpboot'
export TFTPDIR
PPCMAXP='64'
export PPCMAXP
...
and ending with
export SNMPPRIV
SNMPAUTH=''
export SNMPAUTH
# postscripts-start-here
# defaults-postscripts-start-here
syslog
remoteshell
syncfiles
# defaults-postscripts-end-here
# osimage-postscripts-start-here
custom/rhels7.7-x86_64-install-netsres/compute.postinstall
# osimage-postscripts-end-here
# node-postscripts-start-here
confignetwork
setroute
# node-postscripts-end-here
# postscripts-end-here
# postbootscripts-start-here
# osimage-postbootscripts-start-here
custom/rhels7.7-x86_64-install-netsres/compute.postboot
# osimage-postbootscripts-end-here
# node-postbootscripts-start-here
syncfiles
console-rev.sh
net-peer-disable.sh
# node-postbootscripts-end-here
# postbootscripts-end-here
This section is not included when installing RHEL 8.2, which explains why no postscripts are run, no SSH configuration is done, and no variables are known.
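The start/end markers make it easy to check whether a given mypostscript carries these sections at all. A minimal sketch, assuming the marker lines look exactly as above; `list_section` is a hypothetical helper, not an xCAT command. On a file without the markers it prints nothing, which is the RHEL 8.2 symptom described here.

```shell
#!/usr/bin/env bash
# list_section: hypothetical helper (not part of xCAT). Prints the script entries
# between "# <name>-start-here" and "# <name>-end-here" in a mypostscript file.
# Nested markers such as "# osimage-postscripts-start-here" are comment lines and
# are skipped, so only script names are printed.
list_section() {
    local file=$1 name=$2
    awk -v start="# ${name}-start-here" -v stop="# ${name}-end-here" '
        $0 == start { inside = 1; next }
        $0 == stop  { inside = 0; next }
        inside && $0 !~ /^#/ && NF > 0 { print }
    ' "$file"
}

# Usage on a node (path as in this thread):
#   list_section /xcatpost/mypostscript postscripts
#   list_section /xcatpost/mypostscript postbootscripts
```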
Here is the logic that determines MASTER_IP:
# The logic to determine $ENV{XCATMASTER} follows this priority (from high to low):
## 1. the "xcatmaster" attribute of the node
## 2. the IP address of the MN/SN facing the compute node
## 3. site.master
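To make the priority list concrete, here is a shell sketch of it. This is only an illustration under assumptions: xCAT implements this in Perl, `pick_xcatmaster` and `facing_ip_from_route` are hypothetical names, and using `ip -4 route get <node-ip>` to find the MN/SN address facing a node is one possible way to obtain priority 2, not necessarily what xCAT does.

```shell
#!/usr/bin/env bash
# facing_ip_from_route: reads the output of `ip -4 route get <node-ip>` on stdin
# and prints the "src" address, i.e. the local IP used to reach that node.
facing_ip_from_route() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# pick_xcatmaster: hypothetical helper mirroring the priority above:
# node xcatmaster attribute, then the facing IP, then site.master.
pick_xcatmaster() {
    local node_xcatmaster=$1 facing_ip=$2 site_master=$3
    printf '%s\n' "${node_xcatmaster:-${facing_ip:-$site_master}}"
}

# Usage on the MN (node IP and site.master from this thread):
#   facing=$(ip -4 route get 172.16.17.46 | facing_ip_from_route)
#   pick_xcatmaster "" "$facing" "172.16.16.1"
```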
Check the node definition: is xcatmaster defined?
Or run the command nslookup <nodename> to make sure the IP address can be resolved.
Maybe you can show me the output of lsdef <nodename> and tabdump networks.
None of my nodes has xcatmaster defined:
netsres01: xcatmaster=
netsres02: xcatmaster=
netsres03: xcatmaster=
netsres04: xcatmaster=
netsres05: xcatmaster=
netsres06: xcatmaster=
netsres07: xcatmaster=
netsres08: xcatmaster=
netsres09: xcatmaster=
netsres10: xcatmaster=
netsres11: xcatmaster=
netsres12: xcatmaster=
netsres13: xcatmaster=
netsres14: xcatmaster=
netsres15: xcatmaster=
netsres16: xcatmaster=
netsres42: xcatmaster=
netsres42-vm1: xcatmaster=
netsres43: xcatmaster=
netsres44: xcatmaster=
netsres48: xcatmaster=
netsres49: xcatmaster=
netsres50: xcatmaster=
netsres51: xcatmaster=
netsres52: xcatmaster=
netsres54: xcatmaster=
netsres55: xcatmaster=
netsres56: xcatmaster=
netsres57: xcatmaster=
netsres58: xcatmaster=
netsres59: xcatmaster=
netsres60: xcatmaster=
netsres61: xcatmaster=
netsres62: xcatmaster=
netsres63: xcatmaster=
netsres74: xcatmaster=
netsres75: xcatmaster=
netsres76: xcatmaster=
netsres77: xcatmaster=
netsres78: xcatmaster=
netsres79: xcatmaster=
netsres80: xcatmaster=
netsres81: xcatmaster=
netsres82: xcatmaster=
netsres83: xcatmaster=
netsres84: xcatmaster=
netsres85: xcatmaster=
netsres86: xcatmaster=
Must I define this attribute?
- What do you mean by "the IP address of the MN/SN facing the compute node"? I have one network, 172.16.16.0/20, for all nodes. The xCAT master is 172.16.16.1 and each node has a route to it.
[root@netsres-xcat ~]# tabdump site | grep master
"master","172.16.16.1",,
Node definition:
[root@netsres-xcat ~]# lsdef netsres46
Object name: netsres46
addkcmdline=inst.sshd kernel.watchdog_thresh=30
arch=x86_64
cons=ipmi
currchain=boot
currstate=install rhels8.2.0-x86_64-netsres
groups=all,vm
ip=172.16.17.46
mac=52:54:00:4b:2e:38
mgt=kvm
netboot=xnba
nicdevices.br_blue=ens4
nicdevices.br_green=ens3
nichostnamesuffixes.br_blue=-blu
nichostnamesuffixes.br_green=-gre
nicips.ens3=172.16.17.46
nicips.br_blue=9.2.156.70
nicips.br_green=172.16.17.46
nicnetworks.br_blue=blue
nicnetworks.br_green=green
nicnetworks.enp1s0f0=green
nictypes.br_blue=bridge
nictypes.ens3=ethernet
nictypes.ens4=ethernet
nictypes.enp1s0f0=ethernet
nictypes.br_green=bridge
os=rhels8.2.0
postbootscripts=syncfiles,console-rev.sh,net-peer-disable.sh
postscripts=syslog,remoteshell,syncfiles,confignetwork,setroute
power=ipmi
profile=netsres
provmethod=rhels8.2.0-x86_64-install-netsres
routenames=pubrt,greenrt
serialport=0
serialspeed=115200
status=installing
statustime=09-28-2020 16:14:07
updatestatus=failed
updatestatustime=09-26-2020 20:23:09
Networks table:
[root@netsres-xcat ~]# tabdump networks
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,comments,disable
"blue","9.2.156.64","255.255.255.192","eth2","9.2.156.65",,"9.2.156.71","9.2.250.86",,,,,,,,,,"1500",,
"green","172.16.16.0","255.255.240.0","eth1","<xcatmaster>",,"172.16.16.1",,,,"172.16.28.1-172.16.31.254",,,,,,,"1500",,
"nickel","172.16.80.0","255.255.240.0","lo0","172.16.80.1",,,,,,,,,,,,,"9000",,
"purple","172.16.32.0","255.255.240.0","eth3","172.16.32.1",,"172.16.32.1",,,,"172.16.44.1-172.16.47.254",,,,,,,"1500",,
"red","172.16.0.0","255.255.240.0","eth0","172.16.0.1",,"172.16.0.1",,,,"172.16.12.1-172.16.15.254",,,,,,,"1500",,
"silver","172.16.96.0","255.255.240.0","lo0","172.16.96.1",,,,,,,,,,,,,"9000",,
"zinc","172.16.48.0","255.255.240.0","lo0","172.16.48.1",,,,,,,,,,,,,"9000",,
"cadmium","172.16.176.0","255.255.240.0","lo0","172.16.176.1",,,,,,,,,,,,,"9000",,
"copper","172.16.144.0","255.255.240.0","lo0","172.16.144.1",,,,,,,,,,,,,"9000",,
"chromium","172.16.160.0","255.255.240.0","lo0","172.16.160.1",,,,,,,,,,,,,"9000",,
"titanium","172.16.192.0","255.255.240.0","lo0","172.16.192.1",,,,,,,,,,,,,"9000",,
"tungsten","172.16.208.0","255.255.240.0","lo0","172.16.208.1",,,,,,,,,,,,,"9000",,
"tantalum","172.16.224.0","255.255.240.0","lo0","172.16.224.1",,,,,,,,,,,,,"9000",,
"gold","172.16.240.0","255.255.240.0","lo0","172.16.240.1",,,,,,,,,,,,,"9000",,
"platinum","172.17.16.0","255.255.240.0","lo0","172.17.16.1",,,,,,,,,,,,,"9000",,
"mercury","172.17.32.0","255.255.240.0","lo0","172.17.32.1",,,,,,,,,,,,,"9000",,
"iridium","172.17.0.0","255.255.240.0","lo0","172.17.0.1",,,,,,,,,,,,,"9000",,
"iron","172.16.64.0","255.255.240.0","lo0","172.16.64.1",,,,,,,,,,,,,"9000",,
"cobalt","172.16.112.0","255.255.240.0","lo0","172.16.112.1",,,,,,,,,,,,,"9000",,
"manganese","172.16.128.0","255.255.240.0","lo0","172.16.128.1",,,,,,,,,,,,,"9000",,
"554","9.2.154.128","255.255.255.192","eth2","9.2.154.130",,"9.2.154.140","9.2.250.86",,,,,,,,,,"1500",,
"192_168_122_0-255_255_255_0","192.168.122.0","255.255.255.0","virbr0","<xcatmaster>",,"<xcatmaster>",,,,,,,,,,,"1500",,
Why would RHEL 7.7 install properly, with all variables and postboot/postscripts included in /xcatpost/mypostscript after first boot, but not RHEL 8.x?
Can you show me the output of lsdef -t osimage rhels8.2.0-x86_64-install-netsres?
[root@netsres-xcat ~]# lsdef -t osimage rhels8.2.0-x86_64-install-netsres
Object name: rhels8.2.0-x86_64-install-netsres
imagetype=linux
osarch=x86_64
osdistroname=rhels8.2.0-x86_64
osname=Linux
osvers=rhels8.2.0
otherpkglist=/install/custom/rhels8.2.0-x86_64-install-netsres/pkglist-other
pkgdir=/install/rhels8.2.0/x86_64
pkglist=/install/custom/rhels8.2.0-x86_64-install-netsres/pkglist
postbootscripts=custom/rhels8.2.0-x86_64-install-netsres/compute.postboot
postscripts=custom/rhels8.2.0-x86_64-install-netsres/compute.postinstall
profile=netsres
provmethod=install
synclists=/install/custom/rhels8.2.0-x86_64-install-netsres/synclist
template=/install/custom/rhels8.2.0-x86_64-install-netsres/compute.rhels8.tmpl
Everything looks fine to me. If post.xcat.ng doesn't have MASTER_IP set, then /opt/xcat/share/xcat/install/scripts/pre.rhels8 should not have it either.
Can you check the /install/autoinstall/<nodename> file? It will be created after the rinstall command, and MASTER_IP should already be there.
Also, after issuing the rinstall command, run xcatprobe osdeploy -n <nodename>.
The file /install/autoins/
[root@netsres-xcat ~]# grep MASTER_IP /install/autoinst/netsres46|head
export MASTER_IP="172.16.16.1"
msgutil_r "$MASTER_IP" "info" "============deployment starting============" "/var/log/xcat/xcat.log" "$log_label"
msgutil_r "$MASTER_IP" "info" "Running Anaconda Pre-Installation script..." "/var/log/xcat/xcat.log" "$log_label"
msgutil_r "$MASTER_IP" "info" "Detecting install disk..." "/var/log/xcat/xcat.log" "$log_label"
msgutil_r "$MASTER_IP" "info" "Found $instdisk, generate partition file..." "/var/log/xcat/xcat.log" "$log_label"
msgutil_r "$MASTER_IP" "info" "Generate the repository for the installation" "/var/log/xcat/xcat.log" "$log_label"
I am capturing xcatprobe osdeploy -n
I notice that the curl command below does not find the mypostscript.
[root@netsres-xcat ~]# lsdef -t site -i precreatemypostscripts
Object name: clustersite
precreatemypostscripts=0
- Which step provides /tftpboot/mypostscripts/mypostscript.$NODE?
- Must I set precreatemypostscripts to yes/1?
curl --fail --retry 20 --max-time 60 "http://$MASTER_IP:${HTTPPORT}$TFTPDIR/mypostscripts/mypostscript.$NODE" -o "/xcatpost/mypostscript.$NODE" 2> /tmp/download.log
The error is:
[root@netsres46 ~]# cat /tmp/download.log
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404 Not Found
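Because the command uses `--fail`, curl turns the HTTP 404 into exit code 22 instead of saving the error page, and the script only records this in /tmp/download.log. A small sketch for logging the reason explicitly; `explain_curl_rc` is a hypothetical helper, and only a few well-known curl exit codes are covered:

```shell
#!/usr/bin/env bash
# explain_curl_rc: hypothetical helper that turns a few common curl exit codes
# into readable text, so a failed download is logged instead of silently ignored.
explain_curl_rc() {
    case $1 in
        0)  echo "download OK" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "could not connect to host" ;;
        22) echo "HTTP error from server (e.g. 404), see --fail" ;;
        28) echo "operation timed out" ;;
        *)  echo "curl exit code $1" ;;
    esac
}

# Usage inside a postscript:
#   curl --fail --retry 20 --max-time 60 "$url" -o "$out" 2> /tmp/download.log
#   rc=$?
#   logger -t xcat "mypostscript download: $(explain_curl_rc "$rc")"
```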
After the first reboot, mypostscript.post shows that the section with the exported variables and the labeled postscripts section, both of which appear on other nodes when installing RHEL 7.7:
# postscripts-start-here
......
# postscripts-end-here
are both missing. Therefore, no postscripts were executed.
This is what I see on the node:
[root@netsres46 ~]# cat /xcatpost/mypostscript.post
. /xcatpost/xcatlib.sh
# global value to store the running status of the postbootscripts,the value is non-zero if one postbootscript failed
return_value=0
# subroutine used to run postscripts
# $1 argument is the script type
# rest argument is the script name and arguments
run_ps () {
local ret_local=0
mkdir -p "/var/log/xcat"
# On some Linux distros, the rsyslogd daemon writes log files with permissions
# other than root:root. And in some case, the directory /var/log/xcat was
# created by xCAT, and had root:root ownership. In this way, rsyslogd
# did not have enough permission to write to log files under this directory.
# As a dirty hack, change the ownership of directory /var/log/xcat to the
# same ownership of directory /var/log.
chown root:root "/var/log/xcat"
local logfile="/var/log/xcat/xcat.log"
local scriptype=$1
shift;
if [ -z "$scriptype" ]; then
scriptype="postscript"
fi
log_label="xcat.deployment."$scriptype
if [ -f $1 ]; then
msgutil_r "$MASTER_IP" "info" "Running $scriptype: $1" "$logfile" "$log_label"
if [ "$XCATDEBUGMODE" = "1" ] || [ "$XCATDEBUGMODE" = "2" ]; then
local compt=$(file $1)
local reg="shell script"
if [[ "$compt" =~ $reg ]]; then
bash -x ./$@ 2>&1
ret_local=$?
else
./$@ 2>&1 | logger -t xcat -p debug
ret_local=${PIPESTATUS[0]}
fi
else
./$@ 2>&1
ret_local=${PIPESTATUS[0]}
fi
if [ "$ret_local" -ne "0" ]; then
return_value=$ret_local
fi
msgutil_r "$MASTER_IP" "info" "$scriptype $1 return with $ret_local" "$logfile" "$log_label"
else
msgutil_r "$MASTER_IP" "error" "$scriptype $1 does NOT exist." "$logfile" "$log_label"
return_value=-1
fi
return 0
}
# subroutine end
echo xcat.deployment [xcatinstallpost] mypostscript.post MASTER_IP=$MASTER_IP XCATDEBUGMODE=0 MASTER=$MASTER >> /root/post.xcat.log
[ -f /opt/xcat/xcatinfo ] && grep 'POSTSCRIPTS_RC=1' /opt/xcat/xcatinfo >/dev/null 2>&1 && return_value=1
env > /root/env.mypostscript.post
set -x
if [ "$return_value" -eq "0" ]; then
if [ "$XCATDEBUGMODE" = "1" ] || [ "$XCATDEBUGMODE" = "2" ]; then
msgutil_r "$MASTER_IP" "debug" "node booted, reporting status..." "/var/log/xcat/xcat.log" "$log_label"
fi
updateflag.awk $MASTER 3002 "installstatus booted"
rc=$?
echo "xcat.deployment [xcatinstallpost] mypostscript.post updateflag.awk $MASTER 3002 \"installstatus booted\" return with $rc" >> /root/post.xcat.log
msgutil_r $MASTER_IP "info" "provision completed.($NODE)" "/var/log/xcat/xcat.log" "$log_label"
else
if [ "$XCATDEBUGMODE" = "1" ] || [ "$XCATDEBUGMODE" = "2" ]; then
msgutil_r "$MASTER_IP" "debug" "node boot failed, reporting status..." "/var/log/xcat/xcat.log" "$log_label"
fi
updateflag.awk $MASTER 3002 "installstatus failed"
rc=$?
echo "xcat.deployment [xcatinstallpost] mypostscript.post updateflag.awk $MASTER 3002 \"installstatus failed\" return with $rc" >> /root/post.xcat.log
msgutil_r $MASTER_IP "error" "provision completed with error.($NODE)" "/var/log/xcat/xcat.log" "$log_label"
fi
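Both status updates above call `updateflag.awk $MASTER 3002 ...` without checking that `MASTER` is set, which fits the "flag update failed" and "Retrying flag update" messages reported earlier. A minimal guard sketch using bash indirect expansion `${!name}`; `require_vars` is a hypothetical helper, not part of xCAT:

```shell
#!/usr/bin/env bash
# require_vars: hypothetical guard (not part of xCAT). Reports every listed
# variable that is empty and returns non-zero if any is missing, so that
# "updateflag.awk $MASTER 3002 ..." is not run with an empty host.
require_vars() {
    local name missing=0
    for name in "$@"; do
        if [ -z "${!name}" ]; then   # indirect expansion: value of the variable named $name
            echo "missing variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage before the status update:
#   require_vars MASTER MASTER_IP NODE || echo "cannot report status" >> /root/post.xcat.log
```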
@dombrowa, sorry, that was a typo: the file /install/autoinst/<nodename> is created by the nodeset/rinstall command. It contains the deployment flow for this node, and MASTER_IP was available there. For the postscripts defined in the osimage, I think they should have /install in front of custom, right?
postbootscripts=custom/rhels8.2.0-x86_64-install-netsres/compute.postboot
postscripts=custom/rhels8.2.0-x86_64-install-netsres/compute.postinstall
If precreatemypostscripts is set to yes/1, it will regenerate /tftpboot/mypostscripts/mypostscript.<nodename>. Normally we don't set it, so this curl download will fail:
curl --fail --retry 20 --max-time 60 "http://$MASTER_IP:${HTTPPORT}$TFTPDIR/mypostscripts/mypostscript.$NODE" -o "/xcatpost/mypostscript.$NODE" 2> /tmp/download.log
but the script will go on and use /xcatpost/getpostscript.awk to download the postscripts.
So, check the file /install/autoinst/<nodename> to see whether MASTER is unset somewhere, and run xcatprobe osdeploy -n <nodename> after the rinstall command.
Never mind the typo autoinst[all]; it was obvious, as I have other xCAT management nodes to compare with.
As to the curl and awk download: I have added various taps into the code and observed the following:
- curl only downloads a single script, /tftpboot/mypostscripts/mypostscript., if it is available. If it does not find it, it silently writes its error to /tmp/download.log and continues; xCAT does not report this as a failure.
- /xcatpost/getpostscript.awk kicks in if curl fails. I looked at the code, which opens a connection to the xcatmaster. This script returns rc=0 whether or not /tftpboot/mypostscripts/mypostscript. exists.
In my case neither method downloads the file, because it has not been generated, but the awk command creates an empty file instead. Why is that file generated under /tftpboot for RHEL7 but not for RHEL8? I do not mind putting debug probes into the xCAT Perl sources, but I have not found the code responsible for creating /tftpboot/mypostscripts/mypostscript. upon rinstall. I tried to locate it like this:
[root@netsres-xcat work]# find /opt/xcat/ \( -iname '*.pm' -o -iname '*.pl' \) -exec grep -Hn mypostscript {} +
The autoinst file does contain MASTER_IP when I run rinstall for RHEL8:
[root@netsres-xcat ~]# grep MASTER /install/autoinst/netsres46
export MASTER_IP="172.16.16.1"
The problem remains: even with MASTER_IP, MASTER, etc. set, mypostscript.post is missing the export statements and the section that runs the postbootscripts and postscripts.
The file /xcatpost/mypostscript is always created, no matter what awk returns, because of the '>' redirection:
/xcatpost/getpostscript.awk | egrep '<data>' | sed -e 's/<[^>]*>//g' | egrep -v '^ *$' | sed -e 's/^ *//' | sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g' -e 's/&quot;/"/g' -e "s/&apos;/'/g" >/xcatpost/mypostscript
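The pipeline strips the XML tags from the `<data>` lines of the xcatd response and decodes XML entities. Since the shell opens (and truncates) the `>` target before the pipeline runs, /xcatpost/mypostscript exists even when nothing comes back. A sketch of a variant that keeps the decoding but only writes the file when data arrived; `decode_xcat_data` and `write_if_nonempty` are hypothetical names, and `grep -E` stands in for the older `egrep`:

```shell
#!/usr/bin/env bash
# decode_xcat_data: the tag-stripping and entity-decoding steps of the pipeline
# above, as a function (reads the xcatd response on stdin, writes text to stdout).
decode_xcat_data() {
    grep -E '<data>' | sed -e 's/<[^>]*>//g' | grep -Ev '^ *$' | sed -e 's/^ *//' |
        sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g' -e 's/&quot;/"/g' -e "s/&apos;/'/g"
}

# write_if_nonempty: hypothetical helper. Collects stdin in a temporary file and
# only moves it to $1 if something arrived; otherwise $1 is left untouched.
write_if_nonempty() {
    local target=$1 tmp
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    if [ -s "$tmp" ]; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a node:
#   /xcatpost/getpostscript.awk | decode_xcat_data | write_if_nonempty /xcatpost/mypostscript
```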
post.xcat.ng then greps for MASTER= in /xcatpost/mypostscript.netsres46, which it never finds in any of the 10 iterations. I added some code that logs this behavior, shown below: curl fails, and awk tries 10 times to download the file:
xcat.deployment [post.xcat.ng] curl --fail --retry 20 --max-time 60 "http://172.16.16.1:80/tftpboot/mypostscripts/mypostscript.netsres46" -o "/xcatpost/mypostscript.netsres46" 2> /tmp/download.log return with 22
xcat.deployment [post.xcat.ng] precreated mypostscript not downloaded, see /tmp/download.log
xcat.deployment [post.xcat.ng] no pre-generated mypostscript.<nodename>, trying to get it with getpostscript.awk...
xcat.deployment [post.xcat.ng] /xcatpost/getpostscript.awk .. return with 0
xcat.deployment [post.xcat.ng] [1/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [2/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [3/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [4/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [5/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [6/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [7/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [8/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] [9/10] precreated mypostscript exists
xcat.deployment [post.xcat.ng] Missing MASTER in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] Missing tag "postscripts-start-here" in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] Missing tag "postscripts-end-here" in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] Missing tag "postbootscript-start-here" in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] Missing tag "postbootscript-end-here" in /xcatpost/mypostscript
xcat.deployment [post.xcat.ng] generate mypostscript.post file successfully
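The log shows the retry loop accepting a file that merely exists, while MASTER and the section tags are missing. A sketch of a stricter test, assuming the `MASTER='...'` assignment and marker format seen in the RHEL 7.7 mypostscript; `mypostscript_complete` is a hypothetical helper, not part of xCAT:

```shell
#!/usr/bin/env bash
# mypostscript_complete: hypothetical check, stricter than "the file exists".
# Succeeds only if the file is non-empty, assigns MASTER and carries both
# postscripts section markers.
mypostscript_complete() {
    local f=$1 tag
    [ -s "$f" ] || return 1
    grep -q '^MASTER=' "$f" || return 1
    for tag in '# postscripts-start-here' '# postscripts-end-here'; do
        grep -qxF -- "$tag" "$f" || return 1
    done
    return 0
}

# Usage: retry while the file is incomplete rather than merely absent, e.g.
#   for i in $(seq 1 10); do
#       mypostscript_complete /xcatpost/mypostscript && break
#       sleep 2
#   done
```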
I will run another install using the full path for the post*scripts attributes:
postbootscripts=/install/postscripts/custom/rhels8.2.0-x86_64-install-netsres/compute.postboot
postscripts=/install/postscripts/custom/rhels8.2.0-x86_64-install-netsres/compute.postinstall
/xcatpost/getpostscript.awk will call /opt/xcat/lib/xcat/plugins/getpostscript.pm, which then calls makescript in /opt/xcat/lib/perl/xCAT/Postage.pm.
Can you check the makescript error messages in /var/log/xcat/*log?
In the site table, there is no precreatemypostscripts attribute, right?
Do you have the /install/postscripts/mypostscript.tmpl file on your system? If you do, can you get rid of it and set the precreatemypostscripts attribute to 0 in the site table?
MN = Management Node (in my case the MASTER, i.e. the xCAT server; everything runs on one node)
- /opt/xcat/lib/xcat/plugins/getpostscript.pm does not exist on the MN or on any of my cluster nodes, but /opt/xcat/lib/perl/xCAT_plugin/getpostscript.pm exists on the MN.
- /install/postscripts/mypostscript.tmpl exists on my MN as /opt/xcat/share/xcat/mypostscript/mypostscript.tmpl, and
[root@netsres-xcat ~]# lsdef -t site -i precreatemypostscripts
Object name: clustersite
precreatemypostscripts=0
which should satisfy your requirements. This has not been changed.
With these settings the osdeploy log shows:
[root@netsres-xcat Downloads]# xcatprobe osdeploy -n netsres46 2>&1| tee ~/work/netsres46.osdeploy.2
....
[netsres46] 12:49:02 Via HTTP get /install/postscripts/xcatserver
[netsres46] 12:49:02 Via HTTP get /tftpboot/mypostscripts/mypostscript....
[netsres46] 12:51:32 Via HTTP get /tftpboot/xcat/xnba/nodes/netsres46
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:43 Via HTTP get //install/rhels8.2.0/x86_64/AppStream...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
[netsres46] 13:02:44 Via HTTP get //install/rhels8.2.0/x86_64/BaseOS/re...
60 minutes have expired, stop monitoring [INFO]
====================== Summary =====================
There is 1 node provision failures
netsres46 : stop at stage 'start_to_install_os_package' [FAIL]
and syslog shows:
...
Sep 30 16:32:23 netsres46 xcat.deployment Generate the repository for the installation
Sep 30 12:37:53 netsres46 xcat.deployment [post.xcat.ng] Executing post.xcat to prepare for firstbooting ...
Sep 30 12:38:33 netsres46 xcat.deployment [post.xcat.ng] trying to download postscripts from 172.16.16.1...
Sep 30 12:38:35 netsres46 xcat.deployment [post.xcat.ng] postscripts downloaded successfully
Sep 30 12:38:35 netsres46 xcat.deployment [post.xcat.ng] trying to get mypostscript from 172.16.16.1...
Sep 30 12:38:35 netsres46 xcat.deployment [post.xcat.ng] failed to download precreated mypostscript
Sep 30 12:40:53 netsres46 xcat.deployment [post.xcat.ng] finished firstboot preparation, sending request to 172.16.16.1:3002 for changing status...
Sep 30 12:41:57 netsres46 xcat.deployment [xcatinstallpost] Running /xcatpost/mypostscript.post
Sep 30 12:41:57 netsres46 xcat provision completed.(netsres46)
Sep 30 12:41:57 netsres46 xcat.deployment [xcatinstallpost] /xcatpost/mypostscript.post return
Sep 30 12:41:57 netsres46 xcat.deployment [xcatinstallpost] =============deployment ending====================
Sorry, it is /opt/xcat/lib/perl/xCAT_plugin/getpostscript.pm and /opt/xcat/lib/perl/xCAT/Postage.pm.
What's in /install/postscripts/mypostscript.tmpl? This file is created if precreatemypostscripts=1. What's its timestamp?
Can you remove the file /install/postscripts/mypostscript.tmpl, then run rinstall again?
The timestamps in the xcatprobe output and in syslog are different, and syslog shows the deployment ending, but osdeploy is stuck at the installation of packages?
- I see that getpostscript.awk submits
<command>getpostscript</command>
upon which (I assume) the xCAT server runs /opt/xcat/lib/perl/xCAT_plugin/getpostscript.pm.
- As to the syntax for postscripts: when I run with the full path in the osimage table for the post*scripts attributes, I see this error:
netsres46: Wed Sep 30 14:22:39 EDT 2020 postscript /install/postscripts/custom/rhels8.2.0-x86_64-install-netsres/compute.postboot does NOT exist.
Since this message is prefixed with <nodename>, I believe the full path is incorrect and the entry should remain relative, e.g.
custom/rhels8.2.0-x86_64-install-netsres/compute.postboot
relative to /xcatpost on the node.
3. After setting precreatemypostscripts=1 and then running rinstall, this file appears:
/tftpboot/mypostscripts/mypostscript.netsres46
Here are its contents:
mypostscript.netsres46.gz
but this one is no longer there:
[root@netsres-xcat ~]# ls /install/postscripts/mypostscript.tmpl
ls: cannot access /install/postscripts/mypostscript.tmpl: No such file or directory
When I switch back to precreatemypostscripts=0, the file /tftpboot/mypostscripts/mypostscript.netsres46 disappears.
So it is not clear what creates this file /install/postscripts/mypostscript.tmpl, and when.
I will have to check the timestamps. Both should come from the MN, so shouldn't they be correct and in sync, even if the node has a time offset due to incorrect NTP?
Both MASTER and MASTER_IP are defined in mypostscript.netsres46.gz.
I think, from the previous post, MASTER is also available in /install/autoinst/netsres46. Are there some postscripts that unset the environment variables?
@dombrowa, what's the OS of the xCAT MN?
The xCAT MN is a VM running:
[root@netsres-xcat ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="OpenShift Enterprise"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
FYI, I have encountered two things that cause similar behavior:
- Ubuntu install: if the installed image has an awk other than gawk, it turns out like this.
- If the site table contains xcatsslversion or xcatsslciphers settings that disable newer ciphers by mistake, this happens.
Note that I deleted the use of 'nice' as randbytes, as it is a bad idea.
In this case, running 'getpostscript.awk' is the most direct way of seeing what is going awry.
Thanks @jjohnson42. I had the same problem with xCAT 2.16.1 when deploying CentOS 8.2 nodes: no variables were defined in /xcatpost/mypostscript, including $MASTER_IP, and simply removing the xcatsslversion definition from the site table fixed the problem.
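Given this outcome, it may help to check the site table for SSL overrides before digging into the postscripts. A minimal sketch that filters the `tabdump site` CSV output shown earlier in the thread; `find_ssl_overrides` is a hypothetical helper, and it only reports the rows, it does not change them:

```shell
#!/usr/bin/env bash
# find_ssl_overrides: hypothetical filter over `tabdump site` output
# ("key","value",comments,disable rows) that prints any
# xcatsslversion/xcatsslciphers rows. Exits non-zero if none are found.
find_ssl_overrides() {
    grep -E '^"xcatssl(version|ciphers)",'
}

# Usage on the MN:
#   tabdump site | find_ssl_overrides || echo "no SSL overrides in the site table"
```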