xcat-core
Error during CentOS 8 installation
Hello guys.
Please help me figure out why the xCAT deployment fails on this blade server. I am provisioning a CentOS 8.2 image with xCAT. On another server it installs successfully, but on this one I get this message. Both servers show the errors related to the floppy disk and sha256_mb.
Any ideas would be really appreciated.
Please help me with this topic. Do you have any ideas?
Please provide the following information and perhaps another CentOS user with a similar hardware configuration will be able to assist you:
- Manufacturer and model of blade server.
- Collect the following output and add it to the issue:
  lsxcatd -v
  xcatprobe xcatmn
  lsdef -t osimage <OSIMAGE_YOU_ARE_INSTALLING>
  lsdef <NODE_THAT_IS_SUCCESSFUL>
  lsdef <NODE_THAT_IS_FAILING>
- Exact procedure you are using to install the node, with captured output.
- Also, while the node is installing, capture the following in a second terminal:
  xcatprobe osdeploy -n <NODE_THAT_YOU_ARE_INSTALLING>
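If it helps, the output requested above can be gathered into one file and attached to the issue. The following is only a sketch. The names collect_xcat_diag, _diag_run and the default file name xcat-diag.txt are hypothetical. The function just runs the lsxcatd, xcatprobe and lsdef commands listed above on the management node and writes each command's output under a "###" header.

```shell
# Sketch: gather the requested xCAT diagnostics into a single text file.
# collect_xcat_diag and _diag_run are hypothetical helper names; only the
# wrapped commands (lsxcatd, xcatprobe, lsdef) come from the request above.
collect_xcat_diag() {
  local osimage="$1" good_node="$2" bad_node="$3" out="${4:-xcat-diag.txt}"

  # Print a header naming the command, then its stdout and stderr.
  _diag_run() { echo "### $*"; "$@" 2>&1; echo; }

  {
    _diag_run lsxcatd -v
    _diag_run xcatprobe xcatmn
    _diag_run lsdef -t osimage "$osimage"
    _diag_run lsdef "$good_node"
    _diag_run lsdef "$bad_node"
  } > "$out"

  # Report where the output was written.
  echo "$out"
}
```

Usage: collect_xcat_diag <OSIMAGE_YOU_ARE_INSTALLING> <NODE_THAT_IS_SUCCESSFUL> <NODE_THAT_IS_FAILING>. The xcatprobe osdeploy check is left out on purpose, because it has to run in a second terminal while the node is installing.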
@soportemodemat If you are not working directly with @marseaplage, consider opening a second issue as there could be vast differences between your environments with different root causes.
Hello @besawn, thank you for your reply. Yes, we are working together on this case.
This server is an HP ProLiant BL460c G7. The output of all the commands you listed is below:
lsxcatd -v
xcatprobe xcatmn
lsdef -t osimage <OSIMAGE_YOU_ARE_INSTALLING>
lsdef <NODE_THAT_IS_SUCCESSFUL>
lsdef <NODE_THAT_IS_FAILING>
Exact procedure you are using to install the node with captured output.
makehosts
makenetworks
makedhcp -n
makedns -n
rsetboot quinde-1-2 net
rpower quinde-1-2 reset
xcatprobe osdeploy -n <NODE_THAT_YOU_ARE_INSTALLING>
From the first screenshot, I think I saw a firmware bug. Maybe you can compare the firmware levels of the two nodes?
rflash <nodename> -c
@cxhong When I run the command you suggested, I get this error for both nodes:
Error: Invalid or unsupported command
This command can also be used to check firmware levels: rinv <nodename> firm
Hello @gurevichmark, with that command I get this result. How can it help me solve this problem?
Turn off debug trace mode so the output is easier to read:
chdef -t site clustersite xcatdebugmode=0
Then run the rinv <nodename> firm command against the working and the non-working node to see whether their firmware levels differ.
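Comparing two rinv outputs by eye is easy to get wrong, so here is a small sketch that diffs them instead. fw_diff is a hypothetical helper name. It assumes rinv starts each output line with "<nodename>: ", as xCAT commands usually do. The function strips that prefix and sorts the lines so that only genuine firmware differences remain. Run it after debug trace mode has been turned off as described above.

```shell
# Sketch: diff the "rinv <node> firm" output of two nodes.
# Assumption: every rinv output line is prefixed with "<nodename>: ".
# fw_diff is a hypothetical helper name.
fw_diff() {
  local good="$1" bad="$2" a b rc
  a=$(mktemp)
  b=$(mktemp)
  # Strip the node-name prefix and sort so line order does not matter.
  rinv "$good" firm 2>&1 | sed "s/^$good: //" | sort > "$a"
  rinv "$bad" firm 2>&1 | sed "s/^$bad: //" | sort > "$b"
  # diff exits 0 when identical, 1 when the firmware lines differ.
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

Usage: fw_diff <working_node> <failing_node>. Lines starting with "<" belong to the first node and lines starting with ">" to the second. No output means the firmware lines match.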
Hi @gurevichmark, thank you for your reply. As you explained, there is a firmware difference, so what should I do now? Are you sure a firmware upgrade is the solution?
A difference in firmware level could explain why one node boots and the other does not.
You can try to upgrade the firmware on quinde-1-2 to the same version as on quinde-2-8 and see if that makes a difference.
Hi @gurevichmark, I was able to upgrade to firmware version 1.94 (iLO 3), which is the latest version for that iLO, and I still have the same problem. Any other ideas? Note that when I install CentOS 8 directly on that server, it installs without any problem. It only fails with the xCAT image.
@soportemodemat Are the two nodes not the same model? quinde-2-8 seems to have a different firmware level.
Have you tried installing diskful or diskless vanilla CentOS 8 on quinde-1-2 with xCAT? Maybe the problem is the "custom" part of the custom_centos8-x86_64-install-compute osimage?
Indeed, they are different servers. No, I haven't tried that, because I need this CentOS version for OpenHPC in a diskful setup. That is why I want to install it with xCAT, as I did on the other server without any error.
@soportemodemat I would recommend installing diskful vanilla CentOS 8 on quinde-1-2 with xCAT. That could tell you whether the problem is in the server or in your custom CentOS 8 image definition.
You can also run reventlog quinde-1-2 to see whether the BMC has logged any hardware or firmware problems.
I still have the same problem. It only happens with xCAT on that server. When I install from CD-ROM, there is no problem during installation:
Any ideas to solve this?
Hi guys, could you help me with instructions to build an xCAT image using these files:
http://mirror.centos.org/centos/8/BaseOS/x86_64/kickstart/
I have already downloaded this folder to the xCAT master. I want to do this because I have the same error as the one reported here: https://community.theforeman.org/t/cant-kickstart-centos-8/15566/17, and the people there say those files solve it:
@soportemodemat
- In the link you referenced above, I do not see the same error that is shown in your screenshot.
- Since you are trying to load an OpenHPC image, perhaps you can ask that community whether anyone has successfully installed such an image with xCAT on the specific server you are having problems with.
- Do you know the specific differences between the two servers? xCAT can install your image on one of them, so it would help debugging to understand how the second server differs.
- You can also try installing a diskless CentOS 8.2 image on the failing server. If that succeeds, it might give some clues for debugging.
Hi @gurevichmark,
I really appreciate your reply. About the link: the error I have is the iSCSI dracut init failure, and that entry is here:
Now I am trying to install a diskless xCAT image of CentOS 8.2 as you recommended, but with that image I get this IPv6 error:
I think the error in the diskful image is related to the kickstart that is in /tftpboot/xcat/xnba/nodes/quinde-1-10, as in this case: https://sourceforge.net/p/xcat/mailman/xcat-user/thread/5665DB1B.2060202%40lbl.gov/
Hi guys,
The error shown in the image below is very similar to this CentOS 7 bug. It matches what I get on CentOS 8 when I deploy a CentOS 8.2 stateful image with xCAT 2.16. The fix for that bug is to blacklist the multipath kernel module with rd.driver.blacklist=dm-multipath, but I do not know where to specify that for the xCAT image. Could you give me any ideas?
You can try setting the addkcmdline attribute for the node or the osimage, something like:
chdef <node> addkcmdline=rd.driver.blacklist=dm-multipath
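Here is a sketch of the osimage variant, so that every node installed from that image gets the blacklist. set_install_kcmdline is a hypothetical wrapper, while chdef, lsdef and nodeset are the real xCAT commands. The sketch assumes that nodeset has to be rerun before the new kernel arguments appear in the node's network-boot configuration.

```shell
# Sketch: set addkcmdline on the osimage rather than on a single node.
# set_install_kcmdline is a hypothetical helper name.
# Assumption: nodeset must be rerun so the node's boot config picks up
# the new kernel arguments.
set_install_kcmdline() {
  local osimage="$1" node="$2"
  local args="${3:-rd.driver.blacklist=dm-multipath}"

  chdef -t osimage -o "$osimage" addkcmdline="$args" || return 1

  # Show the stored value to confirm the attribute was set.
  lsdef -t osimage -o "$osimage" -i addkcmdline

  # Regenerate the node's boot configuration for this osimage.
  nodeset "$node" osimage="$osimage"
}
```

Usage: set_install_kcmdline <osimage> <node>. An optional third argument replaces the default kernel arguments.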
Hello guys, I still have the same problem. However, I discovered that on a normal CentOS 8 installation from DVD, the server's network cards are recognized once I install these RPMs with the following commands, in this order:
rpm -Uvh linux-firmware-20200619-101.git3890db36.el8_3.noarch.rpm
rpm -Uvh kexec-tools-2.0.20-34.el8_3.1.x86_64.rpm
rpm -ivh kernel-4.18.0-193.el8.x86_64.rpm
rpm -Uvh kernel-core-4.18.0-193.el8.x86_64.rpm
rpm -ivh kernel-core-4.18.0-240.10.1.el8_3.x86_64.rpm
rpm -ivh kmod-be2net-12.0.0.0-6.el8_3.elrepo.x86_64.rpm
This is due to the issue documented here: http://blog.dovid.net/how-to-get-the-broadcom-network-drivers-working-with-on-a-hp-bl460c-gen8-with-centos8/.
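To confirm that the packages above actually give the node a working network driver, you can run a quick check after rebooting into the new kernel. It looks at whether the be2net module, which is shipped by kmod-be2net, is loaded. be2net_status is a hypothetical helper name, and it only reads uname -r and lsmod.

```shell
# Sketch: report the running kernel and whether the be2net driver is loaded.
# be2net_status is a hypothetical helper name.
be2net_status() {
  echo "running kernel: $(uname -r)"
  # lsmod prints one module per line; the first column is the module name.
  if lsmod | awk '$1 == "be2net" { found = 1 } END { exit !found }'; then
    echo "be2net: loaded"
  else
    echo "be2net: not loaded"
    return 1
  fi
}
```

Usage: run be2net_status on the node. A non-zero exit status means the module is not loaded in the running kernel.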
I have tried to install these RPMs by injecting them into the image through pre-installation and post-installation scripts, but nothing works. Is there a way to add this already installed CentOS 8 node to the xCAT cluster without reinstalling it over PXE? Do you know how to achieve that?
Thank you in advance for your help