os-autoinst-distri-opensuse
os-autoinst-distri-opensuse copied to clipboard
ipxe_install: Make the set_bootscript_hdd() delay configurable
Some workers need more than 2 minutes to reach PXE bootloader after power-on. Add a variable PXE_BOOT_TIME
to control the bootscript switch delay for UEFI machines so that it can be configured on each worker.
This is the correct solution for #17769
- Related ticket: N/A
- Needles: N/A
- Verification runs:
- without
PXE_BOOT_TIME
: https://openqa.suse.de/tests/12223015 - with
PXE_BOOT_TIME
: https://openqa.suse.de/tests/12223016 - without
AUTOYAST
: https://openqa.suse.de/tests/12223020
- without
Both verification runs worked as expected, note the difference in ipxe_install
run time. The failure in autoyast/console
is expected until #17721 gets merged.
Please allow me to share my opinion and help a bit :)
I followed the discussion in Slack, and thank you very much that we indicate the root cause is that set_bootscript_hdd is only specified to Supermicro machine, but hurt for Dell, HP, Lenovo machines that we also have . Since we are hardly to collecting and setting the exact booting time for each machine, IMO, it sounds more efficiency and effective to specify for the machine brand type(Supermicro), rather than to each machine.
@czerw I guess set_bootscript_hdd was added for kernel testing need on supermicro machine, do you have any opinion or suggestion that we can help here to make the solution be much easier for each team for each different machine?
Another solution is https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17769/files by @Dawei-Pang
Please allow me to share my opinion and help a bit :) I followed the discussion in Slack, and thank you very much that we indicate the root cause is that set_bootscript_hdd is only specified to Supermicro machine, but hurt for Dell, HP, Lenovo machines that we also have . Since we are hardly to collecting and setting the exact booting time for each machine, IMO, it sounds more efficiency and effective to specify for the machine brand type(Supermicro), rather than to each machine.
I never said that Supermicro is the only vendor with this problem. I have no idea how many other vendors may be affected.
Measuring boot time is as easy as running one job on the machine with video capture enabled (NOVIDEO=0
) and then reading timestamps from the video.
@Dawei-Pang @cachen I'm sorry I don't understand what is the issue. Fixed sleep is in the code for 3 years already since https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11273 and nobody complained. This PR just improves existing code to have delay configurable. If the variable is not defined, it will just use same behavior like before. I will also reply in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17769/f
Please allow me to share my opinion and help a bit :) I followed the discussion in Slack, and thank you very much that we indicate the root cause is that set_bootscript_hdd is only specified to Supermicro machine, but hurt for Dell, HP, Lenovo machines that we also have . Since we are hardly to collecting and setting the exact booting time for each machine, IMO, it sounds more efficiency and effective to specify for the machine brand type(Supermicro), rather than to each machine.
I never said that Supermicro is the only vendor with this problem. I have no idea how many other vendors may be affected.
Measuring boot time is as easy as running one job on the machine with video capture enabled (
NOVIDEO=0
) and then reading timestamps from the video.
From the discussion in the slack, I suppose you verified the PR#17583 on Supermicro machine, if the PR#17583 verified on other brand/model machine, please let me know. Thanks!
If the PR#17583 only verified or workaround for supermicro machine, please add a checker to limit for supermicro only, because the code without set_bootscript_hdd if get_var('IPXE_UEFI');
can run successfully without any issue for a long time.
@Dawei-Pang @cachen I'm sorry I don't understand what is the issue. Fixed sleep is in the code for 3 years already since #11273 and nobody complained. This PR just improves existing code to have delay configurable. If the variable is not defined, it will just use same behavior like before. I will also reply in #17769
Thanks, at beginning, I think this is timing issue, after some research, yes, the sleep code is running for a long time.
The issue comes from PR#17583, adding set_bootscript_hdd if get_var('IPXE_UEFI');
is root cause.
Please allow me to share my opinion and help a bit :) I followed the discussion in Slack, and thank you very much that we indicate the root cause is that set_bootscript_hdd is only specified to Supermicro machine, but hurt for Dell, HP, Lenovo machines that we also have . Since we are hardly to collecting and setting the exact booting time for each machine, IMO, it sounds more efficiency and effective to specify for the machine brand type(Supermicro), rather than to each machine.
I never said that Supermicro is the only vendor with this problem. I have no idea how many other vendors may be affected.
@mdoucha Hello Martin, no worry I didn't say you said that, but how I concluded from yours and Dawei's observation when read the discussion in slack, and I believe now you are aware from Dawei, besides Supermicro, that are other vendor's are affected.
"One more time: SuperMicro UEFI machines ignore the "force PXE boot" flag." "They have to be configured to ALWAYS use the following BIOS boot order:" "I can confirm the SLE installer change the UEFI boot order to installed OS as default behaviour on HPE, DELL and LENOVO/IBM system except for other action"
Measuring boot time is as easy as running one job on the machine with video capture enabled (
NOVIDEO=0
) and then reading timestamps from the video.
Think about if the code can be much easier to adapting for dozens of machines with different vendor. And if the solution in pr#17769 bring difficult or challenge for kernel team to adapting your test on your machines.
@czerw Hope Dawei has explained to you in his reply, that the main issue and root cause is not the timing, but was introduced by PR#17583, adding set_bootscript_hdd if get_var('IPXE_UEFI'); so whether revert PR#17583, or how to improving the PR is the topic.
I will stop here to do any micro-management to influence the technical decision :-) The last suggestion that Vit and I from TL's view is, perhaps it will be helpful to have a debugging session that Dawei and Martin together, to learn from each other what is the challenge and what is the exactly expected, and of course other relevant engineers can be invited if need.
I've changed the set_bootscript_hdd()
logic to use a special flag that should be added only to the workers/jobs that need it. The original IPXE_UEFI
condition was wrong.
Verification runs updated.
@Dawei-Pang @frankenmichl could we have a look on this one please and maybe have it merged?
Please allow me to involve @Julie-CAO to see if the changes break her testing.
Please allow me to involve @Julie-CAO to see if the changes break her testing.
I will verify on VIRT-AUTO tests, wait a few fours.
This PR is quite old so I've just rebased it to the latest master
.
Please allow me to involve @Julie-CAO to see if the changes break her testing.
I will verify on VIRT-AUTO tests, wait a few fours.
I had two successful runs, I think the PR is ok. @alice-suse , are you also ok with this change? sle15sp6_kvm_on_non_uefi_machine on_UEFI_kermit_on_OSD
LGTM, and verified for usb install, works too.
LGTM, thank you all!
Thanks all, I think we can merge. Just one thing came to my mind, shouldn't we mention variable also in variables.md
?
Thanks all, I think we can merge. Just one thing came to my mind, shouldn't we mention variable also in
variables.md
?
Added variable documentation to variables.md
and rebased to latest master. No other changes.