harvester
harvester copied to clipboard
[BUG] Grub boot stanza doesn't exclude floppy. Could result in grub hangs.
Describe the bug Grub boot stanza doesn't exclude floppy (--no-floppy). In situations where disks are reshuffled or multiple controller setups, this might result in non-existing floppy drives to be scanned, which blocks the boot cycle, or dropping to grub rescue prompt for 30 minutes.
To Reproduce Steps to reproduce the behavior:
- Have an awkward disk configuration or make the partition containing the boot label vanish.
Expected behavior Don't scan for floppy's.
Support bundle
Environment:
- Harvester ISO version: 1.0.1-rc2
- Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): Dell RX740d with iDrac 5.10.10.00
Additional context We're trying to install Harvester on sdb, which seems to be causing issues as well.
Seems like cOS isn't using the --no-floppy
flag when install grub2.
https://github.com/rancher-sandbox/cOS-toolkit/blob/master/packages/backports/installer/cos.sh#L648
AFAIK the --no-floppy
stanza is not used during grub install but during search instead, so we should change the search stuff in the grub config: https://www.gnu.org/software/grub/manual/grub/grub.html#search
This may be possible to fix on harverster directly by setting the stanza on your grub config: https://github.com/harvester/harvester-installer/blob/master/package/harvester-os/iso/boot/grub2/grub.cfg#L1
Im not sure if we expect anyone to store the kernel on a floppy, so maybe we should just disable it on cos directly if there is not much to gain...
I think this is the template that's used for creating the grub.cfg
, so it probably needs to be added there.
https://github.com/rancher-sandbox/cOS-toolkit/blob/master/packages/grub2/config/grub.cfg.tmpl
yep, was looking for that...that may be handy if someone wants to add some env files from a floppy...it would be weird but they might use floppy in 2022 I guess lol
I'd say, just disable it. I don't think running Harvester from floppies is an actual use case that they're willing to support.
Yes I do agree disabling it makes sense. Note that the config in https://github.com/rancher-sandbox/cOS-toolkit/blob/master/packages/grub2/config/grub.cfg.tmpl applies to a running environment and not to the ISO in case of harvester. This is because https://github.com/harvester/harvester-installer/blob/master/package/harvester-os/iso/boot/grub2/grub.cfg#L1 overrides the cOS defaults
PR: https://github.com/rancher-sandbox/cOS-toolkit/pull/1220 Also as mentioned by Ettore, this also requires changes in the harvester-installer grub.cfg that overrides the base cos one.
Poor floppies, wont somebody think of them anymore :crying_cat_face:
Fix was merged on cos-toolkit, new packages should be published in the repos in a couple of hours max! https://github.com/rancher-sandbox/cOS-toolkit/actions/runs/2108139563
Cool. If it's included in the Harvester nightly, I can give it a shot tomorrow.
Pre Ready-For-Testing Checklist
-
[ ] Where is the reproduce steps/test steps documented? The reproduce steps/test steps are at:
-
[x] Is there a workaround for the issue? If so, where is it documented? The workaround is at: ~https://github.com/harvester/harvester-installer/pull/266~ https://github.com/harvester/harvester-installer/pull/381
-
[ ] Does the PR include the explanation for the fix or the feature? ~https://github.com/harvester/harvester-installer/pull/266~
~* [ ] ~~Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?~~ The PR for the YAML change is at: The PR for the chart change is at:~
-
[x] Have the backend code been merged (harvester, harvester-installer, etc) (including
backport-needed/*
)? The PR is at ~https://github.com/harvester/harvester-installer/pull/266~ https://github.com/harvester/harvester-installer/pull/381 -
[x] Which areas/issues this PR might have potential impacts on? Area
area/installer
Issues -
[ ] ~~If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted? The HEP PR is at~~
-
[ ] ~~If labeled: area/ui Has the UI issue filed or ready to be merged? The UI issue/PR is at~~
-
[ ] ~~If labeled: require/doc Has the necessary document PR submitted or merged? The documentation issue/PR is at~~
-
[ ] ~~If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue? The automation skeleton PR is at The automation test case PR is at The issue of automation test case implementation is at (bot will auto create one using the template)~~
-
[ ] ~~If labeled: require/integration-test Has the PR includes the integration test? The integration test PR is at~~
-
[ ] ~~If labeled: require/manual-test-plan Has the manual test plan been documented? The updated manual test plan is at~~
-
[ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label
release/obsolete-compatibility
? The compatibility issue is filed at
Automation e2e test issue: harvester/tests#268
@alexdepalex can u please help to do a quick validation against the latest v1.0.1-rc3 release, thanks in advance.
Yes. Will do within an hour or 2.
At first glance it doesn't seem to fix our issue. I need to wait half an hour before grub proceeds.
@alexdepalex any logs? Do we have a link to the iso with the patch to test locally?
oh, I see the link just above :D
@alexdepalex when you refer to Have an awkward disk configuration or make the partition containing the boot label vanish.
what do you mean exactly, what type of devices do I need in a VM to reproduce this?
@Itxaka Just FYI, for v1.0.01-rc3 we haven't got a time to upgrade cOS-toolkit. Instead, we patched it manually https://github.com/harvester/harvester-installer/pull/266
@weihanglo yup, saw it but the patch is basically the same (including the problem with the --set=root :P) so it should validate the same I hope.
@alexdepalex any logs? Do we have a link to the iso with the patch to test locally?
I'm doing a redeploy right now. Will try to get the logs from the installer. Are these logs also available on the node after reboot?
nope, but Im more interested in the grub logs if its being stuck int here, so you should try to capture the serial output, not sure if possible on that dell server, thats why I wanted to try to reproduce it in a VM, serial logs are easy that way :D
The serial output doesn't show anything additional that's not on the console. Shouldn't grub2-install
be executed with the --no-floppy
flag as well?
As far as I know grub2-install
does not have a --no-floppy
flag?
https://www.gnu.org/software/grub/manual/grub/html_node/Installing-GRUB-using-grub_002dinstall.html
Usage: grub2-install [OPTION...] [OPTION] [INSTALL_DEVICE]
Install GRUB on your drive.
--compress=no|xz|gz|lzo compress GRUB files [optional]
--disable-shim-lock disable shim_lock verifier
--dtb=FILE embed a specific DTB
-d, --directory=DIR use images and modules under DIR
[default=/usr/share/grub2/<platform>]
--fonts=FONTS install FONTS [default=unicode]
--install-modules=MODULES install only MODULES and their dependencies
[default=all]
-k, --pubkey=FILE embed FILE as public key for signature checking
--locale-directory=DIR use translations under DIR
[default=/usr/share/locale]
--locales=LOCALES install only LOCALES [default=all]
--modules=MODULES pre-load specified modules MODULES
--sbat=FILE SBAT metadata
--themes=THEMES install THEMES [default=starfield]
-v, --verbose print verbose messages.
--allow-floppy make the drive also bootable as floppy (default
for fdX devices). May break on some BIOSes.
--boot-directory=DIR install GRUB images under the directory DIR/grub2
instead of the boot/grub2 directory
--bootloader-id=ID the ID of bootloader. This option is only
available on EFI and Macs.
--core-compress=xz|none|auto
choose the compression to use for core image
--disk-module=MODULE disk module to use (biosdisk or native). This
option is only available on BIOS target.
--efi-directory=DIR use DIR as the EFI System Partition root.
--force install even if problems are detected
--force-file-id use identifier file even if UUID is available
--label-bgcolor=COLOR use COLOR for label background
--label-color=COLOR use COLOR for label
--label-font=FILE use FILE as font for label
--macppc-directory=DIR use DIR for PPC MAC install.
--no-bootsector do not install bootsector
--no-nvram don't update the `boot-device'/`Boot*' NVRAM
variables. This option is only available on EFI
and IEEE1275 targets.
--no-rs-codes Do not apply any reed-solomon codes when
embedding core.img. This option is only available
on x86 BIOS targets.
--product-version=STRING use STRING as product version
--recheck delete device map if it already exists
--removable the installation device is removable. This option
is only available on EFI.
-s, --skip-fs-probe do not probe for filesystems in DEVICE
--suse-enable-tpm install TPM modules
--suse-force-signed force installation of signed grub.This option is
only available on ARM64 EFI targets.
--suse-inhibit-signed inhibit installation of signed grub. This option
is only available on ARM64 EFI targets.
--target=TARGET install GRUB for TARGET platform
[default=i386-pc]; available targets:
arm-coreboot, arm-efi, arm-uboot, arm64-efi,
i386-coreboot, i386-efi, i386-ieee1275,
i386-multiboot, i386-pc, i386-qemu, i386-xen,
i386-xen_pvh, ia64-efi, mips-arc, mips-qemu_mips,
mipsel-arc, mipsel-loongson, mipsel-qemu_mips,
powerpc-ieee1275, riscv32-efi, riscv64-efi,
s390x-emu, sparc64-ieee1275, x86_64-efi,
x86_64-xen
--zipl-directory=DIR use DIR as the zIPL Boot Partition root.
-?, --help give this help list
--usage give a short usage message
-V, --version print program version
Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
INSTALL_DEVICE must be system device filename.
grub2-install copies GRUB images into boot/grub2. On some platforms, it may
also install GRUB into the boot sector.
Report bugs to <[email protected]>.
It's a hidden option for whatever reason.
https://github.com/rhboot/grub2/blob/master/util/grub-install.c#L289
Ah, seems to be for compatibility.
yep, just saw it, it doesnt seem to do anything at all :/
@alexdepalex can you maybe drop to the grub shell (press c in the grub menu) and run the search commands manually to see if its indeed that what is slowing down the boot? Maybe the issue is a different one and we are working on the wrong stuff
~~It is hard to be reproduced on our local or KVM env, wondering if we can have some stable reproduction steps using a KVM?~~
We are considering moving this one to the v1.0.2 release (targetting on 10th May), let us know if u guys have additional concerns, thanks.
@alexdepalex can you maybe drop to the grub shell (press c in the grub menu) and run the search commands manually to see if its indeed that what is slowing down the boot? Maybe the issue is a different one and we are working on the wrong stuff
So this occurs before the grub config is actually loaded. If I drop to a shell after that and try to ls
the devices, it hangs again. After some time it proceeds and print 2 floppy drives.
From the current investigation, the causes might be:
- GRUB is stuck when scanning VirtualCD device (even the device is disabled)
- related to specific fw version of Dell BOSS-S1 adapter. A supportconfig data could help.
Issue is related to a specific firmware of Dell BOSS adapter (3022). Downgrading firmware solves the problem
Check if we can revert https://github.com/harvester/harvester-installer/pull/279
Update: need to wait for https://github.com/harvester/harvester/issues/2565 to remove the workaround