harvester icon indicating copy to clipboard operation
harvester copied to clipboard

[BUG] Grub boot stanza doesn't exclude floppy. Could result in grub hangs.

Open alexdepalex opened this issue 2 years ago • 33 comments

Describe the bug Grub boot stanza doesn't exclude floppy (--no-floppy). In situations where disks are reshuffled or multiple controller setups, this might result in non-existing floppy drives to be scanned, which blocks the boot cycle, or dropping to grub rescue prompt for 30 minutes.

To Reproduce Steps to reproduce the behavior:

  1. Have an awkward disk configuration or make the partition containing the boot label vanish.

Expected behavior Don't scan for floppy's.

Support bundle

Environment:

  • Harvester ISO version: 1.0.1-rc2
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): Dell RX740d with iDrac 5.10.10.00

Additional context We're trying to install Harvester on sdb, which seems to be causing issues as well.

alexdepalex avatar Apr 05 '22 22:04 alexdepalex

Seems like cOS isn't using the --no-floppy flag when install grub2.

https://github.com/rancher-sandbox/cOS-toolkit/blob/master/packages/backports/installer/cos.sh#L648

alexdepalex avatar Apr 06 '22 07:04 alexdepalex

AFAIK the --no-floppy stanza is not used during grub install but during search instead, so we should change the search stuff in the grub config: https://www.gnu.org/software/grub/manual/grub/grub.html#search

Itxaka avatar Apr 06 '22 08:04 Itxaka

This may be possible to fix on harverster directly by setting the stanza on your grub config: https://github.com/harvester/harvester-installer/blob/master/package/harvester-os/iso/boot/grub2/grub.cfg#L1

Itxaka avatar Apr 06 '22 08:04 Itxaka

Im not sure if we expect anyone to store the kernel on a floppy, so maybe we should just disable it on cos directly if there is not much to gain...

Itxaka avatar Apr 06 '22 08:04 Itxaka

I think this is the template that's used for creating the grub.cfg, so it probably needs to be added there.

https://github.com/rancher-sandbox/cOS-toolkit/blob/master/packages/grub2/config/grub.cfg.tmpl

alexdepalex avatar Apr 06 '22 08:04 alexdepalex

yep, was looking for that...that may be handy if someone wants to add some env files from a floppy...it would be weird but they might use floppy in 2022 I guess lol

Itxaka avatar Apr 06 '22 08:04 Itxaka

I'd say, just disable it. I don't think running Harvester from floppies is an actual use case that they're willing to support.

alexdepalex avatar Apr 06 '22 08:04 alexdepalex

Yes I do agree disabling it makes sense. Note that the config in https://github.com/rancher-sandbox/cOS-toolkit/blob/master/packages/grub2/config/grub.cfg.tmpl applies to a running environment and not to the ISO in case of harvester. This is because https://github.com/harvester/harvester-installer/blob/master/package/harvester-os/iso/boot/grub2/grub.cfg#L1 overrides the cOS defaults

mudler avatar Apr 06 '22 08:04 mudler

PR: https://github.com/rancher-sandbox/cOS-toolkit/pull/1220 Also as mentioned by Ettore, this also requires changes in the harvester-installer grub.cfg that overrides the base cos one.

Poor floppies, wont somebody think of them anymore :crying_cat_face:

Itxaka avatar Apr 06 '22 08:04 Itxaka

Fix was merged on cos-toolkit, new packages should be published in the repos in a couple of hours max! https://github.com/rancher-sandbox/cOS-toolkit/actions/runs/2108139563

Itxaka avatar Apr 07 '22 10:04 Itxaka

Cool. If it's included in the Harvester nightly, I can give it a shot tomorrow.

alexdepalex avatar Apr 07 '22 10:04 alexdepalex

Pre Ready-For-Testing Checklist

  • [ ] Where is the reproduce steps/test steps documented? The reproduce steps/test steps are at:

  • [x] Is there a workaround for the issue? If so, where is it documented? The workaround is at: ~https://github.com/harvester/harvester-installer/pull/266~ https://github.com/harvester/harvester-installer/pull/381

  • [ ] Does the PR include the explanation for the fix or the feature? ~https://github.com/harvester/harvester-installer/pull/266~

~* [ ] ~~Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?~~ The PR for the YAML change is at: The PR for the chart change is at:~

  • [x] Have the backend code been merged (harvester, harvester-installer, etc) (including backport-needed/*)? The PR is at ~https://github.com/harvester/harvester-installer/pull/266~ https://github.com/harvester/harvester-installer/pull/381

  • [x] Which areas/issues this PR might have potential impacts on? Area area/installer Issues

  • [ ] ~~If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted? The HEP PR is at~~

  • [ ] ~~If labeled: area/ui Has the UI issue filed or ready to be merged? The UI issue/PR is at~~

  • [ ] ~~If labeled: require/doc Has the necessary document PR submitted or merged? The documentation issue/PR is at~~

  • [ ] ~~If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue? The automation skeleton PR is at The automation test case PR is at The issue of automation test case implementation is at (bot will auto create one using the template)~~

  • [ ] ~~If labeled: require/integration-test Has the PR includes the integration test? The integration test PR is at~~

  • [ ] ~~If labeled: require/manual-test-plan Has the manual test plan been documented? The updated manual test plan is at~~

  • [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility? The compatibility issue is filed at

Automation e2e test issue: harvester/tests#268

@alexdepalex can u please help to do a quick validation against the latest v1.0.1-rc3 release, thanks in advance.

guangbochen avatar Apr 08 '22 05:04 guangbochen

Yes. Will do within an hour or 2.

alexdepalex avatar Apr 08 '22 05:04 alexdepalex

At first glance it doesn't seem to fix our issue. I need to wait half an hour before grub proceeds.

alexdepalex avatar Apr 08 '22 07:04 alexdepalex

@alexdepalex any logs? Do we have a link to the iso with the patch to test locally?

Itxaka avatar Apr 08 '22 07:04 Itxaka

oh, I see the link just above :D

Itxaka avatar Apr 08 '22 07:04 Itxaka

@alexdepalex when you refer to Have an awkward disk configuration or make the partition containing the boot label vanish. what do you mean exactly, what type of devices do I need in a VM to reproduce this?

Itxaka avatar Apr 08 '22 07:04 Itxaka

@Itxaka Just FYI, for v1.0.01-rc3 we haven't got a time to upgrade cOS-toolkit. Instead, we patched it manually https://github.com/harvester/harvester-installer/pull/266

weihanglo avatar Apr 08 '22 08:04 weihanglo

@weihanglo yup, saw it but the patch is basically the same (including the problem with the --set=root :P) so it should validate the same I hope.

Itxaka avatar Apr 08 '22 08:04 Itxaka

@alexdepalex any logs? Do we have a link to the iso with the patch to test locally?

I'm doing a redeploy right now. Will try to get the logs from the installer. Are these logs also available on the node after reboot?

alexdepalex avatar Apr 08 '22 08:04 alexdepalex

nope, but Im more interested in the grub logs if its being stuck int here, so you should try to capture the serial output, not sure if possible on that dell server, thats why I wanted to try to reproduce it in a VM, serial logs are easy that way :D

Itxaka avatar Apr 08 '22 08:04 Itxaka

The serial output doesn't show anything additional that's not on the console. Shouldn't grub2-install be executed with the --no-floppy flag as well?

alexdepalex avatar Apr 08 '22 09:04 alexdepalex

As far as I know grub2-install does not have a --no-floppy flag?

https://www.gnu.org/software/grub/manual/grub/html_node/Installing-GRUB-using-grub_002dinstall.html

Usage: grub2-install [OPTION...] [OPTION] [INSTALL_DEVICE]
Install GRUB on your drive.

      --compress=no|xz|gz|lzo   compress GRUB files [optional]
      --disable-shim-lock    disable shim_lock verifier
      --dtb=FILE             embed a specific DTB
  -d, --directory=DIR        use images and modules under DIR
                             [default=/usr/share/grub2/<platform>]
      --fonts=FONTS          install FONTS [default=unicode]
      --install-modules=MODULES   install only MODULES and their dependencies
                             [default=all]
  -k, --pubkey=FILE          embed FILE as public key for signature checking
      --locale-directory=DIR use translations under DIR
                             [default=/usr/share/locale]
      --locales=LOCALES      install only LOCALES [default=all]
      --modules=MODULES      pre-load specified modules MODULES
      --sbat=FILE            SBAT metadata
      --themes=THEMES        install THEMES [default=starfield]
  -v, --verbose              print verbose messages.
      --allow-floppy         make the drive also bootable as floppy (default
                             for fdX devices). May break on some BIOSes.
      --boot-directory=DIR   install GRUB images under the directory DIR/grub2
                             instead of the boot/grub2 directory
      --bootloader-id=ID     the ID of bootloader. This option is only
                             available on EFI and Macs.
      --core-compress=xz|none|auto
                             choose the compression to use for core image
      --disk-module=MODULE   disk module to use (biosdisk or native). This
                             option is only available on BIOS target.
      --efi-directory=DIR    use DIR as the EFI System Partition root.
      --force                install even if problems are detected
      --force-file-id        use identifier file even if UUID is available
      --label-bgcolor=COLOR  use COLOR for label background
      --label-color=COLOR    use COLOR for label
      --label-font=FILE      use FILE as font for label
      --macppc-directory=DIR use DIR for PPC MAC install.
      --no-bootsector        do not install bootsector
      --no-nvram             don't update the `boot-device'/`Boot*' NVRAM
                             variables. This option is only available on EFI
                             and IEEE1275 targets.
      --no-rs-codes          Do not apply any reed-solomon codes when
                             embedding core.img. This option is only available
                             on x86 BIOS targets.
      --product-version=STRING   use STRING as product version
      --recheck              delete device map if it already exists
      --removable            the installation device is removable. This option
                             is only available on EFI.
  -s, --skip-fs-probe        do not probe for filesystems in DEVICE
      --suse-enable-tpm      install TPM modules
      --suse-force-signed    force installation of signed grub.This option is
                             only available on ARM64 EFI targets.
      --suse-inhibit-signed  inhibit installation of signed grub. This option
                             is only available on ARM64 EFI targets.
      --target=TARGET        install GRUB for TARGET platform
                             [default=i386-pc]; available targets:
                             arm-coreboot, arm-efi, arm-uboot, arm64-efi,
                             i386-coreboot, i386-efi, i386-ieee1275,
                             i386-multiboot, i386-pc, i386-qemu, i386-xen,
                             i386-xen_pvh, ia64-efi, mips-arc, mips-qemu_mips,
                             mipsel-arc, mipsel-loongson, mipsel-qemu_mips,
                             powerpc-ieee1275, riscv32-efi, riscv64-efi,
                             s390x-emu, sparc64-ieee1275, x86_64-efi,
                             x86_64-xen
      --zipl-directory=DIR   use DIR as the zIPL Boot Partition root.
  -?, --help                 give this help list
      --usage                give a short usage message
  -V, --version              print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

INSTALL_DEVICE must be system device filename.
grub2-install copies GRUB images into boot/grub2.  On some platforms, it may
also install GRUB into the boot sector.

Report bugs to <[email protected]>.

Itxaka avatar Apr 08 '22 09:04 Itxaka

It's a hidden option for whatever reason.

https://github.com/rhboot/grub2/blob/master/util/grub-install.c#L289

alexdepalex avatar Apr 08 '22 09:04 alexdepalex

Ah, seems to be for compatibility.

alexdepalex avatar Apr 08 '22 09:04 alexdepalex

yep, just saw it, it doesnt seem to do anything at all :/

Itxaka avatar Apr 08 '22 09:04 Itxaka

@alexdepalex can you maybe drop to the grub shell (press c in the grub menu) and run the search commands manually to see if its indeed that what is slowing down the boot? Maybe the issue is a different one and we are working on the wrong stuff

Itxaka avatar Apr 08 '22 10:04 Itxaka

~~It is hard to be reproduced on our local or KVM env, wondering if we can have some stable reproduction steps using a KVM?~~

We are considering moving this one to the v1.0.2 release (targetting on 10th May), let us know if u guys have additional concerns, thanks.

guangbochen avatar Apr 08 '22 10:04 guangbochen

@alexdepalex can you maybe drop to the grub shell (press c in the grub menu) and run the search commands manually to see if its indeed that what is slowing down the boot? Maybe the issue is a different one and we are working on the wrong stuff

So this occurs before the grub config is actually loaded. If I drop to a shell after that and try to ls the devices, it hangs again. After some time it proceeds and print 2 floppy drives.

alexdepalex avatar Apr 08 '22 10:04 alexdepalex

From the current investigation, the causes might be:

  • GRUB is stuck when scanning VirtualCD device (even the device is disabled)
  • related to specific fw version of Dell BOSS-S1 adapter. A supportconfig data could help.

bk201 avatar Apr 21 '22 01:04 bk201

Issue is related to a specific firmware of Dell BOSS adapter (3022). Downgrading firmware solves the problem

ravarga avatar May 05 '22 06:05 ravarga

Check if we can revert https://github.com/harvester/harvester-installer/pull/279

Update: need to wait for https://github.com/harvester/harvester/issues/2565 to remove the workaround

bk201 avatar Sep 06 '22 03:09 bk201