bootupd icon indicating copy to clipboard operation
bootupd copied to clipboard

efi: update the ESP by creating a tmpdir and RENAME_EXCHANGE

Open HuijingHei opened this issue 1 year ago • 30 comments

See Timothée's comment https://github.com/coreos/bootupd/issues/454#issuecomment-2178227050 Reuse TMP_PREFIX, logic is like this:

  • cp -a fedora .btmp.fedora
    • We start with a copy to make sure to keep all other files that we do not explicitly track in bootupd
  • Update the content of .btmp.fedora with the new binaries
  • Exchange .btmp.fedora -> fedora
  • Remove now "old" .btmp.fedora

If we have a file not in a directory in EFI, then we can copy it to .btmp.foo and then act on it and finally rename it. No need to copy the entire EFI.

Additional info for the logic:

  • for removals, copy to temp dir, remove file in temp dir; if have a file not in a directory, remove it directly
  • for changes/additions, copy to temp dir, write files in temp dir; if have a file not in a directory, copy as temp file
  • Do local exchange/rename and remove temp dir/file

And use insert() instead of push() to match starts_with() when scanning temp files & dirs.

Fixes https://github.com/coreos/bootupd/issues/454


filetree: add failpoint when doing exchange Inspired by https://github.com/coreos/bootupd/pull/669#issuecomment-2220760948

HuijingHei avatar Jun 19 '24 12:06 HuijingHei

[2024-06-20T08:07:06.180Z] # Previous BIOS: grub2-tools-1:2.06-89.fc38.x86_64
[2024-06-20T08:07:06.180Z] # Updated BIOS: grub2-tools-1:2.06-102.fc38.x86_64
[2024-06-20T08:07:06.180Z] # error: Failed to update EFI: No such file or directory (os error 2)
[2024-06-20T08:07:06.180Z] + fatal 'test failed'
[2024-06-20T08:07:06.180Z] + echo error: test failed
[2024-06-20T08:07:06.180Z] error: test failed

🤔

travier avatar Jun 20 '24 08:06 travier

[2024-06-20T12:39:28.464Z] # Previous BIOS: grub2-tools-1:2.06-89.fc38.x86_64

[2024-06-20T12:39:28.464Z] # Updated BIOS: grub2-tools-1:2.06-102.fc38.x86_64

[2024-06-20T12:39:28.464Z] # Previous EFI: grub2-efi-x64-1:2.06-89.fc38.x86_64,shim-x64-15.8-3.x86_64

[2024-06-20T12:39:28.464Z] # Updated EFI: grub2-efi-x64-1:2.06-102.fc38.x86_64,shim-x64-15.8-3.x86_64,test-bootupd-payload-1.0-1.x86_64

[2024-06-20T12:39:28.464Z] # grep: /tmp/tmp.hin3bC02Ci/EFI/fedora/test-bootupd.efi: No such file or directory

[2024-06-20T12:39:28.464Z] # ls: cannot access '/tmp/tmp.hin3bC02Ci/EFI/fedora/test-bootupd.efi': No such file or directory

HuijingHei avatar Jun 20 '24 12:06 HuijingHei

We've merged the CI fixes. Can you rebase this one? Thanks

travier avatar Jun 21 '24 08:06 travier

We've merged the CI fixes. Can you rebase this one? Thanks

Sure, updated, thanks!

HuijingHei avatar Jun 21 '24 08:06 HuijingHei

Updating shim-x64 15.8-2 -> 15.8-3, it will also update BOOT/fbx64.efi

[root@cosa-devsh ~]# bootupctl update -vvv
Previous EFI: grub2-efi-x64-1:2.06-103.fc40.x86_64,shim-x64-15.8-2.x86_64
Updated EFI: grub2-efi-x64-1:2.06-123.fc40.x86_64,shim-x64-15.8-3.x86_64

[root@cosa-devsh ~]# mount /dev/vda2 /boot/efi/
[root@cosa-devsh ~]# ls /boot/efi/EFI/BOOT/ -l
total 1014
-rwxr-xr-x. 1 root root 949424 Jan  1  1980 BOOTX64.EFI
-rwxr-xr-x. 1 root root  87816 Jun 21 10:01 fbx64.efi

HuijingHei avatar Jun 21 '24 10:06 HuijingHei

If using .fedora.tmp will cause CI failed grep: /tmp/tmp.hin3bC02Ci/EFI/fedora/test-bootupd.efi: No such file or directory, guess it is because filetree is based on "BOOT/BOOTX64.EFI" or "fedora/BOOTX64.CSV", so the apply_diff will also create the dir like BOOT and fedora under temp dir, which makes it actually /tmp/tmp.hin3bC02Ci/EFI/fedora/fedora/test-bootupd.efi.

HuijingHei avatar Jun 21 '24 10:06 HuijingHei

The problem with copying the entire EFI directory is that we don't know what's there, and there might a lot of other vendor directories (windows, ubuntu, etc.) that we don't really want to copy as that might take more space than what we have in the EFI partition.

travier avatar Jun 21 '24 11:06 travier

The problem with copying the entire EFI directory is that we don't know what's there, and there might a lot of other vendor directories (windows, ubuntu, etc.) that we don't really want to copy as that might take more space than what we have in the EFI partition.

Makes sense, we might need to do some change to apply_diff, if we make temp for BOOT and fedora.

HuijingHei avatar Jun 21 '24 11:06 HuijingHei

Update the change in apply_diff(), not sure if it is the right place, steps are:

  • get unique updates dir in diff, save them in hashset
  • scan the hashset,
    • extract the diff based on the same parent
    • copy original dir to temp dir
    • remove original file in temp dir first, then write new and changed files to temp dir
    • do local exchange for fedora.tmp to fedora
    • remove fedora.tmp

HuijingHei avatar Jun 25 '24 12:06 HuijingHei

That looks ok but the logic feels a little bit difficult to follow. I might have missed something however so maybe my understanding is not correct.

I'm thinking about the following logic for apply_diff:

  • First look at removals (to minimize disk space usage)
    • Look at which "root/parent" folder the file to remove is
    • Does a temporary copy of this folder exists?
      • Yes, remove the file in the copy
      • No:
        • Copy the folder
        • Remember that we made a copy of this folder
        • Remove the file in the copy
  • Repeat the same logic for changes and additions
  • Sync changes to the disk once here
    • At this point, we haven't yet touched the content actually used to boot
    • If there is not enough space to write all the files, we should have failed before this point
  • Iterate over the folders we made a copy of and do RENAME_EXCHANGE
  • Sync changes between each rename (or once at the end?)
  • Remove all the temporary folders
  • Sync one last time (optional?)

travier avatar Jun 25 '24 14:06 travier

I'm thinking about the following logic for apply_diff:

  • First look at removals (to minimize disk space usage)

    • Look at which "root/parent" folder the file to remove is

    • Does a temporary copy of this folder exists?

      • Yes, remove the file in the copy

      • No:

        • Copy the folder
        • Remember that we made a copy of this folder
        • Remove the file in the copy
  • Repeat the same logic for changes and additions

  • Sync changes to the disk once here

    • At this point, we haven't yet touched the content actually used to boot
    • If there is not enough space to write all the files, we should have failed before this point
  • Iterate over the folders we made a copy of and do RENAME_EXCHANGE

Copy the changed files to prepared temp dir, the final result will not be as expected if the change dir is included by another change. For example, if changes are EFI/fedora/subdir/newgrub.x64 and EFI/fedora/grub.x64, the temp dir will be EFI/fedora/subdir.tmp and EFI/fedora.tmp, then do exchange EFI/fedora/subdir.tmp -> EFI/fedora/subdir, the problem is EFI/fedora.tmp -> EFI/fedora will not include the new change in EFI/fedora/subdir, this is the unit test in test_filetree2() that I debug most of the time, so I do the exchange separately based on dir, any other workarounds for this? It is not problem for the current bootupd(no this patch) as we only update the file, not related to the dir.

  • Sync changes between each rename (or once at the end?)
  • Remove all the temporary folders
  • Sync one last time (optional?)

HuijingHei avatar Jun 26 '24 02:06 HuijingHei

if changes are EFI/fedora/subdir/newgrub.x64 and EFI/fedora/grub.x64, the temp dir will be EFI/fedora/subdir.tmp and EFI/fedora.tmp

With EFI/fedora/subdir/newgrub.x64 changed, we should copy EFI/fedora to EFI/fedora.tmp. That should cover all the subdirs and should be granular enough.

I.e. we need to find the first subdir in EFI and copy that.

travier avatar Jun 26 '24 08:06 travier

if changes are EFI/fedora/subdir/newgrub.x64 and EFI/fedora/grub.x64, the temp dir will be EFI/fedora/subdir.tmp and EFI/fedora.tmp

With EFI/fedora/subdir/newgrub.x64 changed, we should copy EFI/fedora to EFI/fedora.tmp. That should cover all the subdirs and should be granular enough.

I.e. we need to find the first subdir in EFI and copy that.

This is complicated as we do not know which comes first, we just scan them in diff.additions, diff.changes, diff.removals.

If the parent already in temp dir, then no need to create sub temp dir; if no, will copy to temp dir; if temp sub dir exists, then need to rename the changed temp sub dir first, to keep temp dir to updated.

This will make the update slow as we do a lot of checking, WDYT?

HuijingHei avatar Jun 27 '24 07:06 HuijingHei

I'm not sure I understand what you mean by "which comes first". I think the idea is more to think of it as the first subdir in the EFI folder instead of thinking about it as "parent" directory.

If I have the following update/addition/removal for example:

  • EFI/fedora/subdir/foo
  • EFI/fedora/subdir/bar/toto
  • EFI/fedora/baz

Then the logic would be:

  • For the file EFI/fedora/subdir/foo as part of the diff:
    • Look if fedora.tmp exists (the first subdir of EFI)
    • Copy fedora to fedora.tmp and remember it
    • Update/remove/add EFI/fedora.tmp/subdir/foo
  • For the file EFI/fedora/subdir/bar/toto as part of the diff:
    • Look if fedora.tmp exists
    • I does, so nothing to copy
    • Update/remove/addEFI/fedora/subdir/bar/toto
  • Repeat for EFI/fedora/baz
  • Once done for all files in the diff, do the rename for the folders that we remembered we copied:
    • Rename exchange fedora.tmp -> fedora, etc.

travier avatar Jun 27 '24 10:06 travier

I'm not sure I understand what you mean by "which comes first"

Sorry for the confusion, if the changes are EFI/fedora/subdir/newgrub.x64, EFI/fedora/grub.x64, copy to EFI/fedora/subdir.tmp first, then comes EFI/fedora.tmp, need to remove EFI/fedora/subdir.tmp first, just copy to EFI/fedora.tmp.

if the changes are EFI/fedora/grub.x64, EFI/fedora/subdir/newgrub.x64, copy to EFI/fedora.tmp first, then comes EFI/fedora/subdir.tmp, check parent EFI/fedora.tmp exists, just using EFI/fedora.tmp.

For the file EFI/fedora/subdir/foo as part of the diff: Look if fedora.tmp exists (the first subdir of EFI)

If the change is like EFI/foo, the temp dir should be EFI.tmp.

HuijingHei avatar Jun 27 '24 11:06 HuijingHei

If I have the following update/addition/removal for example:

EFI/fedora/subdir/foo EFI/fedora/subdir/bar/toto EFI/fedora/baz

Maybe should scan update/addition/removal in diff to get parent dir, like {EFI/fedora/subdir, EFI/fedora/subdir/bar, EFI/fedora}, then check them, and get only one EFI/fedora, then do copy to temp EFI/fedora.tmp?

HuijingHei avatar Jun 27 '24 12:06 HuijingHei

Thanks for working on this! It's a really important feature. I hope to get some time by the end of this week or early next week to look at this.

cgwalters avatar Jun 27 '24 14:06 cgwalters

Sorry for the confusion, if the changes are EFI/fedora/subdir/newgrub.x64, EFI/fedora/grub.x64, copy to EFI/fedora/subdir.tmp first, then comes EFI/fedora.tmp, need to remove EFI/fedora/subdir.tmp first, just copy to EFI/fedora.tmp.

If we have EFI/fedora/subdir/newgrub.x64, EFI/fedora/grub.x64, we only need to copy at the fedora level, not at any other sub-levels.

For the file EFI/fedora/subdir/foo as part of the diff: Look if fedora.tmp exists (the first subdir of EFI)

If the change is like EFI/foo, the temp dir should be EFI.tmp.

If we have a file not in a directory in EFI, then we can copy it to foo.tmp and then act on it and finally rename it. No need to copy the entire EFI.

travier avatar Jul 01 '24 10:07 travier

The goal is to make the operation atomic at the vendor (i.e. fedora, BOOT, etc.) directory level, not at the EFI or any other sub-directory level.

travier avatar Jul 01 '24 10:07 travier

(I've not looked at the code again yet)

travier avatar Jul 01 '24 11:07 travier

make a test to verify the case where the update fails in the middle, right between the two rename exchange calls.

We need to make sure that we can properly resume an update if it fails in the middle, when half has already been exchanged and the other half still needs to be exchanged (BOOT.tmp -> BOOT exchange already done, fedora.tmp -> fedora exchange still pending). It would be incredibly unlikely to happen in practice but this is the point of this PR thus it would be great to have that.

travier avatar Jul 08 '24 12:07 travier

That might need a special "fail" mode flag for testing however so might not be desirable.

travier avatar Jul 08 '24 12:07 travier

make a test to verify the case where the update fails in the middle, right between the two rename exchange calls.

We need to make sure that we can properly resume an update if it fails in the middle, when half has already been exchanged and the other half still needs to be exchanged (BOOT.tmp -> BOOT exchange already done, fedora.tmp -> fedora exchange still pending)

This is likely to happen, for example, the power can be outage at anytime during update. Not sure about the resume, if running bootupctl update again, apply_diff() will firstly call cleanup_tmp() to remove all temp dirs/files, and then do the sync. I can manually do the testing for this firstly. Not sure about the auto test.

Edit: Not easy to do manually testing, as the update only uses about 1.5s.

# time bootupctl update
Running as unit: bootupd.service
Previous BIOS: grub2-tools-1:2.06-103.fc40.x86_64
Updated BIOS: grub2-tools-1:2.06-123.fc40.x86_64
Previous EFI: grub2-efi-x64-1:2.06-103.fc40.x86_64,shim-x64-15.8-2.x86_64
Updated EFI: grub2-efi-x64-1:2.06-123.fc40.x86_64,shim-x64-15.8-3.x86_64

real    0m1.421s
user    0m0.005s
sys     0m0.010s

HuijingHei avatar Jul 09 '24 13:07 HuijingHei

Edit: Not easy to do manually testing, as the update only uses about 1.5s.

We could do https://github.com/coreos/rpm-ostree/pull/4259/commits/aa8d7fb0ceaabfaf10252180e2ddee049d07aae3 in this repo too

cgwalters avatar Jul 10 '24 15:07 cgwalters

I think my high level feelings on this are basically:

  • I liked the original idea generally
  • It feels 40% less useful if we can't do transactional updates for shim (to rephrase: it's still useful, just less so)
  • This is complex and subtle code and problem domain and while bootupd has tests (and thanks for adding new unit tests I just remain in an uncertain feeling due to the above combined

cgwalters avatar Jul 10 '24 15:07 cgwalters

do transactional updates for shim

Is it something like https://github.com/coreos/bootupd/issues/440?

HuijingHei avatar Jul 12 '24 11:07 HuijingHei

~The error in CI is wired, can reproduce locally using quay.io/centos/centos:stream9~

enabling baseos-source repository
enabling appstream-source repository
enabling extras-common-source repository
CentOS Stream 9 - BaseOS - Source               496  B/s | 3.0 kB     00:06    
Errors during downloading metadata for repository 'baseos-source':
  - Downloading successful, but checksum doesn't match. Calculated: e13bd479141ed656a70be1867b0c7258625f469b13baaf70011319723361358415da1f8417f92cf37eaf5dd4bb83dddae4ae7e52fddbc4c6dfb0ac3af2c3233a(sha512)  Expected: c435af47f3197ce3267e5244050b6b1ef1ee668c91cee288d83c887471d5c239b1c82cfec43c283624ba25d40b6eb01e56768a54bd7ffa4b47a5e21fb62365b3(sha512) 
Error: Failed to download metadata for repo 'baseos-source': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
Error: building at STEP "RUN dnf -y install dnf-utils zstd && dnf builddep -y rust-bootupd": while running runtime: exit status 1

HuijingHei avatar Jul 12 '24 14:07 HuijingHei

  • It feels 40% less useful if we can't do transactional updates for shim (to rephrase: it's still useful, just less so)

Well, it's transactional, but not "across" folders. I find it unlikely that shim and grub would end up needing to be updated at the same time for them to properly work. I've asked input from the bootloader team but did not get any so far.

So far, it's the best that we've got without touching the EFI variables.

travier avatar Jul 12 '24 15:07 travier

  • It feels 40% less useful if we can't do transactional updates for shim (to rephrase: it's still useful, just less so)

Well, it's transactional, but not "across" folders. I find it unlikely that shim and grub would end up needing to be updated at the same time for them to properly work.

So far, it's the best that we've got without touching the EFI variables.

With this patch, it is just to update files in another way (via temp dir). Before enable auto-updates, I think it should be more safer if include https://github.com/coreos/bootupd/issues/440

HuijingHei avatar Jul 16 '24 09:07 HuijingHei

Anything else that I should do before merged? CC @cgwalters @travier , thanks!

HuijingHei avatar Jul 17 '24 03:07 HuijingHei