
OS block booting verification

BeataZdunczyk opened this issue 10 months ago.

Brief summary

Test the mechanism that prevents the operating system from booting after a firmware update attempt until the platform has rebooted. The primary objective is to ensure that the system remains secure and stable after firmware updates are applied.

Additional context

As a test, CapsuleApp.efi should be called with:

  • non-FMP capsule (e.g. UX),
  • FMP capsule signed with wrong key (like the one used for https://github.com/Dasharo/dasharo-issues/issues/804),
  • invalid FMP capsule (wrong GUID, size or version, if anti-rollback is enabled).

In all of those cases, the platform must reboot without booting to the OS or the UEFI Shell.
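To tell these cases apart before handing a file to CapsuleApp.efi, a small sketch can read the EFI_CAPSULE_HEADER at the start of the capsule: an FMP capsule carries EFI_FIRMWARE_MANAGEMENT_CAPSULE_ID_GUID, anything else (e.g. a UX capsule) is non-FMP, and the Flags field shows whether InitiateReset was set. The layout, GUID, and flag values are taken from the UEFI specification; the assumption is that the file begins directly with that header, with no extra wrapper.

```python
import struct
import uuid

# EFI_FIRMWARE_MANAGEMENT_CAPSULE_ID_GUID (UEFI specification).
FMP_CAPSULE_GUID = uuid.UUID("6dcbd5ed-e82d-4c44-bda1-7194199ad92a")

# EFI_CAPSULE_HEADER.Flags bits (UEFI specification).
CAPSULE_FLAGS_PERSIST_ACROSS_RESET = 0x00010000
CAPSULE_FLAGS_POPULATE_SYSTEM_TABLE = 0x00020000
CAPSULE_FLAGS_INITIATE_RESET = 0x00040000

# EFI_CAPSULE_HEADER: CapsuleGuid (16 bytes), then HeaderSize, Flags and
# CapsuleImageSize as little-endian UINT32 values.
_HEADER = struct.Struct("<16sIII")


def describe_capsule(data: bytes) -> dict:
    """Summarize the EFI_CAPSULE_HEADER found at the start of a capsule file."""
    if len(data) < _HEADER.size:
        raise ValueError("file too short for EFI_CAPSULE_HEADER")
    raw_guid, header_size, flags, image_size = _HEADER.unpack_from(data)
    # EFI GUIDs are stored in mixed-endian form, which bytes_le decodes.
    guid = uuid.UUID(bytes_le=raw_guid)
    return {
        "guid": str(guid),
        "is_fmp": guid == FMP_CAPSULE_GUID,
        "header_size": header_size,
        "image_size": image_size,
        "persist_across_reset": bool(flags & CAPSULE_FLAGS_PERSIST_ACROSS_RESET),
        "initiate_reset": bool(flags & CAPSULE_FLAGS_INITIATE_RESET),
    }


# Usage (file name as in the valid flow):
#   with open("dasharo.cap", "rb") as f:
#       print(describe_capsule(f.read()))
```

A capsule that reports is_fmp False belongs to the non-FMP case, and initiate_reset False explains a capsule that does not trigger a reset on its own.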

BeataZdunczyk, Apr 17 '24 15:04

Initial tests can be performed on QEMU, but the firmware must be built with the capsule options enabled in its config. So, basically, what has to be done in this task is:

  1. Clone coreboot from https://github.com/Dasharo/coreboot/tree/uefi-capsules

  2. Add missing options to the Q35 config (CONFIG_DRIVERS_EFI_FW_INFO=y and CONFIG_DRIVERS_EFI_UPDATE_CAPSULES=y).

  3. Build the image.

  4. Make a copy of that file, because it will be updated by the capsule and you will probably need to restore the previous version.

  5. Bump CONFIG_LOCALVERSION in config to something higher and build again.

  6. Create a capsule from the newer version. You may need to set LowestSupportedVersion to "0x00000000", just in case.

  7. Create a drive image formatted as FAT32 and copy the capsule created in step 6 to it, together with CapsuleApp.efi. After building the firmware, CapsuleApp.efi is located in coreboot/payloads/external/edk2/workspace/Build/DasharoPayloadPkgX64/RELEASE_COREBOOT/X64/MdeModulePkg/Application/CapsuleApp/CapsuleApp/OUTPUT/CapsuleApp.efi; it doesn't matter whether you use the one from the old or the new LOCALVERSION build, as they are identical.

  8. Start QEMU with the copy of the old firmware and the drive with the capsule attached (use IDE, not virtio), then enter the UEFI Shell.

  9. Confirm the version with smbiosview -t 0 (field BIOS Version:); it should contain the old CONFIG_LOCALVERSION.

  10. Run CapsuleApp.efi dasharo.cap. It should reboot the platform automatically, after which you should see the bootsplash and a progress bar; wait for it to finish. After that, the platform will reboot itself.

  11. Enter the UEFI Shell and confirm the version with smbiosview -t 0 (field BIOS Version:); it should contain the new CONFIG_LOCALVERSION.
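Steps 2 and 5 only touch a few Kconfig lines, so they can be scripted. A minimal sketch, assuming the Q35 config is a plain key=value defconfig in which disabled options appear as "# CONFIG_X is not set"; the config file name is not given above, so read and write whichever file you build from. The usual Kconfig dependency resolution still happens at build time.

```python
import re

# Matches "CONFIG_NAME=value" and "# CONFIG_NAME is not set".
_OPTION_LINE = re.compile(r"^(?:# )?CONFIG_([A-Za-z0-9_]+)(?:=.*| is not set)$")


def set_kconfig_options(config_text: str, options: dict) -> str:
    """Return config_text with every option set to the given raw value.

    Keys are option names without the CONFIG_ prefix; string values must
    carry their own quotes (e.g. '"v0.9.1"' for CONFIG_LOCALVERSION).
    """
    remaining = dict(options)
    out = []
    for line in config_text.splitlines():
        match = _OPTION_LINE.match(line)
        if match and match.group(1) in remaining:
            name = match.group(1)
            out.append(f"CONFIG_{name}={remaining.pop(name)}")
        else:
            out.append(line)
    # Options not present in the file at all are appended at the end.
    for name, value in remaining.items():
        out.append(f"CONFIG_{name}={value}")
    return "\n".join(out) + "\n"


# Step 2: set_kconfig_options(text, {"DRIVERS_EFI_FW_INFO": "y",
#                                    "DRIVERS_EFI_UPDATE_CAPSULES": "y"})
# Step 5: set_kconfig_options(text, {"LOCALVERSION": '"<higher version>"'})
```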

That was the expected flow for a valid capsule. Please start with that and report issues (if any) before trying to test invalid ones.
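The version checks in steps 9 and 11 reduce to a simple pass/fail rule. A minimal sketch, assuming smbiosview prints the field on its own line as "BIOS Version: <value>"; capturing the shell output (serial console, Robot keyword, etc.) is left out.

```python
def bios_version(smbiosview_output: str) -> str:
    """Extract the value of the 'BIOS Version:' field from `smbiosview -t 0` output."""
    for line in smbiosview_output.splitlines():
        # partition() splits at the first colon only, so the value may contain colons.
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "BIOS Version":
            return value.strip()
    raise ValueError("no 'BIOS Version:' field found")


def update_applied(before: str, after: str, new_localversion: str) -> bool:
    """Valid-capsule verdict: the version changed and contains the bumped LOCALVERSION."""
    old, new = bios_version(before), bios_version(after)
    return old != new and new_localversion in new
```

For the invalid-capsule flows the expected verdict is the opposite: update_applied(...) returns False and the reported version still contains the old CONFIG_LOCALVERSION.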


Invalid keys

From the previous state:

  1. Copy the original old firmware back to the working copy (i.e. “downgrade”); this is easier and faster than rebuilding.

  2. Create a new set of signing keys.

  3. Create a new capsule as before, but specify the keys from step 2 in the JSON file. You may want to keep separate JSON and capsule files to be able to quickly re-run the tests.

  4. Copy the new capsule to the drive mounted in QEMU (again, it may be worth giving it a different name so that multiple capsules can coexist on one drive image).

  5. Repeat steps 8-10 of the valid capsule flow. There should be no update (no progress bar), but the platform should still reboot twice. I’m not sure how to reliably test the number of reboots, at least not on release builds. On QEMU it should be clearly noticeable due to this issue, but for MSI we will have to rely on CapsuleApp.efi -S, as described in this comment.

  6. Enter the UEFI Shell and confirm the version with smbiosview -t 0 (field BIOS Version:); it should contain the old CONFIG_LOCALVERSION.
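Keeping one JSON per test capsule (step 3) is easy to script by deriving the invalid-key JSON from the valid one. A sketch, assuming the edk2 GenerateCapsule JSON layout with a top-level "Payloads" list and the OpenSsl*File signing keys named below; verify the key names against the BaseTools copy in the uefi-capsules branch before relying on it.

```python
import copy

# Signing-related keys of a GenerateCapsule payload entry (assumed names).
SIGNING_KEYS = (
    "OpenSslSignerPrivateCertFile",
    "OpenSslOtherPublicCertFile",
    "OpenSslTrustedPublicCertFile",
)


def with_signing_keys(base: dict, signer: str, other_pub: str, trusted_pub: str) -> dict:
    """Return a deep copy of `base` whose payloads point at a different key set."""
    variant = copy.deepcopy(base)  # the valid JSON stays untouched
    new_values = dict(zip(SIGNING_KEYS, (signer, other_pub, trusted_pub)))
    for payload in variant["Payloads"]:
        payload.update(new_values)
    return variant


# Usage (file names hypothetical):
#   import json
#   base = json.load(open("dasharo.json"))
#   json.dump(with_signing_keys(base, "bad/signer.pem", "bad/sub.pub.pem",
#                               "bad/root.pub.pem"),
#             open("dasharo-badkeys.json", "w"), indent=2)
```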


Invalid GUID

Same as Invalid keys, except that the JSON is modified when building the capsule:

  1. Use valid keys (original BaseTools/Source/Python/Pkcs7Sign/Test*.pem files).

  2. Change GUID to anything else.
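"Anything else" can be generated so that it is guaranteed to differ from the original. A sketch, assuming the GUID lives in the "Guid" field of each entry in the JSON's "Payloads" list (edk2 GenerateCapsule layout as we understand it); the signing keys are left untouched, as step 1 requires.

```python
import copy
import uuid


def with_wrong_guid(base: dict) -> dict:
    """Return a deep copy of `base` with every payload GUID replaced by a random one."""
    variant = copy.deepcopy(base)
    for payload in variant["Payloads"]:
        original = uuid.UUID(payload["Guid"])
        replacement = uuid.uuid4()
        while replacement == original:  # practically never loops, but be explicit
            replacement = uuid.uuid4()
        payload["Guid"] = str(replacement).upper()
    return variant
```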


Invalid version (anti-rollback)

We won’t be able to test it until we have at least 2 releases with support for capsule updates (preferably at least one real release and one dev release for testing). This will also be easier once capsule creation is more automated than it is now, which will probably happen around https://github.com/Dasharo/dasharo-issues/issues/807.

This test would make use of DRIVERS_EFI_MAIN_FW_VERSION and DRIVERS_EFI_MAIN_FW_LSV in the config and LowestSupportedVersion in the JSON file. At this point I have no idea how to test it, short of building a few pseudo-releases with different version numbers and signing them with release keys. If such a pseudo-release somehow ended up public, it would be hard to tell whether it is a test release or a real one, so maybe this isn’t a good idea. There is also the question of RC binaries: they would have to have a version different from the final one, yet numerically lower than the final release... Anyway, this isn’t something you have to worry about now, but at some point in the future it will have to be added.
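The anti-rollback rule and the RC ordering problem can be illustrated together. The rule follows the UEFI FMP model, where the running firmware reports a lowest supported image version and images below it are rejected. The version encoding is purely HYPOTHETICAL (not Dasharo's scheme): it packs major.minor.patch into the upper bytes and uses the low byte for the RC number, reserving 0xFF for the final release so every RC sorts below its release.

```python
from typing import Optional


def encode_version(major: int, minor: int, patch: int, rc: Optional[int] = None) -> int:
    """HYPOTHETICAL 32-bit encoding 0xMMmmpprr; rr = RC number, 0xFF = final release."""
    for part in (major, minor, patch):
        if not 0 <= part <= 0xFF:
            raise ValueError("version components must fit in one byte")
    if rc is None:
        rc_byte = 0xFF
    elif 0 <= rc < 0xFF:
        rc_byte = rc
    else:
        raise ValueError("RC number must be in 0..254")
    return (major << 24) | (minor << 16) | (patch << 8) | rc_byte


def update_allowed(candidate_fw_version: int, current_lsv: int) -> bool:
    """Accept the image only if its version is not below the LowestSupportedVersion
    of the firmware that is currently running."""
    return candidate_fw_version >= current_lsv
```

With this encoding v1.1.4-rc1 sorts between v1.1.3 and v1.1.4, so both RC-to-release and release-to-next-RC transitions are upgrades, while a downgrade succeeds only as long as it stays at or above the current LSV.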

krystian-hebel, Aug 16 '24 10:08

Clone coreboot from https://github.com/Dasharo/coreboot/tree/uefi-capsules Add missing options to the Q35 config (CONFIG_DRIVERS_EFI_FW_INFO=y and CONFIG_DRIVERS_EFI_UPDATE_CAPSULES=y). Build the image. Make a copy of that file because it will be updated by the capsule, and you probably will need to restore previous version.

Do we plan to enable this driver in the QEMU build in the long run? Maybe we can already do that, so that QEMU with capsule update can be built in CI on this branch?

Or add another build config to do so?

macpijan, Aug 16 '24 10:08

@JanPrusinowski could you please indicate how many and which test cases you plan to prepare, in the form of a checklist, to track progress? Please remember to gather logs from testing the new test cases and to indicate which version of the capsule update you were testing.

BeataZdunczyk, Aug 29 '24 09:08

@BeataZdunczyk

  • [x] TC1: Update DUT with a capsule and verify that the update was successful by checking that the BIOS Version has changed.
  • [x] TC2: Try to update DUT with a capsule signed with invalid keys. Verify that the DUT reboots twice. Verify that the BIOS Version wasn't changed.
  • [x] TC3: Try to update DUT with a capsule with an invalid GUID. Same as TC2, but with a wrong GUID and valid keys.
  • [ ] TC4: Try to update DUT with invalid fw version. This test might not work as @krystian-hebel stated that we might need at least 3 releases to be able to check if it works.
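For the "reboots twice" check in TC2/TC3, one option on QEMU is to count boot banners in the captured serial log after the CapsuleApp.efi command. A sketch under two assumptions: the whole serial output is available as text, and some line reliably appears once per boot; both the banner and the start marker are HYPOTHETICAL parameters you choose for your platform.

```python
def boots_after(serial_log: str, start_marker: str, banner: str) -> int:
    """Count log lines containing `banner` after the first occurrence of `start_marker`."""
    start = serial_log.find(start_marker)
    if start < 0:
        raise ValueError("start marker not found in log")
    tail = serial_log[start + len(start_marker):]
    # Count lines rather than raw occurrences so a banner repeated on one line counts once.
    return sum(1 for line in tail.splitlines() if banner in line)


# TC2/TC3 expectation: boots_after(log, "CapsuleApp.efi <capsule>", "<banner>") == 2
```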

TC1 is somewhat ready. I'm working on automating creation of capsules needed for testing.

JanPrusinowski, Aug 29 '24 11:08

@krystian-hebel @SergiiDmytruk, did you consider UEFI SCT Capsule conformance?

https://github.com/search?q=repo%3Atianocore%2Fedk2-test+CAPSULE&type=code

pietrushnic, Aug 29 '24 11:08

@JanPrusinowski I have updated your comment to be in the form of a checklist. @krystian-hebel will propose a different order of the tests here, so that we only need to flash the platform once.

BeataZdunczyk, Aug 29 '24 11:08

@JanPrusinowski I think we should move TC1 to the very end, maybe even leaving a few unused numbers for future tests (e.g. make it TC50). That way we would save a few flashing cycles, because the final image would be identical to the ROM and flashrom only updates the parts that have changed. This would both save time and reduce flash wear.
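The saving comes from flashrom skipping regions whose contents already match. A sketch of that idea, comparing two equally sized images block by block; the 4 KiB block size is an ASSUMPTION for illustration, since real erase block sizes depend on the flash chip and flashrom's own logic is more involved.

```python
def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> list:
    """Return the offsets of blocks that differ between two ROM images."""
    if len(old) != len(new):
        raise ValueError("images must have the same size")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        offset
        for offset in range(0, len(old), block_size)
        if old[offset:offset + block_size] != new[offset:offset + block_size]
    ]
```

If TC1 runs last, the ROM flashed before the next run matches the updated one and this list is empty, so nothing needs to be erased or written.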

TC4 may be split into a few different cases (e.g. valid downgrade, attempt to downgrade beyond the minimal version, transition between RC and normal releases in both directions). But for now we don't have to worry about it, as we won't be able to test it for the next few releases anyway.

krystian-hebel, Aug 29 '24 12:08

I'm working on automating creation of capsules needed for testing.

Why? Have you two agreed on this, @JanPrusinowski and @krystian-hebel? I am wondering whether we really need this at this point. We do have the "Automate the creation and execution of the UX capsule" task in phase 5, so I guess tests that verify the automation should be prepared then.

BeataZdunczyk, Aug 29 '24 14:08

@BeataZdunczyk we agreed on it. UX will be handled very differently from what's needed here. We need some automation here; otherwise the tests would require both valid and invalid capsules to be passed in externally. This could give false positive results if the tester didn't build the capsules properly.

krystian-hebel, Aug 29 '24 14:08

@krystian-hebel @SergiiDmytruk, did you consider UEFI SCT Capsule conformance?

I know I didn't. It looks like a test of the UEFI API itself; I'm not sure it makes sense to introduce UEFI SCT specifically for capsules.

SergiiDmytruk, Aug 29 '24 21:08

I have wanted to introduce UEFI SCT for a long time, but if you say it doesn't make sense, I'm fine with that. I know that ProjectMu uses UEFI SCT to validate their edk2 fork, so I thought it could be beneficial in our case, but maybe I'm wrong here, and it doesn't add any value.

pietrushnic, Aug 29 '24 22:08

I suppose it could be beneficial to add it at some point to validate the fork, but I don't expect it to catch anything for capsules (because we barely touched the implementation of those calls), which is why it doesn't seem worth the effort in this case (unless it's really easy to integrate).

SergiiDmytruk, Aug 30 '24 16:08

I have prepared tests for:

  • TC1: Update DUT with a capsule and verify that the update was successful by checking that the BIOS Version has changed.
  • TC2: Try to update DUT with a capsule signed with invalid keys. Verify that the DUT reboots twice. Verify that the BIOS Version wasn't changed.
  • TC3: Try to update DUT with a capsule with an invalid GUID. Same as TC2, but with a wrong GUID and valid keys.

I also ran them on both QEMU and MSI. More details can be found at: https://github.com/Dasharo/open-source-firmware-validation/pull/457

JanPrusinowski, Sep 03 '24 21:09

To run the tests on QEMU, prepare a valid capsule file and then use it to generate the invalid capsules required by the tests by running the script:

./scripts/capsules/capsule_update_tests.sh dasharo.cap

then start the tests:

robot -v snipeit:no -L TRACE -v rte_ip:127.0.0.1 -v config:qemu -v capsule_fw_file:dasharo.cap dasharo-stability/capsule-update.robot

To run the tests on MSI, first edit the firmware to enable Console Serial Redirection before preparing the capsule, using the guide at https://github.com/Dasharo/open-source-firmware-validation/blob/develop/docs/troubleshooting.md. Without it, a successful flash of the DUT will prevent the tests from working correctly. Before starting the tests, run the capsule preparation script (same as on QEMU).

robot -v snipeit:no -L TRACE -v rte_ip:192.168.10.188 -v config:msi-pro-z690-a-ddr5 -v sonoff_ip:192.168.10.69 -v pikvm_ip:192.168.10.45 -v device_ip:192.168.10.39 -v fw_file:./msi_ms7d25_v1.1.2_ddr5.rom -v capsule_fw_file:./msi_ms7d25_v1.1.3_ddr5.cap dasharo-stability/capsule-update.robot
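The QEMU and MSI invocations differ only in their -v variables, so a small wrapper can build either command from a dictionary. A sketch that uses only the options shown above (-L for the log level, -v name:value for variables); the helper names are hypothetical, and Robot Framework exits non-zero when any test fails, which check=True turns into an exception.

```python
import subprocess


def robot_command(suite: str, variables: dict, log_level: str = "TRACE") -> list:
    """Build a robot argv list with one `-v name:value` pair per variable."""
    cmd = ["robot", "-L", log_level]
    for name, value in variables.items():
        cmd += ["-v", f"{name}:{value}"]
    cmd.append(suite)
    return cmd


def run_suite(suite: str, variables: dict) -> subprocess.CompletedProcess:
    """Run the suite; raises CalledProcessError if robot reports failures."""
    return subprocess.run(robot_command(suite, variables), check=True)


# QEMU example (values from the command above):
#   run_suite("dasharo-stability/capsule-update.robot",
#             {"snipeit": "no", "rte_ip": "127.0.0.1", "config": "qemu",
#              "capsule_fw_file": "dasharo.cap"})
```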

Tests on MSI fail for now, as the firmware doesn't support capsule update yet. @SergiiDmytruk @krystian-hebel I could check whether it works if I got a modified firmware for MSI. On QEMU everything works as it should, and the tests themselves start on MSI, so everything should work once the modified firmware is available.


log-msi.zip log-qemu.zip

JanPrusinowski, Sep 04 '24 07:09

Tests that were added:

  • TC1: Update DUT with a capsule and verify that the update was successful by checking that the BIOS Version has changed.
  • TC2: Try to update DUT with a capsule signed with invalid keys. Verify that the BIOS Version wasn't changed.
  • TC3: Try to update DUT with a capsule with an invalid GUID. Same as TC2, but with a wrong GUID and valid keys.

PR available: https://github.com/Dasharo/open-source-firmware-validation/pull/457

MSI tests were conducted on: MSI PRO Z690-A DDR5. Logs can be found in my previous comment.

On both QEMU and MSI I was able to verify that the DUT won't reset if the capsule is built without the --capflag InitiateReset flag. However, capsule update is not supported yet in the MSI firmware, so the tests can't be completed.

JanPrusinowski, Sep 04 '24 07:09

MSI tests were conducted on:

@JanPrusinowski This is an internal link. Please just add information about the platform

BeataZdunczyk, Sep 04 '24 07:09

I have updated tests and the script generating capsules required by tests. Tests now work on MSI. log-msi-pass.zip

JanPrusinowski, Sep 05 '24 17:09

First somewhat successful test. The FS under which the Ubuntu partition shows up in the UEFI Shell is not consistent, so it may need to be determined dynamically. Locally I added a loop iterating over all the FS's.
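The loop over file systems can be sketched independently of how the shell is driven. In this sketch, `exists` is a HYPOTHETICAL callable (in OSFV it would run something like `ls` over the serial console and inspect the output), and the FS0..FS15 range and the example path are assumptions.

```python
from typing import Callable, Optional


def find_fs(path: str, exists: Callable[[str], bool], max_fs: int = 16) -> Optional[str]:
    """Return the first FSn: mapping on which `path` exists, or None if none has it."""
    for n in range(max_fs):
        mapping = f"FS{n}:"
        if exists(mapping + path):
            return mapping
    return None


# Example (path hypothetical): find_fs(r"\EFI\ubuntu", shell_path_exists)
```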

  • CUP001.001 failed because the Capsule Status was Security Violation instead of Not Ready.
  • CUP002.001 Passed
  • CUP050.001 failed with No match found for 'to boot directly' in 3 minutes: the platform booted into Ubuntu before 'to boot directly' was found. Serial redirection was turned off. Maybe I did something wrong when setting it in the firmware using dcu. I will set it in the binary again and run the tests another time.

capsule_tests_1.zip

philipandag, Sep 06 '24 13:09

@philipandag This is strange, because the first two tests should not have changed the firmware at all... And a capsule update should not turn off Serial redirection if it was turned on previously. Are you running each test separately or the whole suite? Maybe you flashed the firmware in between running the tests?

JanPrusinowski, Sep 06 '24 14:09

I am running the whole suite. From what I recall, the documentation on capsule updates said that the Setup menu options will be restored to defaults after an update. Serial redirection was probably turned off as a result of performing a capsule update created from a firmware image that had it disabled. I forgot to replace that image with the modified version. I am running the suite again now.

philipandag, Sep 06 '24 14:09

The suite failed again on CUP050.001 because the platform took over 3 minutes to boot after the update. CUP001.001 fails identically to before. I modified the test a bit so that it would continue even if the platform needs a lot of time to boot after an update/flash, but then it started to freeze on the boot logo without showing any text on the screen.

After reflashing manually and turning on serial redirection manually, the CUP050 test passed. I don’t know why the setting is overwritten. I replaced the ./build/coreboot.rom file used for the update with a modified one, regenerated the capsule, and replaced it in the open-source-firmware-validation repo where I run the tests.

(venv) fgolas•review-capsule-user-guide/open-source-firmware-validation/dcu(main)» ./dcuc variable coreboot.rom --get SerialRedirection              [17:35:20]
Enabled
(venv) fgolas•review-capsule-user-guide/open-source-firmware-validation/dcu(main)»   

(It is a copy of the file; I copied it to the dcu directory to check the variable.) CUP050.zip

CUP001.001 still fails in exactly the same way as before in my case.

philipandag, Sep 06 '24 15:09

@philipandag Could you provide logs from the failed tests?

JanPrusinowski, Sep 06 '24 17:09

@philipandag, you don't mention these things, so I want to check that they were taken into account:

  1. Re-generation of test capsules via scripts/capsules/capsule_update_tests.sh
  2. Uncommenting of
    ...                     AND
    ...                     Flash Firmware If Not QEMU
    
    in your clone of the OSFV repo.

SergiiDmytruk, Sep 06 '24 18:09

The suite failed again on CUP050.001 because the platform took over 3 minutes to boot after the update.

I think it just doesn't boot after flashing. The Power On keyword doesn't seem to take the Sonoff into account, and I saw that it stayed off the whole time after the test had flashed the firmware and was waiting for the platform to boot. Power Cycle On seems to be necessary after flashing. I don't deal with OSFV often enough to know such things for sure, but it's used in some other places after flashing, and I'm testing that now.

Update: adding Power Cycle On seems to help.

SergiiDmytruk, Sep 06 '24 18:09

  1. I did re-generate the capsules
  2. I did not uncomment anything

I think it just doesn't boot after flashing.

It eventually does, just after a long time. Adding Wait Until Keyword Succeeds before Enter Boot Menu Tianocore helps with that by effectively multiplying the timeout by an integer. If using Power Cycle On speeds things up then that's great, let's use it instead.
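The "multiplying the timeout" effect comes from retrying an action that has its own timeout. A Python analogue of the retry-count form of Robot Framework's BuiltIn Wait Until Keyword Succeeds, written as a sketch (it is not the Robot implementation): with N retries around a keyword that waits up to T seconds, the effective wait is roughly N × T plus the intervals.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_succeeds(retries: int, interval: float, action: Callable[[], T]) -> T:
    """Call `action` up to `retries` times, sleeping `interval` seconds between failures."""
    last_error = None
    for attempt in range(retries):
        try:
            return action()
        except Exception as err:  # any failure counts as "keyword failed"
            last_error = err
            if attempt < retries - 1:
                time.sleep(interval)
    raise RuntimeError(f"action failed after {retries} attempts") from last_error
```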

philipandag, Sep 09 '24 06:09

From what I recall, the documentation on capsule updates said that the Setup menu options will be restored to defaults after an update. Serial redirection was probably turned off as a result of performing a capsule update created from a firmware image that had it disabled.

These settings will be preserved in the final implementation, but aren't yet. This will be done as part of https://github.com/Dasharo/dasharo-issues/issues/809.

I think it just doesn't boot after flashing. The Power On keyword doesn't seem to take the Sonoff into account, and I saw that it stayed off the whole time after the test had flashed the firmware and was waiting for the platform to boot. Power Cycle On seems to be necessary after flashing. I don't deal with OSFV often enough to know such things for sure, but it's used in some other places after flashing, and I'm testing that now.

Update: adding Power Cycle On seems to help.

If using Power Cycle On speeds things up then that's great, let's use it instead.

Let's not. There should be no need to cut the power after an update, and we should test it in a way that would be the closest to what we expect the end-users will do.

krystian-hebel, Sep 09 '24 10:09

Let's not. There should be no need to cut the power after an update, and we should test it in a way that would be the closest to what we expect the end-users will do.

This isn't part of the test, but part of the setup that flashes the initial ROM, which turns the Sonoff off; according to Robot's log, nothing turns it back on.

SergiiDmytruk, Sep 09 '24 10:09

Logs using my binaries at this moment:

log_my_binaries.zip

I will now try to use the ones Sergii sent

philipandag, Sep 09 '24 10:09

Using the binaries Sergii sent I get a fail in CUP001 & CUP002 and a PASS in CUP999 sergii_binaries.zip

philipandag, Sep 09 '24 12:09

Using the binaries Sergii sent I get a fail in CUP001 & CUP002 and a PASS in CUP999

That was kind of expected. The result was inverted compared to your binaries, because the tests at that point didn't expect FUM on failures, and it looks like you didn't build with the updated EDK2 (the old commit was likely still checked out).

SergiiDmytruk, Sep 09 '24 15:09