rustBoot icon indicating copy to clipboard operation
rustBoot copied to clipboard

Damaging an update firmware makes rustboot hang at boot

Open lionelains opened this issue 1 year ago • 17 comments

When manually changing the firmware payload of a signed firmware, rustboot seems to upgrade flash to it but subsequent boots never process correctly.

Found in https://github.com/nihalpasham/rustBoot/commit/8437fd2a6ebf79d68a885da895e009fafccfccee

Scenario

Build the sample boot demo binaries from rustboot project:

  • a rustboot bootloader code
  • the initial firmware (on STM32 platforms, this is a green blinking firmware)
  • the upgrade firmware (on STM32 platforms, this is a red blinking firmware)

Before programming these 3 firmwares, we modify at least one byte in the 3rd binary (the upgrade firmware), and program the 3 firmwares on the board.

Expected

green blinking firmware boots and refuses to program the red firmware at all subsequent boots, only the green firmware boots

Observed

green blinking firmware boots the board reboots rustboot does not launch neither the red nor the green firmware and remains stuck in the boot phase resetting the board does not revert to the green firmware either, possibly resulting in a bricked device

Steps to reproduce on NUCLEO-H723ZG

Run the normal demo:

cargo stm32h723 build-sign-flash rustBoot 1234 1235

Watch the green firmware blink, then red firmware

Alter the red firmware (overwrite 4 bytes at offset 1024) :

cp ./boards/sign_images/signed_images/stm32h723_updtfw_v1235_signed.bin /tmp/stm32h723_updtfw_v1235_signed.altered.bin
printf '\xff\x00\xff\x00' | dd of=/tmp/stm32h723_updtfw_v1235_signed.altered.bin bs=1 seek=1024 count=4 conv=notrunc

Program again the two application firmwares:

probe-rs-cli erase --chip STM32H723ZGTx
probe-rs-cli download --format Bin --base-address 0x8020000 --chip STM32H723ZGTx ./boards/sign_images/signed_images/stm32h723_bootfw_v1234_signed.bin
probe-rs-cli download --format Bin --base-address 0x8060000 --chip STM32H723ZGTx /tmp/stm32h723_updtfw_v1235_signed.altered.bin

Program rustboot and run the demo.

probe-rs-cli run --chip STM32H723ZGTx ./boards/target/thumbv7em-none-eabihf/release/stm32h723

Observe the board first running the green firmware, then reboot and hang while in rustboot.

lionelains avatar Sep 05 '24 15:09 lionelains

When rust boot hangs, execution seems to hit the following panic(), which actually makes sense: https://github.com/nihalpasham/rustBoot/blob/8437fd2a6ebf79d68a885da895e009fafccfccee/rustBoot/src/image/image.rs#L572

lionelains avatar Sep 06 '24 07:09 lionelains

@imrank03 - would you be able to look into this?

nihalpasham avatar Sep 11 '24 04:09 nihalpasham

@imrank03 - would you be able to look at this?

Hi @nihalpasham, since the issue seems to be functioning as expected, I didn't look into it further.

imrank03 avatar Sep 12 '24 08:09 imrank03

@lionelains - I believe we are unable to recreate the above scenario. Can you help in recreating this bug?

nihalpasham avatar Sep 13 '24 02:09 nihalpasham

Closing. feel free re-open if needed

nihalpasham avatar Sep 21 '24 05:09 nihalpasham

I am going to give it a try again on my board. I'll keep you posted.

lionelains avatar Sep 26 '24 15:09 lionelains

I did reproduce on https://github.com/nihalpasham/rustBoot/commit/8437fd2a6ebf79d68a885da895e009fafccfccee and also on https://github.com/nihalpasham/rustBoot/commit/d4394d383ba3758574159c6630e4c3261a6b47f1

Same thing after a

rustup update

which leads to:

   stable-x86_64-unknown-linux-gnu updated - rustc 1.81.0 (eeb90cda1 2024-09-04) (from rustc 1.78.0 (9b00956e5 2024-04-29))
  nightly-x86_64-unknown-linux-gnu updated - rustc 1.83.0-nightly (9e394f551 2024-09-25) (from rustc 1.82.0-nightly (636d7ff91 2024-08-19))

I use the nightly toolchain to compile rustBoot (cargo +nightly stm32h723 build-sign-flash rustBoot 1234 1235).

In all these configurations, my board boots the green firmware, but never upgrades to the red (manually corrupted) firmware (as expected). However, the board never boots any firmware anymore, even after pressing the black reset button on the NUCLEO-H723ZG board.

lionelains avatar Sep 26 '24 15:09 lionelains

@imrank03

nihalpasham avatar Sep 27 '24 05:09 nihalpasham

@imrank03, on which board did you try to run the test? on a NUCLEO-H723ZG?

lionelains avatar Sep 27 '24 06:09 lionelains

On which board did you try to run the test? on a NUCLEO-H723ZG?

  • I tried on a NUCLEO-H723ZG.
  • And I am going to give it a try again on my board.

imrank03 avatar Sep 30 '24 06:09 imrank03

@lionelains, I was able to reproduce the issue, and it seems the problem is related to the boot-loader panicking. Here's the output from my command:

❯ probe-rs run --chip STM32H723ZGTx ./boards/target/thumbv7em-none-eabihf/release/stm32h723 
    Erasing ✔ [00:00:01] [##############################################] 128.00 KiB/128.00 KiB @ 67.63 KiB/s (eta 0s)
    Programming ✔ [00:00:01] [###############################################] 52.00 KiB/52.00 KiB @ 51.06 KiB/s (eta 0s)
    Finished in 2.922s
Frame 0: __bkpt @ 0x0800c3b2 inline
        ./asm/lib.rs:48:1
Frame 1: __bkpt @ 0x000000000800c3b2
        ./asm/lib.rs:51:17
Frame 2: bkpt @ 0x0800c38e inline
        /Users/IKL2KOR/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cortex-m-0.7.7/src/call_asm.rs:19:21
Frame 3: panic @ 0x000000000800c38a
        /Users/IKL2KOR/Imran_Bosch/rust_dev/rustBoot/boards/bootloaders/stm32h723/src/main.rs:21:9
Frame 4: panic_bounds_check @ 0x08000a82
        /rustc/798fb83f7d24e31b16acca113496f39ff168c143/library/core/src/panicking.rs:271:1

The CPU halts due to a panic in the bootloader code at main.rs:21:9. I’ll work on resolving the issue. In the meantime, if you manage to solve it, feel free to submit a PR.

imrank03 avatar Oct 17 '24 07:10 imrank03

Hi @imrank03 , yes, the panic implementation is indeed in main.rs. As far as I could diagnose, the root case actually comes from here: https://github.com/nihalpasham/rustBoot/blob/8437fd2a6ebf79d68a885da895e009fafccfccee/rustBoot/src/image/image.rs#L572 Changing it from a panic() into a returned Result<> would allow the caller to handle the case (in this specific scenario, that would probably mean ignore and continue booting from the other slot).

I will see if I can propose a refactor of the code following the strategy described above, but I am not going to be able to work on this during at least the coming two weeks.

lionelains avatar Oct 17 '24 10:10 lionelains

Hi @lionelains , thanks for the update and for pointing out the root cause in image.rs. Your suggestion to refactor the panic!() into a returned Result<> makes sense. Handling the integrity check failure in a more graceful way, like continuing booting from the other slot, seems like a good approach.

I understand you won’t be available for the next two weeks, so I’ll look into it and see if I can start implementing the changes in the meantime. Once you’re back, feel free to jump in as well.

Let me know if there's anything else I should consider while working on this!

imrank03 avatar Oct 18 '24 05:10 imrank03

Hello @imrank03, did you make any progress on this? If not, I can probably have a quick look next week, and probably even go further the following week.

lionelains avatar Nov 15 '24 09:11 lionelains

Hi @lionelains , I haven’t had the chance to look into this yet, unfortunately. If it’s not too much trouble, it would be great if you could take a look and create a PR.

imrank03 avatar Nov 17 '24 15:11 imrank03

I am preparing a PR here. It still needs review and maybe some rework.

lionelains avatar Nov 23 '24 15:11 lionelains

I am preparing a PR here. It still needs review and maybe some rework.

Sounds great! I’ll clone the PR, test it, and share my feedback. I have a tech summit this week, so I’ll aim to work on it over the weekend.

imrank03 avatar Nov 25 '24 04:11 imrank03