mcuboot
mcuboot copied to clipboard
boot: bootutil: swap-move: Allow trailer area not be a multiple of the sector size
When using swap-move, the area allocated to the trailer has currently to be a multiple of the sector size. This is not a big deal on devices having small sectors, but on devices having a large sector size, such most STM32 MCUs, this typically means losing a significant amount of flash memory. For example, on an STM32F4 device with 128-KiB sectors, the trailer is usually only a few hundreds of bytes long but a whole 128 KiB has still to be reserved for storing the trailer and can't therefore be used for firmware data.
The idea of this PR is to relax the need of having a sector-aligned trailer area, enabling therefore to use all the bytes not needed by the trailer, in the first sector holding trailer data, to store firmware data.
To illustrate the solution, let's imagine a device having 4096-byte sectors and a primary slot composed of N sectors. Let's assume the trailer needs 4224 bytes. Before this MR, two entire sectors would have to be allocated to the trailer, so the layout would have been something like:
PRIMARY SECONDARY
| ... | | ... |
+-----------------------+ +-----------------------+
| Firmware (4096 bytes) | Sector N-3 | Firmware (4096 bytes) | Sector N-3
| . | | . |
+-----------------------+ +-----------------------+
| Padding (4096 bytes) | Sector N-2 | Trailer (4096 bytes) | Sector N-2
| . | | . |
+-----------------------+ +-----------------------+
| Trailer (4096 bytes) | Sector N-1 | Trailer (4096 bytes) | Sector N-1
| | | . |
+-----------------------+ +-----------------------+
| Trailer (4096 bytes) | Sector N
| . |
+-----------------------+
With this MR, only the exact amount of bytes actually needed by the trailer have to be allocated to the trailer, so we have 3968 additional bytes available to store the firmware image:
PRIMARY SECONDARY
| ... | | ... |
+-----------------------+ +-----------------------+
| Firmware (4096 bytes) | Sector N-3 | Firmware (4096 bytes) | Sector N-3
| . | | . |
+-----------------------+ +-----------------------+
| Firmware (3968 bytes) | Sector N-2 | Firmware (3968 bytes) | Sector N-2
| Padding (128 bytes) | | Trailer (128 bytes) |
+-----------------------+ +-----------------------+
| Padding (3968 bytes) | Sector N-1 | Trailer (4096 bytes) | Sector N-1
| Trailer (128 bytes) | | . |
+-----------------------+ +-----------------------+
| Trailer (4096 bytes) | Sector N
| . |
+-----------------------+
At the beginning of an upgrade or revert process, all the sectors in the primary slot are "moved up", ending up with:
PRIMARY SECONDARY
| ... | | ... |
+-----------------------+ +-----------------------+
| Firmware (4096 bytes) | Sector N-3 | Firmware (4096 bytes) | Sector N-3
| . | | . |
+-----------------------+ +-----------------------+
| Firmware (4096 bytes) | Sector N-2 | Firmware (3968 bytes) | Sector N-2
| . | | Trailer (128 bytes) |
+-----------------------+ +-----------------------+
| Firmware (3968 bytes) | Sector N-1 | Trailer (4096 bytes) | Sector N-1
| Trailer (128 bytes) | | . |
+-----------------------+ +-----------------------+
| Trailer (4096 bytes) | Sector N
| . |
+-----------------------+
To main ideas of the solution implemented by this PR is:
- To copy only the part used by the firmware image when moving up the last sector (sector N-2 in the example), to avoid overwriting the trailer data.
- To erase the secondary slot's trailer when swapping the last sector holding firmware data in the secondary slot, to ensure no firmware data will be lost when erasing the secondary trailer. Previously the erasure was performed at the beginning of the move process, just after the erasure of the primary trailer, by simplicity.
- To avoid the need of rewriting the secondary slot's trailer when starting a revert process.
The latter is the most difficult part. Indeed, at the beginning of the revert process, the secondary trailer was rewritten to make the revert look like a permanent upgrade in case an unfortunate reset occurs during the rewriting of the primary trailer. This was a clever trick, but is unfortunately no longer possible if the trailers are not stored in dedicated sectors. The solution proposed by this PR is to write a "fallback trailer" composed only of a magic number in the primary slot, during the upgrade process, at the end of the sector that is just before the first sector containing part of the slot's trailer. This area is available at the end of the upgrade process since it is part of the padding area, used to "move the sectors up". The maximum image size is adjusted, by adding at most 8 bytes of padding, to ensure the area is always large enough to write the magic number. In case a reset occurs while rewriting the primary trailer during a revert process, the presence of the fallback trailer can be used to determine a revert was in progress.
If the improvement proposed by this MR is merged, I will try to port those changes to the swap-offset strategy, which should be quite easy, at least in theory.
Resolves #2052
The problem with this is that it introduces the inability to clear flags e.g. if a firmware update is marked for confirmation or test, once a flag is set it cannot be cleared without potentially bricking the device if power is lost during erase/re-write of that sector
The problem with this is that it introduces the inability to clear flags e.g. if a firmware update is marked for confirmation or test, once a flag is set it cannot be cleared without potentially bricking the device if power is lost during erase/re-write of that sector
@nordicjm Have you a specific scenario in mind where there would be an issue?
-
Before an upgrade the user can erase the secondary trailer before downloading an upgrade and then write it to make the upgrade pending without risking bricking the device.
-
During an upgrade or revert process, the primary and secondary trailers are rewritten but the logic makes sure the process is resumed gracefully on boot in case of a power loss, and that without risking losing part of the firmware image.
-
After an upgrade, the user might have to set the
image-okflag in the primary trailer (which is possible without erasing the trailer) but never has to clear a flag in that trailer, unless I'm missing something.
Also note that the swap-scratch strategy already supports having firmware and trailer data in the same sector and this doesn't seem to cause any issue, so I don't think this should be a problem for swap-move.
User uploads image, they mark it for test/swap (which writes the flag to the primary image, not the secondary), then they delete the secondary image without rebooting, the update image is gone, but the flag is permanently set to test/confirm
User uploads image, they mark it for test/swap (which writes the flag to the primary image, not the secondary), then they delete the secondary image without rebooting, the update image is gone, but the flag is permanently set to test/confirm
@nordicjm Perhaps you're talking about a specific MCUboot feature I'm not aware of, but the usual flow is:
- Erasing the secondary slot.
- Writing the update image to the secondary slot.
- Marking the update image as pending, which writes the secondary slot's trailer. To do a permanent upgrade the
image-okflag is set in the secondary slot, otherwise only the magic is written. Nothing is written to the primary slot, please look at the implementation ofboot_set_pending_multi, in particular: https://github.com/mcu-tools/mcuboot/blob/ed434f33d91a5f215cd318fc3dbdcba55f1151c7/boot/bootutil/src/bootutil_public.c#L689-L694 - At this point, if the user wants to delete the update image before rebooting for some reason, they can do that without any problem and no update will be triggered. There is also no risk of bricking the device, even in the case of a power loss, because erasing only one bit of the image of course prevents the upgrade from occurring because of the hash/signature check, which is performed before attempting an upgrade.
The solution that was initially implemented to avoid the need of rewriting the secondary slot's trailer when starting a revert process was simple but had the drawback of requiring the trailer in the secondary slot to be valid after the completion of an upgrade. This meant a few changes in the swap tables were required to handle this swap-move-specific state, but also that it was no longer possible for a user to set again the secondary image as pending after an upgrade or revert process, since the secondary trailer wasn't erased anymore and cannot be rewritten without risking to lose part of the firmware. I'm not sure this is a very typical use case, but that might still be useful to some users, e.g. to perform a rollback after an update has been confirmed.
So, I wasn't fully satisfied with that solution. I updated this MR with another solution, which is bit more complex but doesn't require anymore the secondary trailer to be valid after an upgrade process. Instead, a "fallback trailer" is written in the primary slot, in a sector that doesn't contain any bit of the primary slot's trailer to ensure it is not erased when the primary trailer is erased. This fallback trailer is used to determine a revert process was in progress in case a reboot occurs while rewriting the primary trailer. I updated the PR description accordingly and the details can be found in the commit messages.
This will enable the sim to run these tests on the stm32f4 device:
diff --git a/sim/src/image.rs b/sim/src/image.rs
index 09214c50..ed72d2e5 100644
--- a/sim/src/image.rs
+++ b/sim/src/image.rs
@@ -451,7 +451,7 @@ impl ImagesBuilder {
let mut flash = SimMultiFlash::new();
flash.insert(dev_id, dev);
- (flash, Rc::new(areadesc), &[Caps::SwapUsingMove, Caps::SwapUsingOffset])
+ (flash, Rc::new(areadesc), &[])
}
DeviceName::Stm32f4SpiFlash => {
// STM style internal flash and external SPI flash.
To run this on the simulator, cd sim, and then run something like this:
cargo test \
--features "swap-move" \
-- --nocapture --test-threads 1
The features sets what will be tested, in this case just swap move, "sig-ecdsa-mbedtls swap-move" for example will test with ecdsa signatures, and swap move.
You can also add arguments at the end to restrict the particular test you want to run.
@d3zd3z Thanks for your feedback! I'm doing nearly all my testing with the simulator, which is a lot easier and faster than setting up a real device, in addition to be more comprehensive as you said.
Your patch isn't sufficient to run swap-move with the STM32F4 device defined in the simulator, as the slots contain sectors with different sizes (which is of course not supported by swap-move, even with my changes). Also, when using swap-move or swap-offset, the simulator currently assumes all of the sectors of the flash memory have the same size, e.g. here: https://github.com/mcu-tools/mcuboot/blob/6cbea0a242561bc31cf85d49c5e9c9b8df276b83/sim/src/image.rs#L1810-L1812
So to be able to run swap-move on the STM32F4 device, you need:
diff --git a/sim/src/image.rs b/sim/src/image.rs
index 09214c50..b00c7e96 100644
--- a/sim/src/image.rs
+++ b/sim/src/image.rs
@@ -437,9 +437,9 @@ impl ImagesBuilder {
// The flash layout as described is not present in any real STM32F4 device, but it
// serves to exercise support for sectors of varying sizes inside a single slot,
// as long as they are compatible in both slots and all fit in the scratch.
- let dev = SimFlash::new(vec![16 * 1024, 16 * 1024, 16 * 1024, 16 * 1024, 64 * 1024,
- 32 * 1024, 32 * 1024, 64 * 1024,
- 32 * 1024, 32 * 1024, 64 * 1024,
+ let dev = SimFlash::new(vec![64 * 1024, 64 * 1024,
+ 64 * 1024, 64 * 1024,
+ 64 * 1024, 64 * 1024,
128 * 1024],
align as usize, erased_val);
let dev_id = 0;
@@ -451,7 +451,7 @@ impl ImagesBuilder {
let mut flash = SimMultiFlash::new();
flash.insert(dev_id, dev);
- (flash, Rc::new(areadesc), &[Caps::SwapUsingMove, Caps::SwapUsingOffset])
+ (flash, Rc::new(areadesc), &[])
}
DeviceName::Stm32f4SpiFlash => {
// STM style internal flash and external SPI flash.
With those changes, nearly all the tests are OK and in particular the perm_with_fails and revert_with_fails tests. However, I get an error in status_write_fails_with_reset when validate-primary-slot is selected:
core-0fe4d46a1f7a85be: ../../boot/bootutil/src/loader.c:1820: boot_swap_image: Assertion `rc == 0' failed.
error: test failed, to rerun pass `-p bootsim --test core`
Not sure exactly what's going on here, I don't know the purpose of status_write_fails_with_reset. I will investigate in the following days :)
So I've been able to look a bit further. If my understanding is correct, status_write_fails_with_reset simulates an upgrade with errors occurring while writing to the status area, meaning if the upgrade process is interrupted, the status read by MCUboot at next boot won't be valid. When run on the simulated STM32F4 device, with two 64-KiB sectors in each slot, the initial state is:
PRIMARY SECONDARY
| ... | | ... |
+-----------------------+ +-----------------------+
| Image N (~63 KiB) | Sector 0 | Image N+1 (~63 KiB) |
| Padding (~1 KiB) | | Padding (~1 KiB) |
+-----------------------+ +-----------------------+
| Padding (~63 KiB) | Sector 1 | Padding (~63 KiB) |
| Erased (~1 KiB) | | Trailer (~1 KiB) |
+-----------------------+ +-----------------------+
The status_write_fails_with_reset runs the upgrade and interrupts it arbitrarily in the middle of the process. At this point, the slots are being swapped and the sector 0 of the primary slot has just been erased and has been partly rewritten with the sector 0 of the secondary slot:
PRIMARY SECONDARY
| ... | | ... |
+-----------------------+ +-----------------------+
| Image N+1 (~30 KiB) | Sector 0 | Image N+1 (~63 KiB) |
| Erased (~34 KiB) | | Padding (~1 KiB) |
+-----------------------+ +-----------------------+
| Image N (~63 KiB) | Sector 1 | Padding (~63 KiB) |
| Trailer (~1 KiB) | | Trailer (~1 KiB) |
+-----------------------+ +-----------------------+
At next boot, MCUboot tries to resume the upgrade but since the boot status has not been properly written, MCUboot thinks the upgrade hasn't started yet so tries to compute the size of the primary image: https://github.com/mcu-tools/mcuboot/blob/6cbea0a242561bc31cf85d49c5e9c9b8df276b83/boot/bootutil/src/loader.c#L1818-L1822
Of course, this fails because the TLV area of the (new) primary image hasn't been written yet, so boot_read_image_size returns an error and we end up in the assert at the next line. This assert isn't caught by the simulator (the simulator only catches ASSERT, not assert), so the simulator crashes.
From my point of view, this is not an MCUboot issue, but more a simulator-related issue. Indeed, from what I understand, when MCUBOOT_VALIDATE_PRIMARY_SLOT is enabled (and only in that case), the test case tries to resume the upgrade, as described above, and then expect at most one assertion to be thrown:
https://github.com/mcu-tools/mcuboot/blob/6cbea0a242561bc31cf85d49c5e9c9b8df276b83/sim/src/image.rs#L1212-L1220
So it is expected that either:
- The swap completes without any error. This happens if the last status word is properly written.
- The swap completes but the validation of the primary slot fails. This can happen if the last status hasn't been properly written. In that case, the test expects an assert to be triggered. It seems at the time the test was written an
ASSERT(0);was performed when the primary slot validation fails. However, this is no more the case.
But in fact, there is another valid case that isn't handled by this test case: if no status word has been properly written, MCUboot thinks the upgrade hasn't started. In that case, there can be zero, one or multiple asserts depending on the current state of the slots (the primary and secondary headers might be valid or not, the size of the TLV area might be found or not, the crypto keys might be found in the TLV area or not, ...).
@d3zd3z Is my understanding correct? If so, does it really make sense not to expect any assert, except one coming from the primary slot validation, when trying to resume the upgrade in status_write_fails_with_reset with a partially valid status? As a user, the only thing I would expect from MCUboot in that situation is not to boot the corrupted image, which can happen either because the bootloader crashes in an assert or because the primary slot validation fails.
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.