mcuboot
mcuboot copied to clipboard
If the image header is corrupt, bootutil_img_hash() can take a VERY long time
While testing repeated firmware updates, cycling between two versions, with a random power cycler, the system failed after about 3000 update operations.
The problem was in image_validate.c function bootutil_img_hash(), when the image header in flash was corrupt. The intention is that this will be detected by the checksum algorithm.
However, in this case, the field hdr->ih_img_size was a very large number, about 102 MB, rather than less than 1 MB (the size of the flash area). Instead of taking about 3 s to verify the checksum in this test system, it would have taken 5 minutes or more. In the worst case, if ih_img_size were all-1s (4 GB), it would take about 3.5 hours to verify the checksum.
A simple fix is to check that the size of the image being validated (and the protected TLVs, etc) is less than the size of the flash-area. This means that the checksum algorithm will take a reasonable duration (not more than the worst-case for that size flash area), rather than an arbitrary duration.
diff --git a/boot/bootutil/src/image_validate.c b/boot/bootutil/src/image_validate.c
index a697676b..e27f489e 100644
--- a/boot/bootutil/src/image_validate.c
+++ b/boot/bootutil/src/image_validate.c
@@ -117,6 +117,10 @@ bootutil_img_hash(struct enc_key_data *enc_state, int image_index,
/* If protected TLVs are present they are also hashed. */
size += hdr->ih_protect_tlv_size;
+ if (size > fap->fa_size) {
+ return BOOT_EBADIMAGE;
+ }
+
#ifdef MCUBOOT_RAM_LOAD
bootutil_sha_update(&sha_ctx,
(void*)(IMAGE_RAM_BASE + hdr->ih_load_addr),
MCUboot version: branch v2.1.0, git d4394c2f9b76e0a7b758441cea3a8ceb896f66c8
The image header became corrupt when the power-cycle happened while the application was writing the new version header to flash, but before the flash device completed the operation. The application does verify the new image version in flash before rebooting, but did not have time to do that before the power-cycle.
Rather than failing if size is too big, another option would be to set size to fap->fa_size (the size of the partition). This will still fail; it will take longer because it has to verify the checksum.
I'm a little surprised this doesn't just fail with the first flash read past the end of the partition. Does the driver not reject the flash operations that are out of bound? I believe that was the assumption in how this was written.
It didn't even occur to me that the non-library flash-driver code should return a failure code for writing outside the flash-area. I assumed that the library code would do that.
It seems that I based my flash-area code on some of the examples where the out-of-bounds check is not explicit, done by some board-specific library code (cypress, mbed, zephyr), rather than those examples that do have explicit checks (esp32, nuttx)
It is easy enough to add the appropriate check to my implementation of flash_area_xxxx() operations and return an error for out-of-bounds access.
In fact, I am more surprised that boot_image_load_header() does not pick up the corrupt size: it has an explicit check for size being too big. I will need to do some more investigation.
UPDATE: Nothing in the library code calls boot_image_load_header().
I do like the idea of bootutil_img_validate() being the thing that looks for and finds for an invalid image, rather than relying on the flash driver, given that
boot_is_header_valid()exists and checks the sizeboot_image_load_header()exists and checks the size
Perhaps bootutil_img_validate() should call boot_is_header_valid()?
My provided patch is poor, in that it reads fap->fa_size directly, rather than calling flash_area_get_size().
+1 to adding the check that size in the image header doesn't exceed flash area size in mcuboot, rather than relying on the flash driver to detect the out of bounds access.
I ran into this when porting mcuboot to a custom platform, except I had an assertion check rather than just an error return in the flash driver, since I assumed mcuboot wouldn't try to read/write/erase outside of the defined flash areas.
I have just fallen over something very like this (in 2.1.0) while doing a custom port (in my case, a mistake on my part has lead to hdr in bootutil_img_validate not pointing to a real header at all.
I was a bit surprised that mcuboot apparently hasn't checked hdr->ih_magic is correct (it isn't, in my case) before trusting the rest of the structure enough to go off and start calculating a hash of the image.
The code in boot_validate_slot looks like this:
BOOT_HOOK_CALL_FIH(boot_image_check_hook, FIH_BOOT_HOOK_REGULAR,
fih_rc, BOOT_CURR_IMG(state), slot);
if (FIH_EQ(fih_rc, FIH_BOOT_HOOK_REGULAR))
{
FIH_CALL(boot_image_check, fih_rc, state, hdr, fap, bs);
}
if (!boot_is_header_valid(hdr, fap) || FIH_NOT_EQ(fih_rc, FIH_SUCCESS)) {
if ((slot != BOOT_PRIMARY_SLOT) || ARE_SLOTS_EQUIVALENT()) {
flash_area_erase(fap, 0, flash_area_get_size(fap));
/* Image is invalid, erase it to prevent further unnecessary
* attempts to validate and boot it.
*/
}
...
i.e. the call to boot_is_header_valid occurs AFTER the hashing in boot_image_check, which is possibly too late.
This feels like significant bricking-risk to me - in that an unlucky (or malicious) header could cause the boot_image_check to run wild, possibly crashing the machine (depending on how the flash drivers respond to wild addresses).
There's another, earlier bit of code in boot_validate_slot:
hdr = boot_img_hdr(state, slot);
if (boot_check_header_erased(state, slot) == 0 ||
(hdr->ih_flags & IMAGE_F_NON_BOOTABLE)) {
which trusts the value of hdr->ih_flags even though hdr is as-yet unvalidated - that feels like a relative of the same problem.
I wondered it if would be safer for boot_header_is_valid to be called in boot_read_image_headers ? That would mean that badly malformed headers didn't silently get stored in the BOOT_IMG array, giving later users the impression that they're more trustworthy than they actually are.
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.