esp-hal Take 2 on `#[ram]` soundness

trafficstars

Submission Checklist 📝

[x] I have updated existing examples or added new ones (if applicable).
[x] I have used cargo xtask fmt-packages command to ensure that all changed code is formatted correctly.
[ ] My changes were added to the CHANGELOG.md in the proper section.
[x] My changes are in accordance to the esp-rs API guidelines

Extra:

[x] I have read the CONTRIBUTING.md guide and followed its instructions.

Pull Request Details 📖

Description

Closes #1110, closes #1650, closes #1649

New PR as this is a full rewrite. It also fixes #1650 since both #[ram(persistent)] and #[ram(zeroed)] require the same check that the type can take an all-zero bit pattern. I used bytemuck::Zeroable since it seems like the most widely used, rather than implement. It did mean adding bytemuck as a dependency of esp-hal, but I used version 1.0.0 to maximize compatibility with whatever version is probably already in users' dependency graphs.

Error message for usage on non-Zeroable types

error[E0277]: the trait bound `NonZero<u8>: bytemuck::zeroable::Zeroable` is not satisfied
   --> src/bin/ram.rs:40:18
    |
40  | static NON_ZERO: NonZeroU8 = NonZeroU8::MIN;
    |                  ^^^^^^^^^ the trait `bytemuck::zeroable::Zeroable` is not implemented for `NonZero<u8>`
    |
    = help: the following other types implement trait `bytemuck::zeroable::Zeroable`:
              bool
              char
              isize
              i8
              i16
              i32
              i64
              i128
            and 73 others
note: required by a bound in `assert_is_zeroable`
   --> …/esp-hal/esp-hal/src/lib.rs:229:40
    |
229 |     pub const fn assert_is_zeroable<T: bytemuck::Zeroable>() {}
    |                                        ^^^^^^^^^^^^^^^^^^ required by this bound in `assert_is_zeroable`

For more information about this error, try `rustc --explain E0277`.

If exposing the #[doc(hidden)] function name in the error message is undesirable, I use the method I did in #1649.

Testing

The updated ram example ran on an S3 dev board. I do not have any of the RISC-V ESPs, so someone else will need to test that once it's implemented.

Additional To-dos

[x] Update esp-riscv-rt

Jun 14 '24 02:06 wetheredge

It will now only zero persistent ram after initial boot, since, I think[^1] that's the only time the RTC ram is not preserved. IMO, this should be expanded to all the reset reasons that could theoretically happen before the init had finished. It seems to me that that would be anything not caused by a watchdog or software (for S3: brown out, clock glitch, efuse error, and usb uart/jtag resets).

I also updated esp-riscv-rt to match the xtensa behavior. I don't have any risc-v esps to test on, but it did seem to work for esp32c3 in qemu.

Additionally, I noticed the rv-init-data and rv-init-rtc-data features (and the esp-riscv-rt features they enable) seem to have only been used for the old direct boot support. I also attempted to find uses of them in a GitHub-wide search and did not see any outside of forks of esp-hal. How do you feel about removing these features and perhaps merging the existing zero-rtc-fast-bss and new zero-rtc-fast-persistent into one rtc-ram feature?

[^1]: Based on the Reset and Clock chapter of the ESP32, ESP32-S3, and ESP32-C3 reference manuals

Jun 22 '24 00:06 wetheredge

Additionally, I noticed the rv-init-data and rv-init-rtc-data features (and the esp-riscv-rt features they enable) seem to have only been used for the old direct boot support. I also attempted to find uses of them in a GitHub-wide search and did not see any outside of forks of esp-hal. How do you feel about removing these features and perhaps merging the existing zero-rtc-fast-bss and new zero-rtc-fast-persistent into one rtc-ram feature?

I tend to agree - maybe good to hear others opinion on that, too

Jun 24 '24 08:06 bjoernQ

Looks quite good overall - I guess after adding a CHANGELOG.md entry this should be good to go

One thing maybe worth considering would be to have a (doc-hidden) function in esp-hal like should_reset_rtc_persisted_ram (naming is debatable) which decides on the reset reasons for both - Xtensa and RISC-V. Would make it easier to change the decision when to zero memory (if we need to reconsider the condition). Not mandatory - just an idea

Jun 25 '24 08:06 bjoernQ

Great. My last concern is with the edge case of an externally triggered RTC RAM-preserving reset (brown out, power glitch, jtag reset, etc.) occurring very early in the first boot such that the zero initialization gets skipped, causing undefined behavior. If this is a worry, which kinds of resets should be added to the list to trigger zeroing?

I wrote the condition in assembly to match the rest of the risc-v init, since I'm not sure how it interacts with the soundness issues that prompted the decision to avoid Rust there.

Jun 26 '24 02:06 wetheredge

Probably all the reset reasons which are chip-reset or system-reset should always trigger zeroing:

(from C6 TRM)

I wrote the condition in assembly to match the rest of the risc-v init, since I'm not sure how it interacts with the soundness issues that prompted the decision to avoid Rust there.

Yes - makes sense. In theory when being cautious writing such a function it should be okay but sure - in assembly it's easier to know what really happens 👍

Jun 26 '24 06:06 bjoernQ

If that looks good, I'll update the changelog and rebase.

The risc-v code could also use some testing on real chips—I've only been able to test on an S3 and qemu.

Jul 01 '24 05:07 wetheredge

Seems like the description in the TRM fooled me a bit and "System Reset: resets the whole digital system, including LP system." doesn't mean LP/RTC RAM is reset. So only "Chip Reset: resets the whole chip." seems to be a reason to zero the memory. (Sorry for the confusion)

Maybe we should just assume "Brown-out system reset" is "Chip Reset" (it can also be System Reset according to the TRM) and only zero out memory for these two reasons

Jul 01 '24 08:07 bjoernQ

I interpreted that to mean that a minor brown out could cause a system reset (preserving RTC RAM) and would give a reset code of 0x0F, while a more severe brownout could cause a chip reset with code 0x01. If it could give code 0x0F even after causing a chip reset, then I agree—definitely need to init after that.

I had expanded the list to guard against something like this:

Initial boot starts (reset reason = ChipPowerOn)
While the bootloader is running, something triggers a reset
Second boot starts and completes with a non-ChipPowerOn reset reason, meaning persistent ram is not zeroed
Undefined behavior

I am unsure which resets could occur at step 2, hence the long list. Perhaps anything that could trigger a reset that early would result in ChipPowerOn? Would a JTAG reset be possible that early?

Perhaps the time between a reset that could cause the above sequence becoming possible and the ram init finishing is simply so short that it should not be considered. I personally dislike that solution as a soundness-purist, but I'll follow your lead there.

Jul 03 '24 02:07 wetheredge

t.b.h. I guess the above scenario is not completely impossible in a development setup but maybe unlikely in production - not too sure

At least not zeroing on e.g. WDT resets can be useful, I guess. On the other hand, if code is manipulating data stored in RTC RAM and gets interrupted by a WDT reset (or any other data preserving reset) the data might be invalid then. I start to wonder if there would be a 100% solution other than e.g. using checksums?

Jul 03 '24 08:07 bjoernQ

Oh yeah, interrupted writes are an issue.

What about this?

Update the docs to warn about the potential for resets during writes and to avoid resets immediately after first boot
Require persistent statics to impl AnyBitPattern
Zero after ChipPowerOn and SysBrownOut
- Maybe also some set of SysClkGlitch, CoreEfuseCrc, and/or CorePwrGlitch?

Also, from reading this ESP-IDF source, it looks like some cases of brownout resets are detected by the IDF, so esp-hal will report them as CoreSW. Could those be the system brown-out resets described in the TRM?

Shall I split the fix for #1650 out into its own PR so that isn't held up by finalizing these specifics?

Jul 04 '24 05:07 wetheredge

Update the docs to warn about the potential for resets during writes and to avoid resets immediately after first boot

Yes, we probably should explain these things more

Require persistent statics to impl AnyBitPattern

That is basically just MaybeUninit, right? But I agree it's probably the only safe thing to do then

Zero after ChipPowerOn and SysBrownOut

Yes, sounds good.

Regarding splitting the PR: Ideally, I would love to see all of this in the next release - not sure if splitting out parts of the PR helps or causes more work for you. I'm fine with both I guess

Jul 04 '24 06:07 bjoernQ

I noticed this note on SocResetReason::ChipResetReason:

In ESP-IDF this value (0x01) can also be ChipBrownOut or ChipSuperWdt, however that is not really compatible with Rust-style enums.

Combined with this part of the IDF, I think that a chip level brownout will not give SysBrownOut. Based on this assumption, I'm only initializing after ChipPowerOn.

That is basically just MaybeUninit, right?

I guess it's close, yeah. But this can be used without any unsafe:

#[ram(rtc_fast, persistent)]
static BOOT_MODE: AtomicU8 = AtomicU8::new(0);

match BOOT_MODE.load(order) {
	// ...
}

Jul 06 '24 19:07 wetheredge

Well, I missed the note in more recent versions of the AnyBitPattern docs that explicitly forbids interior mutability because of bytemuck::cast_ref, etc. I've replaced it with a new esp_hal::Persistable trait that matches the actual requirements for #[ram(persistent)]. It's implemented for a minimal set of integers, floats, portable_atomic::Atomic*, and arrays that should cover most usage.

Jul 06 '24 23:07 wetheredge

Looks good but I'm on vacation this week -not sure if and when I will get to this before next week. So, if anyone likes to review this: feel free

Jul 08 '24 08:07 bjoernQ

esp-hal esp-hal copied to clipboard

Take 2 on `#[ram]` soundness

Submission Checklist 📝

Extra:

Pull Request Details 📖

Description

Testing

Additional To-dos

esp-hal
esp-hal copied to clipboard