esp-hal
esp-hal copied to clipboard
Take 2 on `#[ram]` soundness
Submission Checklist 📝
- [x] I have updated existing examples or added new ones (if applicable).
- [x] I have used
cargo xtask fmt-packagescommand to ensure that all changed code is formatted correctly. - [ ] My changes were added to the
CHANGELOG.mdin the proper section. - [x] My changes are in accordance to the esp-rs API guidelines
Extra:
- [x] I have read the CONTRIBUTING.md guide and followed its instructions.
Pull Request Details 📖
Description
Closes #1110, closes #1650, closes #1649
New PR as this is a full rewrite. It also fixes #1650 since both #[ram(persistent)] and #[ram(zeroed)] require the same check that the type can take an all-zero bit pattern. I used bytemuck::Zeroable since it seems like the most widely used, rather than implement. It did mean adding bytemuck as a dependency of esp-hal, but I used version 1.0.0 to maximize compatibility with whatever version is probably already in users' dependency graphs.
Error message for usage on non-Zeroable types
error[E0277]: the trait bound `NonZero<u8>: bytemuck::zeroable::Zeroable` is not satisfied
--> src/bin/ram.rs:40:18
|
40 | static NON_ZERO: NonZeroU8 = NonZeroU8::MIN;
| ^^^^^^^^^ the trait `bytemuck::zeroable::Zeroable` is not implemented for `NonZero<u8>`
|
= help: the following other types implement trait `bytemuck::zeroable::Zeroable`:
bool
char
isize
i8
i16
i32
i64
i128
and 73 others
note: required by a bound in `assert_is_zeroable`
--> …/esp-hal/esp-hal/src/lib.rs:229:40
|
229 | pub const fn assert_is_zeroable<T: bytemuck::Zeroable>() {}
| ^^^^^^^^^^^^^^^^^^ required by this bound in `assert_is_zeroable`
For more information about this error, try `rustc --explain E0277`.
If exposing the #[doc(hidden)] function name in the error message is undesirable, I use the method I did in #1649.
Testing
The updated ram example ran on an S3 dev board. I do not have any of the RISC-V ESPs, so someone else will need to test that once it's implemented.
Additional To-dos
- [x] Update
esp-riscv-rt
It will now only zero persistent ram after initial boot, since, I think[^1] that's the only time the RTC ram is not preserved. IMO, this should be expanded to all the reset reasons that could theoretically happen before the init had finished. It seems to me that that would be anything not caused by a watchdog or software (for S3: brown out, clock glitch, efuse error, and usb uart/jtag resets).
I also updated esp-riscv-rt to match the xtensa behavior. I don't have any risc-v esps to test on, but it did seem to work for esp32c3 in qemu.
Additionally, I noticed the rv-init-data and rv-init-rtc-data features (and the esp-riscv-rt features they enable) seem to have only been used for the old direct boot support. I also attempted to find uses of them in a GitHub-wide search and did not see any outside of forks of esp-hal. How do you feel about removing these features and perhaps merging the existing zero-rtc-fast-bss and new zero-rtc-fast-persistent into one rtc-ram feature?
[^1]: Based on the Reset and Clock chapter of the ESP32, ESP32-S3, and ESP32-C3 reference manuals
Additionally, I noticed the rv-init-data and rv-init-rtc-data features (and the esp-riscv-rt features they enable) seem to have only been used for the old direct boot support. I also attempted to find uses of them in a GitHub-wide search and did not see any outside of forks of esp-hal. How do you feel about removing these features and perhaps merging the existing zero-rtc-fast-bss and new zero-rtc-fast-persistent into one rtc-ram feature?
I tend to agree - maybe good to hear others opinion on that, too
Looks quite good overall - I guess after adding a CHANGELOG.md entry this should be good to go
One thing maybe worth considering would be to have a (doc-hidden) function in esp-hal like should_reset_rtc_persisted_ram (naming is debatable) which decides on the reset reasons for both - Xtensa and RISC-V. Would make it easier to change the decision when to zero memory (if we need to reconsider the condition). Not mandatory - just an idea
Great. My last concern is with the edge case of an externally triggered RTC RAM-preserving reset (brown out, power glitch, jtag reset, etc.) occurring very early in the first boot such that the zero initialization gets skipped, causing undefined behavior. If this is a worry, which kinds of resets should be added to the list to trigger zeroing?
I wrote the condition in assembly to match the rest of the risc-v init, since I'm not sure how it interacts with the soundness issues that prompted the decision to avoid Rust there.
Probably all the reset reasons which are chip-reset or system-reset should always trigger zeroing:
(from C6 TRM)
I wrote the condition in assembly to match the rest of the risc-v init, since I'm not sure how it interacts with the soundness issues that prompted the decision to avoid Rust there.
Yes - makes sense. In theory when being cautious writing such a function it should be okay but sure - in assembly it's easier to know what really happens 👍
If that looks good, I'll update the changelog and rebase.
The risc-v code could also use some testing on real chips—I've only been able to test on an S3 and qemu.
Seems like the description in the TRM fooled me a bit and "System Reset: resets the whole digital system, including LP system." doesn't mean LP/RTC RAM is reset. So only "Chip Reset: resets the whole chip." seems to be a reason to zero the memory. (Sorry for the confusion)
Maybe we should just assume "Brown-out system reset" is "Chip Reset" (it can also be System Reset according to the TRM) and only zero out memory for these two reasons
I interpreted that to mean that a minor brown out could cause a system reset (preserving RTC RAM) and would give a reset code of 0x0F, while a more severe brownout could cause a chip reset with code 0x01. If it could give code 0x0F even after causing a chip reset, then I agree—definitely need to init after that.
I had expanded the list to guard against something like this:
- Initial boot starts (reset reason =
ChipPowerOn) - While the bootloader is running, something triggers a reset
- Second boot starts and completes with a non-
ChipPowerOnreset reason, meaning persistent ram is not zeroed - Undefined behavior
I am unsure which resets could occur at step 2, hence the long list. Perhaps anything that could trigger a reset that early would result in ChipPowerOn? Would a JTAG reset be possible that early?
Perhaps the time between a reset that could cause the above sequence becoming possible and the ram init finishing is simply so short that it should not be considered. I personally dislike that solution as a soundness-purist, but I'll follow your lead there.
t.b.h. I guess the above scenario is not completely impossible in a development setup but maybe unlikely in production - not too sure
At least not zeroing on e.g. WDT resets can be useful, I guess. On the other hand, if code is manipulating data stored in RTC RAM and gets interrupted by a WDT reset (or any other data preserving reset) the data might be invalid then. I start to wonder if there would be a 100% solution other than e.g. using checksums?
Oh yeah, interrupted writes are an issue.
What about this?
- Update the docs to warn about the potential for resets during writes and to avoid resets immediately after first boot
- Require
persistentstatics to implAnyBitPattern - Zero after
ChipPowerOnandSysBrownOut- Maybe also some set of
SysClkGlitch,CoreEfuseCrc, and/orCorePwrGlitch?
- Maybe also some set of
Also, from reading this ESP-IDF source, it looks like some cases of brownout resets are detected by the IDF, so esp-hal will report them as CoreSW. Could those be the system brown-out resets described in the TRM?
Shall I split the fix for #1650 out into its own PR so that isn't held up by finalizing these specifics?
Update the docs to warn about the potential for resets during writes and to avoid resets immediately after first boot
Yes, we probably should explain these things more
Require persistent statics to impl AnyBitPattern
That is basically just MaybeUninit, right? But I agree it's probably the only safe thing to do then
Zero after ChipPowerOn and SysBrownOut
Yes, sounds good.
Regarding splitting the PR: Ideally, I would love to see all of this in the next release - not sure if splitting out parts of the PR helps or causes more work for you. I'm fine with both I guess
I noticed this note on SocResetReason::ChipResetReason:
In ESP-IDF this value (0x01) can also be
ChipBrownOutorChipSuperWdt, however that is not really compatible with Rust-style enums.
Combined with this part of the IDF, I think that a chip level brownout will not give SysBrownOut. Based on this assumption, I'm only initializing after ChipPowerOn.
That is basically just
MaybeUninit, right?
I guess it's close, yeah. But this can be used without any unsafe:
#[ram(rtc_fast, persistent)]
static BOOT_MODE: AtomicU8 = AtomicU8::new(0);
match BOOT_MODE.load(order) {
// ...
}
Well, I missed the note in more recent versions of the AnyBitPattern docs that explicitly forbids interior mutability because of bytemuck::cast_ref, etc. I've replaced it with a new esp_hal::Persistable trait that matches the actual requirements for #[ram(persistent)]. It's implemented for a minimal set of integers, floats, portable_atomic::Atomic*, and arrays that should cover most usage.
Looks good but I'm on vacation this week -not sure if and when I will get to this before next week. So, if anyone likes to review this: feel free