wg icon indicating copy to clipboard operation
wg copied to clipboard

deal with rustc/LLVM bug around LLVM's `dereferenceable` and volatile ops

Open japaric opened this issue 6 years ago • 21 comments

The current implementation of the peripheral API generated by svd2rust is affected by bug rust-lang/rust#55005. The (hypothetical) observable effect of this bug on the svd2rust API is that LLVM may insert spurious reads to MMIO registers in optimized builds. These spurious reads in turn may cause some bits of the register to be cleared (e.g. interrupt flags) or flipped, changing the intended behavior of the program.

The way to prevent this rustc/LLVM bug is to never create a reference (&- or &mut-) to a MMIO register, or in general to memory that is meant to be accessed only through volatile operations. This document explores alternative implementations of the svd2rust API to avoid this bug.

Rendered

japaric avatar Oct 23 '19 17:10 japaric

Can/should we ask people in charge from the core teams to give estimates for fixing this? Giving them some heads up about the impact for the bare-metal/embedded space upfront to consider when having a second look at the bug, which maybe helps to bump the relative priority for fixing it?

andre-richter avatar Oct 23 '19 18:10 andre-richter

I would say that, in general, you're using both potential paths wrong.

  • The ZST form definitely seem insufficiently abstracted and safe-ified.
  • The VolAddress form is definitely not how I expected people to use the crate so you end up with non-ZST values flying around so of course it's going to be not zero cost because you're actually pushing around dynamic addresses everywhere (seems an unfair comparison).

You need to mix the two approaches.

Lokathor avatar Oct 23 '19 18:10 Lokathor

Hm, if we have used the "generic" approach before, then certainly not in a lot of places, most HALs multiply the code via macro plus that also depends on the laziness/correctness of the SVD author. I'm not actually too worried about additional in svd2rust to generate the additional implementations.

I can also see option 3) Wait for const generics to be stabilized, then we can use the addresses to instantiate the peripherals.

therealprof avatar Oct 23 '19 19:10 therealprof

It was asked that I put my more detailed explanation of what I think is wrong with the example, so here goes:

  • In the "ZST" version you had your target addresses effectively be associated constants of the types by writing in a literal value in the methods.
  • In the "voladdress" version you actually instanciated some VolAddress values at runtime and passed them around.

Instead, what you should do is also make the VolAddress values be associated const values of the types that they're connected to. Unless the value is truly not known until runtime (eg: the address of the Nth palette value, where N is a runtime value) then you should declare the const and use that.

Here is an example of me doing that sort of thing. In my case, I'm declaring it as just a pub const but you can of course declare it as a private associated const of the ZST value that controls access.

I hope this helps!

Lokathor avatar Oct 23 '19 19:10 Lokathor

It seems that this is getting traction now At the root of the problem as can be seen by Niko's comments in the issue he referenced (thanks @jamesmunns for highlighting). I would vote for holding off a bit from fixing this ourselves now with workarounds and closely track what happens there.

andre-richter avatar Oct 23 '19 21:10 andre-richter

I forgot to mention in the RFC / PR description but the "ZST approach" let us move registers out of peripherals -- this feature was requested in rust-embedded/svd2rust#213 -- so we may want to implement that approach or something like regardless of what happens with the rustc/LLVM bug.

japaric avatar Oct 23 '19 23:10 japaric

CC @bradjc @ppannuto

Maybe worth checking if tock-registers has this problem too. Currently rather busy and haven't found time yet to take a closer look myself.

andre-richter avatar Oct 24 '19 07:10 andre-richter

So wait a second. I want to make sure I understand what is motivating this RFC. I believe the problem is specifically that get itself takes an &UnsafeCell<T>, and that is a shared reference, which is presently marked as dereferenceable. Is this causing problems in practice?

It seems to me that it is a touch premature to conclude that VolatileCell is unsound, as the language rules here remain in flux. Put a bit more strongly, I feel like the language should support some means of doing this -- that might be by changing the rules around &UnsafeCell<T> to remove the deferefenceable attribute (or something like that), or it might be by making VolatileCell more of a primtive, I'm not sure what. But this seems like a concept that ought to be expressable.

This has obviously been discussed numerous times before. rust-lang/unsafe-code-guidelines#33 has quite a lot of material. A quick skim turned up that @RalfJung previously suggested removing the deferenceable attribute for references to UnsafeCell, but I can't tell if that idea itself has been dismissed.

In short, I suppose, I don't claim to be caught up on the discussion here, but I think that before making drastic changes it would be good to prioritize this aspect of the discussion for deeper lang team discussion.

nikomatsakis avatar Oct 31 '19 00:10 nikomatsakis

I think the point of making this sort of a change would be that the lang team doesn't have to be consulted at all because it doesn't ask for any language changes. We just change a few libraries and we're back to soundness.

Lokathor avatar Oct 31 '19 01:10 Lokathor

(Meta comment: I am not sure if calling this a "rustc bug" is appropriate; if anything this is a "Rust language bug", the compiler implements the language as intended. And I think more accurately this is a "Rust language feature request"; references to MMIO memory are not supported right now and supporting them is a feature we'd like to have. Certainly there is no "LLVM bug" anywhere in sight here.)

It seems to me that it is a touch premature to conclude that VolatileCell is unsound, as the language rules here remain in flux.

It is unsound under the current conservative rules. Not sure what else "unsound" could mean. Sure, it could be made sound by language changes, but that is true for many cases of unsoundness we are facing. ;)

But this seems like a concept that ought to be expressable.

Absolutely! But it isn't right now, so there are two things we can explore in parallel: (a) how to make it expressible; this involves language changes. (b) how to write an ergonomic MMIO library under the current conservative rules; this can be explored by libraries without lang team involvement.

RalfJung avatar Nov 03 '19 17:11 RalfJung

Why would it be expressible? That part genuinely doesn't make much sense to me.

If you have a reference to an MMIO location, then within rust's type system that logically means that you can also own an MMIO location by de-referencing it, which is pretty nonsense because you never own the MMIO location.

Lokathor avatar Nov 03 '19 17:11 Lokathor

You don't own memory behind shared references either, so that doesn't seem like an issue to me.

RalfJung avatar Nov 03 '19 17:11 RalfJung

I'm more talking about mental modeling of the types, not literal ownership.

But if I'm the only one who thinks it's weird then whatever.

Lokathor avatar Nov 03 '19 17:11 Lokathor

The situation is very similar for concurrent data structures like crossbeam's channels, where the conceptual idea of "ownership" also has to expanded to properly explain what happens. But shared references can handle the intricate interactions of multiple threads cooperatively implementing a channel; I do not see why they couldn't also model MMIO interactions.

RalfJung avatar Nov 03 '19 17:11 RalfJung

Hi everyone,

I wanted to post an update from the @rust-lang/lang side of things. On 2020-02-10, we held a "design meeting" where we talked over the question of the dereferenceable attribute that is attached to references in detail. You can see the minutes and there is also a recording available if you'd like to watch. The meeting was not, I don't think, conclusive -- I'm hoping to find some time to try and summarize the key conclusions, in part to understand them better myself!

Still, there are a few things I can say.

Going into the meeting, I hoped that we would find that it seemed "obviously correct" that &UnsafeCell references (that is, a reference value of type &T where T contains an UnsafeCell) ought not to be considered "dereferenceable". Unfortunately, I came away feeling the opposite. In short, the situation is complex, and while making &UnsafeCell non-dereferenceable would address some of the complexity, it doesn't handle make all the patterns work that one might expect.

Moreover, it seemed like there may be just a categorical difference between MMIO and the other use cases that motivate making &UnsafeCell non-dereferenceable. The other use cases are due to potential UB in the unsafe code that implements Rust types like Arc and RefCell. In these cases, the problem is that you have &T references which begin as dereferenceable but which may transition to become invalid (sometimes as a result of the actions of other threads, as with Arc).

In LLVM terms, these might be addressed by upcoming changes that change dereferenceable to mean "dereferenceable on function entry". We could potentially adapt stacked borrows in a similar way, though this would mean foregoing some of the optimizations we had hoped for, and it wouldn't solve all the cases we might wish (see "What does this help, and what doesn’t it help?" in the notes).

The issue with MMIO however is different -- in this case, you want something like &VolatileCell which should never be dereferenceable. That is, the contents of the VolatileCell should never be accessed without explicit action. Moving to something like "derefernceable on start" would not help here.

So, in conclusion, we were weighing two options. Neither of them is wholly satisfying, and only one of them helps with MMIO:

  • Make &UnsafeCell not dereferenceable
    • More complex, type-dependent behavior that we have to fully work through
    • Addresses MMIO and some of the reference-counting-like cases, but not all
      • e.g., doesn't help with RefMut nor with cases where the reference count is not embedded in the & reference
  • Move to "derefenceable on start" semantics
    • Simple, uniform behavior
    • Addresses all the reference-counting-like cases but not MMIO
    • Sacrifices optimizations

It's hard to say which of these is better, and I guess I would say that we'll probably spend some time looking for a "third way". But this does mean that we can't say for sure whether whatever solution we find would help with MMIO.

So what does that mean for this RFC? In the meeting, we discussed that the right way to model volatile memory might be to use raw pointers and not references. So instead of passing around &RegisterBlock, one might use *const RegisterBlock. However, in practice, this can be rather annoying today, owing to various small ergonomic hurdles involved with *const pointers -- for example, the lack of fn foo(*const self) methods. We've been contemplating for some time trying to take a fresh look at raw pointer ergonomics, and that might help here (you'll see some early thoughts in the notes of introducing something like &unsafe, but I don't want to go into it here).

Either way, it seemed like if you had a *const RegisterBlock, you might wind up encapsulating that pointer into a newtype'd reference, something like

#[derive(Copy)]
struct VolatileRef<T> {
    pointer: *const T
}

impl<T> VolatileRef<T> {
    unsafe fn new(ptr: *const T) { ... }

    fn get(self) -> T {
        unsafe { ... }
    }
}

so that you can encapsulate the unsafety into the new method (here I'm presuming that these addresses don't require a lifetime associated with them).

Anyway, I don't want to try and design an ergonomic API in this comment, I'm merely pointing out this as a possible direction to consider.

I guess the overall conclusion would be:

  • Removing or weakening dereferenceable from &T may make sense but it's complex and I think better not to count on it, particularly as some of the proposals don't help with MMIO cases
  • Raw pointers feel like a good fit for a "reference that should not be lightly dereferenced"
  • Maybe they can be encapsulated into an ergonomic API? If not, it'd be good to know why not, since maybe that is something we ought to address

nikomatsakis avatar Feb 19 '20 10:02 nikomatsakis

@nikomatsakis This does not quite belong to this thread, but I haven't found another place to discuss those meeting notes. Please point me to one, if any.

I think there could be a use case for non-dereferenceable references that have lifetimes and obey borrow checking rules in non-MMIO use cases. Such a primitive would seem like a good fit for other use cases of volatile besides MMIO (optimization barrier, shared-memory IPC, virtual memory, cryptography...), where the target buffer may have a finite lifetime, while still being able to address the MMIO use case by giving MMIO registers a 'static lifetime (or whatever else is appropriate for the hardware at hand).

HadrienG2 avatar Feb 19 '20 17:02 HadrienG2

I would suggest a struct with a raw pointer, len, and a PhantomData for lifetime.

Lokathor avatar Feb 19 '20 18:02 Lokathor

@HadrienG2 thanks (and you're right that there's no obvious place to collect feedback on the notes). I agree with @Lokathor that a struct could be used in that case to "emulate" a reference, but I can see why it might be nice to have some more native way to model it as well.

nikomatsakis avatar Feb 25 '20 19:02 nikomatsakis

A few thoughts about this issue/RFC:

  • min const generics have stabilised, so it may make sense to revisit this
  • const generics should allow writing generic functions with an easier implementation than what is proposed in the RFC
  • we could have ZST types with an easy way to convert them to a struct with a raw pointer, to keep advantages of the VolAdress approach when useful.

By the way, I saw that redox uses MaybeUninit for MMIO. Would it solve some problems to use that with or instead of UnsafeCell ?

https://github.com/redox-os/syscall/blob/f07a954825344de2ab4d15d394525ac69a22dcea/src/io/mmio.rs#L7-L10

kellda avatar Apr 07 '21 09:04 kellda

Regarding volatile and dereferencable, the recent lang team discussion for dereferencable is also relevant. volatile was not the primary topic of the discussion, but fixing dereferencable for Arc (https://github.com/rust-lang/rust/issues/55005) could, as a side-effect, also help with volatile. Also see this Zulip discussion.

By the way, I saw that redox uses MaybeUninit for MMIO. Would it solve some problems to use that with or instead of UnsafeCell ?

No, MaybeUninit has no effect on dereferencability of pointers. It also cannot replace UnsafeCell (but the two can be combined).

RalfJung avatar Apr 07 '21 11:04 RalfJung

Looking at the redox line in isolation, any with no other knowledge of their project, it's just plain wrong.

unless by "Mmio<T>" they for some reason mean something else entirely.

Lokathor avatar Apr 07 '21 12:04 Lokathor

rust-lang/rust/#55005 seems to be fixed now, so I think this can be closed.

Sympatron avatar Aug 27 '24 07:08 Sympatron

It is still UB to hold a reference to MMIO memory. The compiler is still permitted to add spurious loads.

RalfJung avatar Aug 27 '24 07:08 RalfJung

This is not a rustc/LLVM bug though, it is working as intended. References to MMIO memory are simply not supported by Rust right now, and until someone designs and implements support for that, you must use raw pointers.

RalfJung avatar Aug 27 '24 07:08 RalfJung

I don't think we should say it's actually UB, only that spurious loads can happen.

In a vast majority of cases a spurious load of an mmio location is inefficient in terms of cpu time, but it is harmless in terms of program correctness. calling it UB seems overkill.

Lokathor avatar Aug 27 '24 13:08 Lokathor

Well, in general if this memory is inside a Rust allocation (which it has to be to have a reference to it), spurious loads and spurious stores can happen. Spurious stores are much harder to justify for the compiler, but it is also hard to make a guarantee that would rule them out. That would still require some AM design work.

RalfJung avatar Aug 27 '24 14:08 RalfJung

Well it's the "it has to be to have a reference to it" part that isn't great. Because references mean so many things in rust and one of those things is that you can use methods. If you have just a pointer instead of a reference you basically can't do methods, and that's deeply annoying from an ergonomics perspective.

Lokathor avatar Aug 27 '24 15:08 Lokathor

I never said that the current state is great. I am just explaining what the current state is.

OTOH references have "reference" in the name so requiring them to be "dereferenceable" is not exactly far-fetched. So maybe the better solution is allowing methods on raw pointers. But anyway that is all off-topic here.

RalfJung avatar Aug 27 '24 15:08 RalfJung

but it is harmless in terms of program correctness. calling it UB seems overkill.

This is not true. It is not uncommon for MMIO register reads to have side effects like clearing flags. This is why this is so problematic.

For my peace of mind: I discovered this issue today and if I understand it correctly all current PACs have the potential of UB because of this. This seems quite serious. Is this "accepted" by the embedded community, because nobody ever saw it in the wild so the issue is more a theoretical one?

Sympatron avatar Aug 27 '24 20:08 Sympatron

This is not true. It is not uncommon for MMIO register reads to have side effects like clearing flags. This is why this is so problematic.

I am aware it's possible, because I have read the initial post of this issue, and also I have also written a crate for an embedded device. The first half of my sentence you quoted is critical context for the second half of that same sentence.

Is this "accepted" by the embedded community, because nobody ever saw it in the wild so the issue is more a theoretical one?

there's various crates to help ease the API around mmio stuff (volatile, voladdress, probably more), but it's not perfect and we could benefit from some language changes I'm sure. The exact details would need to be designed and probably a lot of people would have to be convinced. Straw polls and smell tests of changes in this area are usually met with "do we really need to change the language for just that?" and similar.

Lokathor avatar Aug 27 '24 21:08 Lokathor