rfcs Unsafe fields

Co-authored by @jswrenn.

Fixes https://github.com/rust-lang/rfcs/issues/381

Tracking:

https://github.com/rust-lang/rust/issues/132922

Jul 13 '23 04:07 jhpratt

By introducing unsafe fields, Rust can improve the situation where a field that is otherwise safe is used as a safety invariant.

I think the motivation could point out who would benefit from this.

I assume it's library authors a making unsafe fields a "reminder to self" about upholding some invariant as opposed to say expecting a unsafe field in an API surface. I.e we still expect unsafe set_len rather than pub unsafe len, right?

Jul 13 '23 06:07 algesten

this seems closely related to mut(self) fields https://github.com/rust-lang/rfcs/pull/3323 which should probably be mentioned.

Jul 13 '23 07:07 programmerjake

I assume it's library authors a making unsafe fields a "reminder to self" about upholding some invariant as opposed to say expecting a unsafe field in an API surface. I.e we still expect unsafe set_len rather than pub unsafe len, right?

Not necessarily. It is entirely reasonable that Vec.len could be exposed. Whether it actually is exposed is a decision solely for T-libs-api. Likewise with the inner fields of the various nonzero types. I'd rather not tie down who this is intended for, as it truly is intended for everyone. I know I've written a binary that had fields relied upon in unsafe code — there was just no way to make it actually unsafe.

this seems closely related to mut(self) fields #3323 which should probably be mentioned.

Related, sure, but beyond the mention in unresolved questions, I'm not sure how it could be mentioned. Pretty much the only overlap is what's considered a "mutable access", which I didn't feel necessary to be copy-pasted.

Jul 13 '23 07:07 jhpratt

How does this look with functional struct update?

Foo {
// Stuff
.. unsafe { other }
}

? Or does the whole initializer need to be unsafe?

Edit: or I guess it doesn't need unsafe if the source had been initialized with unsafe.

Jul 13 '23 08:07 jsgf

@jsgf Great question. I don't have an immediate answer, though I believe the mechanism currently in the compiler would require the entire initializer to be unsafe.

Jul 13 '23 08:07 jhpratt

I love it! This is much better than getters and setters, both for library authors and users.

typed-builder would have to adjust, but I would love to implement this feature there :)

@idanarye (the main author) Maybe you have some input regarding the builder pattern?

Jul 13 '23 09:07 mo8it

The implicit notion that only mutation is unsafe (and reading is not) seems tricky. Do you have reasoning to prove that we'll never need fields that are unsafe to read? Maybe there should be an alternative syntax proposal (like unsafe(mut) or mut(unsafe)) that makes this more obvious/explicit?

Jul 13 '23 10:07 djc

@mo8it I don't want to spam the comments here with a discussion about typed builder, so I've opened a discussion in my repository instead: https://github.com/idanarye/rust-typed-builder/discussions/103

Jul 13 '23 13:07 idanarye

Regarding the RFC itself - I think you are trying to solve a visibility issue with the safety mechanism, which is the wrong tool for the job. You gave Vec as an example, and it looks like you want to grant public read access to its len field (so that you don't have to use len() as a method? Let's ignore the question if that's big enough an improvement to justify such a feature). To do so, you are willing to provide public write access to it as well but make that access unsafe.

But why?

I mean, it's obvious why you don't want to give regular write access to len. But why give any access at all? Even if you require an unsafe block, what good will come from letting external users modify the len field without going through a method that upholds the invariant? As long as we are devising a brand new feature, wouldn't it make more sense to add a feature that gives public read access without any write access at all (other than private access from inside the defining module, of course)?

I'm aware of the set_len method that grants such access, but this is an explicit decision to give such access, with a fully documented method. Not a side-effect of wanting to provide a non-method-call read access to the field.

Another thing - conceptually it never makes sense to define only one field as unsafe. The invariant is a property of the struct as a whole. If this is unsafe:

let mut vec = Vec::new();
vec.len = 4; // UNSAFE!!!

The why not this?

let mut vec = Vec::from([1, 2, 3, 4]);
vec.buf = RawVec::new(); // perfectly safe apparently

Yes, buf is not publicly exposed at all, but inside the module len will need unsafe block to modify and buf won't, even though the invariant is about both of them, together, and how they interact with each other.

Whatever the semantics of unsafe fields will be - conceptually it makes more sense to put the unsafe on the struct itself. If there are fields that are not part of the invariant, they should not be part of the struct - because unsafety should be contained as much as possible and not contaminated with unrelated data. The only reason putting unsafe only on len seems to make case in your example is that the goal - as I've said before - is about visibility, not about safety.

Jul 13 '23 14:07 idanarye

Vec::len should not do this even if this feature exists, because Vec::set_len is better pedagogically.

static muts require unsafe blocks for both writing and reading. An unsafe field would likely be some similar construction, so unsafe for both writing and reading. An UnsafeCell already hits those requirements, but any variants should set their auto-traits.

Ain't clear this proposal handles auto-traits correctly, even if some use case exists. If you need this, then define your own type which provides this. We've the inner-builder or whatever deref polymorphism pattern, which comes up far more in practice, and can simulate this of desired.

pub struct ThingBuilder { ... }

impl ThingBuilder {
    fn build(self) -> Thing {
        ...
        Thing { ..., inner, self }
    }
}

pub struct Thing {
    ... 
    inner: ThingBuilder,
}

impl Deref for Thing {
    type Target = ThingBuilder;
    fn deref(&self) -> ThingBuilder {
        &self.inner
    }
}

// We stop mutating ThingBuilder once we create a Thing, so Thing: !DerefMut,
// but Thing: Deref<Target=ThingBuilder> to make reading & replicating the
// builder config easy.

Jul 13 '23 23:07 burdges

Do you have reasoning to prove that we'll never need fields that are unsafe to read?

How could a field of a struct be unsafe to read?

You gave Vec as an example, and it looks like you want to grant public read access to its len field (so that you don't have to use len() as a method?

Within the module it's defined in (as the field is private), it is currently safe to assign any value, despite the fact that it can lead to undefined behavior. Said another way, the current behavior is inherently unsound.

Nothing in the RFC so much as hints at Vec.len being made public, nor is an RFC an appropriate place to make a change like that. It is an example of a field that should be unsafe to avoid unsoundness and nothing more.

Another thing - conceptually it never makes sense to define only one field as unsafe.

I never claimed that was the case.

The why not this?

let mut vec = Vec::from([1, 2, 3, 4]);
vec.buf = RawVec::new(); // perfectly safe apparently

Inclusion of one example does not mean that everything not included is forbidden. There is simply no reason to repeat the same thing for every field. Of course buf would also be unsafe. I used Vec.len as the example because it's a clear, obvious example where its safety invariants are publicly documented.

The invariant is a property of the struct as a whole.

If there are fields that are not part of the invariant, they should not be part of the struct

For Vec, yes, but only because all fields of the strict interact with all other fields.

I have real world code where this is not the case. The flags field is for whether other fields are initialized. Note that some fields have niche value optimization, and as such don't interact with other fields in any way. Are you asserting that month: Option<Month> and other similar fields should be in a separate struct solely because it has niche value optimization? That appears to a logical conclusion as a result of what you've said.

static muts require unsafe blocks for both writing and reading. An unsafe field would likely be some similar construction, so unsafe for both writing and reading.

static mut requires unsafe due to inherent data races between threads. Unsafe fields have no such issue.

Ain't clear this proposal handles auto-traits correctly, even if some use case exists.

I don't follow. What problems do you see with handling auto traits? The type of the field is unchanged, so there would be no impact on auto traits.

Jul 14 '23 00:07 jhpratt

(@jhpratt I'd love for you to steal something like the text in this post for the RFC)

I assume it's library authors a making unsafe fields a "reminder to self" about upholding some invariant as opposed to say expecting a unsafe field in an API surface.

A huge incentive for it, to me at least, is helping avoid misunderstandings about the model.

For example, a 2020 PACMPL paper contains the following statement:

To check how prevalent unsafe is used as a documentation feature, our queries gathered data for unsafe traits and unsafe functions with safe implementations.

set_len being an unsafe fn is of course not merely a "documentation feature". It's an absolutely critical part of the soundness of Vec. (And the paper does talk about "invariants that are potentially critical for upholding Rust’s safety guarantees" (emphasis added), so I don't think their analysis is incorrect, but I still find the phrasing curious.)

Thus a huge win of this would be to avoid the (from the same paper)

After all, the compiler does not force developers to declare such functions as unsafe -- in contrast to other unsafe features.

Having the body of set_len do something the compiler recognizes as unsafe is a big help to people understanding the soundness of Vec, and in general to people adding new features inside existing unsafety privacy boundaries.

Especially combined with other accepted work like https://rust-lang.github.io/rfcs/2316-safe-unsafe-trait-methods.html we could start even doing things like clippy lints for "why is this unsafe when it doesn't do anything unsafe? Should one of the types involved be marked unsafe?"

If we don't have to link to tootsie pops as often because changing things that are relied on by unsafe code is itself unsafe, I'd consider that a big win.

As another way to look at this, it's weird that when I'm writing a method on by type with a safety invariant that I can do Self { a, b } and it's totally "safe", whereas if I call Self::new_unchecked(a, b) I need an unsafe block and tidy nags me to write a safety comment.

Tidy should nag about a safety comment for the constructor too, so that I'm not disincentivized to use the other, correctly-marked-unsafe function when writing things.

Jul 14 '23 02:07 scottmcm

We do not need unsafe per se when maintaining a safety invariant within its defining visibility boundary aka module:

"Because it relies on invariants of a struct field, this unsafe code does more than pollute a whole function: it pollutes a whole module. Generally, the only bullet-proof way to limit the scope of unsafe code is at the module boundary with privacy."

In other words, rust does not have unsafe types because you must enforce invariants at module boundaries anyways. Also various discussions in https://github.com/rust-lang/unsafe-code-guidelines clarify this point.

A method like Vec::set_len must be unsafe due to being public. An extern fn is unsafe because it points outside the module. etc.

Anyways..

If I understand, you want this type:

pub UnsafeInvariant<T>(T);
impl<T> UnsafeInvariant<T> {
    fn new(t: T) -> UnsafeInvariant<T> { UnsafeInvariant(t) }
    unsafe fn get(&self) -> &T { &self.0 }
    unsafe fn get_mut(&mut self) -> &mut T { &mut self.0 }
}

It's similar to UnsafeCell but propagates all auto-traits normally, based upon your comment above.

You still enforce invariants by visibility but UnsafeInvariant could provide the documentation for which you propose unsafe fields. In practice, I suspect you'd be better off like this:

    fn invariant_get(&self) -> &T { &self.0 }
    fn invariant_get_mut(&mut self) -> &mut T { &mut self.0 }

Why? All those unsafe blocks you'll write risk other mistakes, so ideally they should not exist if they merely maintain some invarant. Instead, you want a safe but distinctively named accessor method, which flags that you maintain the invariant.

Anecdotally, this type winds up being much more common:

struct HideMut<T>(T);
impl<T> Deref for HideMut<T> {
    type Target = T;
    fn deref(&self) -> &T { &self.0 }
}
impl<T> HideMut<T> {
    fn new(t: T) -> HideMut<T> { HideMut(t) }
    unsafe fn get_mut(&mut self) -> &mut T { &mut self.0 }
}

And UnsafeCell remains more common than both of course.

In fact, if you wrap the HideMut declaration inside some macro_rules! use_hide_mut then HideMut becomes module local, so the local module can access pub foo: HideMut<Foo> fields freely, but the outside world has only immutable access, even if given a &mut for the containing struct. This is really the common pattern.

We made this a local type for visibility modifiers, so a language level construct helps here. Also conversely, if you do not require visibility modifiers then simple types like UnsafeInvariant suffice, no language change necessary.

I suppose one might imagine pub(positive_visibility) unsafe(negative_visibility) mut(positive_visibility) field: type, except this still cannot capture when mutation becomes unsafe but reading remains safe. Yet, visibility control types like UnsafeInvariant and HideMut work fine.

Jul 14 '23 02:07 burdges

EDIT: see below; it looks like the thing I was worried about here is probably impossible for other reasons.

I do thing that "safe to read; unsafe to modify" is the 99%+ case, and should certainly be the default, but

How could a field of a struct be unsafe to read?

Well, the field in AtomicPtr is unsafe to read, because it could be a race, for example. https://github.com/rust-lang/rust/blob/7a5814f922f85370e773f2001886b8f57002811c/library/core/src/sync/atomic.rs#L176

Or the value field of a ShardedLock in crossbeam https://docs.rs/crossbeam-utils/0.8.11/src/crossbeam_utils/sync/sharded_lock.rs.html#81

So perhaps some nuance for !Freeze could make sense? I'm not sure what the semver implications of that would be, though.

Jul 14 '23 03:07 scottmcm

Well, the field in AtomicPtr is unsafe to read, because it could be a race

I don't think that is correct? The unsafe operation is dereferencing the pointer returned by UnsafeCell::get(), accessing the field can't lead to UB on its own.

Jul 14 '23 03:07 Jules-Bertholet

accessing the field can't lead to UB on its own

Ah, I guess an access can't actually read an UnsafeCell (without ownership) because it's never Copy.

So I think the PlaceMention is always ok for everything, and a read would be unsafe for an UnsafeCell, but you can't actually do a read of an UnsafeCell directly in Rust.

Jul 14 '23 03:07 scottmcm

We do not need unsafe per se when maintaining a safety invariant within its defining visibility boundary aka module

rust does not have unsafe types because you must enforce invariants at module boundaries anyways

The nomicon describes current behavior; using it as an argument against this RFC is counter to the purpose of the RFC. It's circular reasoning at best.

if they merely maintain some invarant.

The invariants are "merely" there for soundness. If the invariant is violated, the result is undefined behavior. That's far more serious than you make it sound.

Yet, visibility control types like UnsafeInvariant and HideMut work fine.

I have never seen anyone write code like this in practice. The standard library and my own code (in time) is included in this. That is a significant argument in favor of something better.

@scottmcm I'll definitely include parts of that into the RFC. Also reading that blog post now — I'd never seen it before.

Jul 14 '23 07:07 jhpratt

If the invariant is violated, the result is undefined behavior.

That is a significant argument in favor of something better.

This RFC is not better because memory safety also helps when writing unsafe code.

The unsafe keyword is not simply a marker to tell you where to read more carefully. Its a marker of where safety rules must be violated.

In other words, we always convert regular code invariants into memory safety assurance, but these regular code invariants have exactly the same risks as other regular code, including their own memory safety concerns. This RFC confuses the memory safety consumed in maintaining the regular code invariant with the actually unsafe options the code requires.

In the past, unsafe fn bodies were unsafe blocks, but rust changed this to reduce the unsafe code surface area. This RFC is a mistake because it increases the unsafe code surface area with no benefits, given the same cautions can be maintained in other ways, like by variable naming, etc.

UnsafeInvariant not being used is evidence this feature is not required. UnsafeInvariant would make sense if you wanted to split the regular code invariant across distant visibility boundaries. In practice, unsafe fns always sufficed, or indeed proved more nuanced than UnsafeInvariant.

Anyways..

I think this discussion belongs in https://github.com/rust-lang/unsafe-code-guidelines where at least some people think formally about the unsafe code boundary.

Jul 14 '23 09:07 burdges

In other words, we always convert regular code invariants into memory safety assurance, but these regular code invariants have exactly the same risks as other regular code, including their own memory safety concerns. This RFC confuses the memory safety consumed in maintaining the regular code invariant with the actually unsafe options the code requires.

This is your fundamental misunderstanding.

Other code in Vec relies on the invariants of Vec.len in ways that leads to undefined behavior if the invariants are broken. Fields like Vec.len are not "regular code invariants" — they are tightly coupled to whether the code is sound or not. It is a soundness invariant. You cannot possibly claim otherwise.

given the same cautions can be maintained in other ways, like by variable naming, etc.

Frankly, it's thinking like this that led to the creation of Rust. Thread safety can be maintained if everyone is super careful, but we all know how that works out. Likewise with a million other things. Programmers can not be relied upon to do the right thing. We have to force them to do it by leveraging the compiler wherever possible.

I think this discussion belongs in https://github.com/rust-lang/unsafe-code-guidelines where at least some people think formally about the unsafe code boundary.

I have no idea why you think the discussion belongs there, particularly as you're the one that initiated it here.

Jul 14 '23 10:07 jhpratt

Other code in Vec relies on the invariants of Vec.len in ways that leads to undefined behavior if the invariants are broken.

Yes, but this does not make altering Vec.len within the Vec module an unsafe operation. That's not how safety works.

What happens if I've unsafe code which relies upon an invariant between the values in a slice, so your unsafe field is a &[*mut Foo] or &[usize]? I want memory safety around the regular code invariants in how I maintain this slice. Yes, those regular code invariants control memory safety in how the module get used, hence privacy. It's clearly worse if I've many more unsafe blocks merely to access this slice, which now might intermix with some real unsafe code blocks for the *mut Foos or even violate slice invariants.

We often have this "catch your tail" phenomenon in formalalisms.

Frankly, it's thinking like this that led to the creation of Rust

No. There is a formal model from the rustbelt project about how unsafe works, which gets discussed in the unsafe code guidlines repo. We should only expand what falls under unsafe code if the formal model says so. This RFC confuses those really important formal models with mere "read this" markers.

In brief, you don't have a formal conception of when unsafe types should be used. It's unlikely one exists. That makes this change a regression towards the C days.

Jul 14 '23 10:07 burdges

Yes, but this does not make altering Vec.len within the Vec module an unsafe operation. That's not how safety works.

It actually is. If something can result in undefined behavior, it is unsafe. Where that happens is wholly irrelevant. You're using circular logic regarding the UCG, which is by definition written about Rust as it currently is. You seem to be claiming that library undefined behavior doesn't exist, which is bizarre.

I'll leave it at that as there's clearly no convincing you that the mere concept of an unsafe field has merit. I won't be responding further to anything along those lines.

Jul 14 '23 11:07 jhpratt

Could you add another motivating example to the RFC? It seems like the vec.len example isn't super convincing because changing anything in Vec/RawVec breaks the model down and is "unsafe", by definition: that is protected via the private/public module interface. This makes me think that the resolution would be something like unsafe modules or #[unsafe] types rather than fine grained per-field control, so I'm curious to see what motivates this specific solution.

Jul 14 '23 19:07 tgross35

@tgross35 I linked this earlier in the thread. Is that sufficient for you? It demonstrates that field-level safety is appropriate, as some of the fields are perfectly safe to set as they have no interaction with any other field. Yet at the same time, all fields are one logical unit that should not be split into separate structs.

Jul 15 '23 03:07 jhpratt

Thank you for the link, I meant specifically to add an example to the RFC document (which perhaps you planned to do anyway).

Even with that example, it does seem to me that it would be more correct to nest flags into a separate type with the other items it has the invariant with. To me, it illustrates the concept better; a field alone is never unsafe but rather it is unsafe in the context of other fields, and it seems like this is a suitable level for abstraction to a separate type. Also:

If a struct of 12 fields has 8 marked unsafe, it isn't immediately clear how they may be related. Are they all tied together via an invariant? Are 6 tied together and the remaining 2 associated somehow? This becomes immediately clear by separating things that are related into separate types, and I think that behavior should be encouraged
For the example Parsed struct, if sufficiently many fields are related to flags then it seems like it would make sense to mark the entire struct #[upholds_unsafe_invariant] or something like that. Maybe a few fields aren't related to the invariant, but commenting // SAFETY: not related to the invariant is easy enough to make this clear (and forces you to make the change if they do become related to the invariant for some reason)

Niche optimization was mentioned as well as a reason for not wanting to split off types, but I think this is more applicable to all separations of logic, and not just those with safety concerns. I don't wish to derail this conversation, but the thought has crossed my mind before about how Rust could potentially use a #[flatten] attribute for nested struct fields that tells the compiler to merge the child struct's fields into the parent's and rearrange for best possible size. (yes - possible method duplication, but this would be meant for one-off structs).

I think that investigating something like that may be more widely useful than using niche optimization as a justification for why a single flat struct is the correct solution. (edit: I brought this up for some discussion on Zulip https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/.60.23.5Bflatten.5D.60.20struct.20field.20attribute)

Jul 15 '23 19:07 tgross35

Even with that example, it does seem to me that it would be more correct to nest flags into a separate type with the other items it has the invariant with.

If you force people to restructure their code in order to use unsafe annotations, many just won't bother. Let's not make the perfect the enemy of the good!

Jul 15 '23 20:07 Jules-Bertholet

Thank you for the link, I meant specifically to add an example to the RFC document (which perhaps you planned to do anyway).

Of course. I was asking more generally, as it's easier to discuss things than to update an RFC :slightly_smiling_face:

it isn't immediately clear how they may be related.

That's what documentation is for. The safety invariants have to be (or rather it's strongly encouraged to be) documented anyway.

Niche optimization was mentioned as well as a reason for not wanting to split off types

I think you misunderstood what I was trying to convey. The reason that some fields are MaybeUninit while others are Option is because the Option ones already have niche value optimization, while the ones using MaybeUninit do not. For optimization purposes, I bitpack those flags into a single field. All fields are still necessarily related, and splitting it into multiple structs (even if flattened in memory) would significantly reduce the ergonomics of the type.

Maybe a few fields aren't related to the invariant, but commenting // SAFETY: not related to the invariant is easy enough to make this clear

While true and certainly possible, it's far from ideal, as you're now requiring someone to use unsafe when nothing unsafe is actually happening.

Jul 16 '23 02:07 jhpratt

I notice that in both examples, the field marked as unsafe (len/flags) is perfectly safe by itself, but used to store information about the actually unsafe fields (RawVec/MaybeUninit). These other fields don't need to marked as unsafe because their type is already unsafe - or, more precisely, since types themselves cannot be marked as unsafe - their types' data is not public and methods they expose to access it are unsafe.

So, if such annotation for these fields is needed (and I'm not saying it is), maybe it should express this relationship explicitly? Something like:

pub struct Vec<T> { // ignoring the allocator for brevity
    #[safety_invariant("len")]
    buf: RawVec<T>,
    len: usize,
}

Not sure how the access syntax for these fields should look like, though.

Jul 16 '23 13:07 idanarye

IMO the annotation on len makes a lot more sense. The type usize for len is a plain lie: this cannot just be an arbitrary integer! If we had refinement types, we would write len: usize @ length_of(buf), or so, but we don't, so we propose to write unsafe len: usize instead to clearly document that usize here is incomplete information.

Jul 16 '23 13:07 RalfJung

@RalfJung What does length_of(buf) even mean? This value is not known at compile time, and may even change at runtime after setting len.

Jul 16 '23 15:07 idanarye

Yes, refinement types don't have to be compile-time constants. It is meant to be pseudo-notation to refer to the length of the initialized part of the buffer.

The point is that len is not a perfectly safe field. It has a subtle invariant. In this particular case the invariant is relational (it also involves buf), but that shouldn't distract from len being unsafe. (Of course buf should also be unsafe.)

Jul 16 '23 15:07 RalfJung

rfcs rfcs copied to clipboard

Unsafe fields

rfcs
rfcs copied to clipboard