rfcs RFC: Alignment niches for references types.

This RFC proposes to add niches to reference-like types by exploiting unused bit patterns caused by alignment requirements.

Rendered

Previous discussion on IRLO

Dec 06 '21 21:12 moulins

This RFC does not explain how references to the reference would be handled. That is, the following code must be well formed:

pub fn banana(val: &mut E) -> Option<&mut &u16> {
    match val {
        E::A(ref mut val) => Some(val),
        _ => None
    }
}

let reference: &mut &u16 = banana(enum).unwrap();
static SOME_REF: u16 = 42;
*reference = &SOME_REF; // How does the compiler know it needs to copy over / remove the discriminant from alignment bits?

Right now the value returned by banana violates the validity invariants for the reference types and I don't see how the alignment niche bits could be masked out anywhere, as the knowledge of the reference being part of an enum is lost.

Dec 06 '21 21:12 nagisa

Right now the value returned by banana violates the validity invariants for the reference types and I don't see how the alignment niche bits could be masked out anywhere, as the knowledge of the reference being part of an enum is lost.

Under my proposal, either we have a E::A and the value inside is a valid reference, correctly aligned, or we don't have a reference at all, so there's no problem.

It works exactly the same way as the null pointer optimization (take your example and replace &mut E by &mut Option<&u16>).

Dec 06 '21 21:12 moulins

one possible way to spell Aligned<T> is &'unsafe T: see #3199

Dec 07 '21 18:12 programmerjake

one possible way to spell Aligned<T> is &'unsafe T

That could actually be pretty elegant; it would remove the need for defining the new niches via custom attributes.

Dec 07 '21 18:12 moulins

With respect to Aligned<_>: I remember people some time ago floating the idea of adding yet more pointer niches by recognising the fact that under some ABIs, certain pointer ranges are reserved and cannot possibly be allocated addresses (see e.g. #2400). If work on this feature were to resume, we’d have to invent yet another wrapper to express yet another pointer invariant.

So how about having a single std::ptr::WellFormed<_> wrapper that captures all possible pointer well-formedness invariants at once? Meaning, std::ptr::WellFormed<T> wraps a value of a memory address at which a T might possibly be allocated (though it need not be actually possible to dereference).

Dec 08 '21 17:12 fstirlitz

Once you set whatever "well formed" means as Stable you can't add more requirements later or you potentially invalidate old code. Even new platforms should likely not add new requirements or the old code will not actually be portable to those platforms and it will silently break. You could gate the effects of WellFormed by edition, and then rustfix would know that it can't touch that code and just give an error that it needs to be adjusted manually to re-check invariants.

Other than that it's a fine plan.

Dec 08 '21 17:12 Lokathor

Good point. But it also applies to modifying the ABI of regular references, which this proposal does without much restraint.

In practice I wouldn’t expect WellFormed<_> to be much different from &_ in terms of ABI, other than the compiler not being allowed to introduce spurious accesses to the former (via loop-invariant code motion and such). I guess this may make it justified to spell the feature as &'unsafe _ outright.

Dec 08 '21 17:12 fstirlitz

I remember people some time ago floating the idea of adding yet more pointer niches by recognising the fact that under some ABIs, certain pointer ranges are reserved and cannot possibly be allocated addresses

This isn't possible without a breaking change; for example, the following code is currently sound on all targets, independently of any reserved pointer ranges:

let ptr = 0xFEDCBA9876543210 as usize as *const ();
let zst: &() = unsafe { &* ptr };

This fact is also required for Vec soundness: an empty Vec points to a zero-sized slice constructed from such a dangling pointer.

It may be enough to special-case references to ZSTs to not have these requirements, but this seems pretty ad-hoc.

Another issue is that checking if a *mut T can be cast to a WellFormed<T> is non trivial, and would give different results depending on obscure differences between architectures.
In comparison, the check for Align<T> is simply ptr != 0 && ptr % align_of::<T>() == 0 (or, if size niches exist, the slightly more complicated ptr.wrapping_add(size_of::<T>()) <= size_of::<T>() && ptr % align_of::<T>() == 0), which only depends on a few parameters (size of the address space, and size/alignment of T).

Dec 08 '21 20:12 moulins

I don’t think this should be considered sound. If applied to ZSTs used as proof tokens, it would render them toothless, especially if they are Copy. And yes, I have a use case for such.

More philosophically, you should not be allowed to pull references out of thin air just because you don’t need to actually access any memory when dereferencing them. To claim the existence of a value at a given address you have to have sound knowledge that memory has been allocated and a value has been placed there. I don’t think it makes any difference that the extent of the allocation is empty and ‘placing’ the value happens to be a no-op at the machine code level. Someone must have had put it there, at least conceptually. For such a claim to be valid, it has to point to a place where values can ever possibly reside. To put it differently, even if ZSTs can reside anywhere in the address space, it doesn’t mean the address space has to extend throughout the entire range of the usize type. This is no different from singling out the zero address as invalid.

Though on the other hand, singling out ZSTs wouldn’t necessarily be so unprincipled, given that we had proposals like #2040 and #2754.

I suppose Vec<_> could equally well point empty vectors to some kind of static EMPTY_VEC: [(); 0] = [];. Or simply ‘dangle’ pointers somewhere else in the address space. Though if there is more such code in the wild, then it indeed is a concern.

As for the simplicity argument, adding a zero page reservation would add only one parameter as well, the beginning of the address space: this would change ptr != 0 into ptr >= LOWEST_ADDR, and ptr.wrapping_add(size_of::<T>()) > size_of::<T>() (what you meant above, I suppose) into ptr.wrapping_add(size_of::<T>()) >= LOWEST_ADDR + size_of<T>(). Though yes, in the general case, it would have to be an intrinsic with a target-specific implementation.

Dec 08 '21 23:12 fstirlitz

A concern I'd have with Aligned<T> being &'unsafe T is that &'unsafe T, if I understand it correctly, has the same invariants as T in many cases. Specifically stuff like aliasing, and the range covered by the borrow, and (afaict) initialization of the pointee. This would limit the set of use cases greatly. They do seem similar though, so perhaps there is a way to solve both

Or perhaps not... In truth, I think I see far more uses for this than I see for &'unsafe T (even though I sympathize with the pain of working with pointers in unsafe code)..

Dec 09 '21 00:12 thomcc

You can pull a reference to a zero-sized Copy data type with a public constructor out of thin air (assuming it's at an aligned address).

If the constructor isn't public, or the type isn't Copy, then you're potentially breaking someone's invariants and that's gonna be a bad time.

Dec 09 '21 00:12 Lokathor

@fstirlitz to be clear, conjuring a ZST via dereferencing an arbitrary nonnull well-aligned pointer is sound, as far as the language is concerned. This isn't changing.

However, it is semantically identical to transmute::<(), ZST>, which is also sound, as far as the language is concerned.

Where the unsoundness enters, is that it's Wrong ("library unsound") to violate privacy and create a type not through the APIs provided to construct it. If someone writes unsafe to violate your library requirements, they've committed an unsound act, and you're free to execute UB in your library with that as proof.

Back on the topic of ptr::WellFormed<_>... the problem is that the invariants of ptr::WellFormed<_> are basically that the pointer points to a place that was valid to dereference previously, but not necessarily now. (IOW, this is what &'unsafe means to me, I think.) The pointer having been previously valid is the only way that you can know for sure that it is well formed for dereferencing, including whatever interesting extra rules the target may have for valid pointers (to nonzero bytes), such as normal form or the zero page.

I'd love to be proven wrong in the future, but I don't really see all that much value for a compiler-known, niche-optimized pointer type between "known nonnull and well aligned" (ptr::NonNull) and "valid reference with programmer-checked lifetime" (&'unsafe). I do see benefit to having both of these.

Dec 10 '21 00:12 CAD97

There's certainly some value in having "all the invariants of a reference but stop bugging me about the lifetime".

In other words, it's always aligned, and if the target location is still live then it does contain initialized data in a valid state.

Dec 10 '21 01:12 Lokathor

Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles. What's the proposed implementation strategy for this RFC?

Dec 10 '21 21:12 jonas-schievink

Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles.

I tried something similar last year, but instead of trying to make it depend on specific Ts I tried to implement

I remember people some time ago floating the idea of adding yet more pointer niches by recognising the fact that under some ABIs, certain pointer ranges are reserved and cannot possibly be allocated addresses

It may be enough to special-case references to ZSTs to not have these requirements, but this seems pretty ad-hoc.

By doing a conservative layout analysis of T to see if we could be certain that it has a non-zero size (e.g. it being an aggregate that contains at least one scalar field, an enum with at least one such variant etc. etc.). Anything that would require another query was considered as potentially zero-sized.

I got as far as compiling stage-1 core and alloc but it blew up when compiling std somewhere, if I recall correctly, in LLVM normalizing a pointer subtraction to a wrapping add (meaning the offset was > isize::MAX) and then failing an LLVM assertion that the normalized version was out of range for the data type (since I redefined references to be only valid for 1..isize::MAX).

I didn't investigate further whether that was a bug in LLVM or my attempt was generating nonsensical IR. I have no guess how easy or difficult it would be to fix that issue.

A similar conservative approach might be possible for alignments by trying to determine the minimum knowable alignment for a type instead of the exact alignment.

Dec 11 '21 20:12 the8472

Back on the topic of ptr::WellFormed<_>... the problem is that the invariants of ptr::WellFormed<_> are basically that the pointer points to a place that was valid to dereference previously, but not necessarily now. (IOW, this is what &'unsafe means to me, I think.) The pointer having been previously valid is the only way that you can know for sure that it is well formed for dereferencing, including whatever interesting extra rules the target may have for valid pointers (to nonzero bytes), such as normal form or the zero page.

After some thinking, this actually seems a workable solution: add the WellFormed<T>/&'unsafe T type, with your "must point to a maybe-deallocated memory location valid for a T" invariant. This implies the following:

a WellFormed<T> is always non-null;
a WellFormed<T> is always well-aligned;
ptr.offset(1) is always sound; in particular, it doesn't wrap around the address space.

These three properties are enough to exploit alignment and size niches.

One advantage of WellFormed<T> over Aligned<T> is that because they can only be constructed from previously-valid references, we can transfer over whatever invariants we require of references without having to make them explicit.
This means that we don't need to commit to some specific bit-level validity invariants (alignment, etc) for WellFormed<T>, and that adding more in the future (restricted address ranges, etc...) isn't a breaking change.

WellFormed<ZST> created from thin air are still sound, because the language explicitely guarantees that any non-null, well-aligned pointer is a valid allocation for a ZST, so any such constant pointer can be soundly casted to a WellFormed.

The only issue is NonNull::dangling; a WellFormed::dangling equivalent is desirable, but it doesn't respect the "must be created from a existing allocation" requirement. A way to resolve this would be to define dangling as returning a pointer to a logically preexisting-but-deallocated region, but I don't know if this actually makes sense.

Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles. What's the proposed implementation strategy for this RFC?

I must admit I didn't think that far yet. :smiley: I have never worked on rustc, so I have no idea of the gory details of such a change.
However, I suspect adding niches only for Aligned<T>/Unique<T> to be easier, because they are actual types defined in libcore, not compiler intrinsics. This would have the bizarre consequence that Box<T> has more niches than &T though, which may break some assumptions elsewhere in the compiler.

Dec 12 '21 02:12 moulins

ptr.offset(size_of::<T>())

Minor nit, offset takes units, not bytes

Dec 12 '21 02:12 Lokathor

Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles. What's the proposed implementation strategy for this RFC?

I must admit I didn't think that far yet. 😃 I have never worked on rustc, so I have no idea of the gory details of such a change. However, I suspect adding niches only for Aligned<T>/Unique<T> to be easier, because they are actual types defined in libcore, not compiler intrinsics. This would have the bizarre consequence that Box<T> has more niches than &T though, which may break some assumptions elsewhere in the compiler.

I don't actually know for sure, but from what I've heard about how rustc's internals are structured, I'd expect the issue to not be that &T is builtin and WellFormed<T> is defined in the library, but instead that you could define a type like so:

pub enum MyType {
    A(WellFormed<MyType>),
    B,
    C,
}

For MyType, to calculate the size and alignment, you have to know if B and C can fit in A's niches, which requires calculating WellFormed<MyType>'s niches, which requires calculating MyType's size and alignment all over again, forming a loop.

Dec 12 '21 03:12 programmerjake

As an extreme example, let's assume that we're on a 16-bit architecture (pointers are 16-bits) for simplicity:

pub enum MyEnum1<'a> {
    Ref(&'a MyEnum2<'a>),
    Z = 0,
    P1 = 1,
    N1 = -1,
    N2 = -2,
    N3 = -3,
}

pub enum MyEnum2<'a> {
    Ref(&'a MyEnum1<'a>),
    Z = 0,
    P1 = 1,
    N1 = -1,
    N2 = -2,
    N3 = -3,
}

There are two possible equally valid sets of layouts:

MyEnum1 has size=4 and MyEnum2 has size=2 with MyEnum2's other enumerants fitting into &MyEnum1's niches which extend from -4 to -1 for the size niches as well as -1 to 1 for the alignment niches. MyEnum1 has more enumerants than niches in &MyEnum2 so there's a separate i16 tag field in MyEnum1 making it take 4 bytes rather than 2.
The reversed size assignments: MyEnum2 has size=4 and MyEnum1 has size=2 with the rest of the logic being analogous to the above case.

Dec 12 '21 03:12 programmerjake

@programmerjake's example holds for size niches, but can a similar example be created for alignment niches? I don't think so... though this would still require splitting alignment and size calculation, so I think it could be strictly ordered alignment→niches→size, if only alignment and not size niches are considered.

Even in the case of recursive requirements, it's solvable, if not optimally. Niche optimization is an optimization, so it's fine to miss an optimization in some cases (if not ideal). "Just" (yes it's more complicated) break cycles by picking some types and not giving their references any size/align niches. (In order to assure the computation is deterministic and gives the same answer no matter what part of the dependency cycle is queried first, it should be sufficient to pessimize types in a total ordering, or even just to pessimize the whole cycle.) This wouldn't ever give worse results than current, without the extra niches.

Dec 12 '21 07:12 CAD97

They might as well be both 2 bytes: with variant Z having bit representation 0x0000, P1 being 0x0001, N1 being 0x0003, N2 being 0x0005, N3 being 0x0007. There is no rule that the discriminant must be directly visible in the bit representation.

Dec 12 '21 08:12 fstirlitz

As for dangling: I thought the point of the niche optimisation is to remove the need for such a thing. But in the worst case, it could be simply declared valid by fiat.

Dec 12 '21 08:12 fstirlitz

N2 being 0x0005, N3 being 0x0007

As currently speced, the niche is only ptr < size_of::<T>() || size_of::<T>() < ptr (off by one on those?). As currently implemented, rustc's niche system only supports valid_scalar_range_start and valid_scalar_range_end to restrict a value to a smaller continuous range. rustc does not currently support more complicated sub-object/sub-byte niching, and adding support for such niches will make niching much more complicated and make the space/time trade-off much less of a clear win.

The example as given exhibits dual-minima with simple continuous range niching. The situation still exists with full alignment niche exploitation, even if it's significantly harder to hit -- just add a lot more variants -- and as such needs to be handled.

Dec 12 '21 22:12 CAD97

I'd like to add a comment here relative to my experiments relative to the efficiency of niches.

It all started because I wanted to implement the use of niche for enum with more data member. (What is mentioned in the "Future possibilities" of this RFC.) The PR was https://github.com/rust-lang/rust/pull/70477 , but the benchmarking saw some perf regressions in the compiler and then i did not have the time to continue this experiment. Anyway, my theory is that the compilation is slower when using niche because the generated LLVM code when using niche is not optimal, and lead to code that is both slower to compile, and slower. I made an experiment in here: https://github.com/rust-lang/rust/pull/77816 this disable the niche optimization for all the structures bigger than 16bytes. And the result is that for many benchmark, it if faster not to use the niche.

Anyway, what's relevant for this RFC, is that before enabling more niches, one should probably try to actually make sure that the niche optimization generate decent code (I summarized some possible improvement in https://github.com/rust-lang/rust/pull/75866#issuecomment-702967895 )

Dec 13 '21 12:12 ogoffart

Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles. What's the proposed implementation strategy for this RFC?

Yeah this was an issue last time we tried that (years ago, I think it was in 2018).

But more importantly, we had concluded back then that transmuting from Foo<&T> to Foo<&U> (given T and U both Sized) should always succeed, and thus we can't rely on alignment of reference types for niches.

Dec 13 '21 14:12 nox

But more importantly, we had concluded back then that transmuting from Foo<&T> to Foo<&U> (given T and U both Sized) should always succeed, and thus we can't rely on alignment of reference types for niches.

Isn't this already false in the general case? e.g.

trait Tr {
  type Assoc;
}

struct A;
impl<'a> Tr for &'a A {
  type Assoc = u8;
}

struct B;
impl<'a> Tr for &'a B {
  type Assoc = u16;
}

struct Foo<T: Tr>(T::Assoc);

fn oh_no(arg: Foo<&A>) -> Foo<&B> {
  std::mem::transmute(arg)
}

Dec 13 '21 17:12 moulins

Isn't this already false in the general case? e.g.

Sorry I meant casting from some type to another type where the difference is that you have &T on one side and &U on the other. But I misremembered anyway.

The main issue is circular queries as @jonas-schievink said.

Currently rustc computes the layout and the required alignment of the type in the same query (right?), so you need to disconnect those two first. I don't think it is unfeasible (as layout optimizations don't change alignment requirements), but it's not a tiny task at the very least.

Dec 13 '21 17:12 nox

To solve the circular queries issue without having to explicitly detect and break cycles, I propose separating the layout computation in three phases:

First phase
- Compute the (exact) alignment (align) by taking the maximum alignment on each field (including enum discriminants).
- References have a fixed alignment, so there are no cyclic queries here.
Second phase
- Run the layout algorithm, but allow an arbitrary number of enum discriminants to fit in any non-empty niche. Here, we treat references as only having the null niche; the exact number of niches doesn't matter because any niche can accept any number of discriminants.
- This gives us a lower bound on the size of the type (min_size).
Third phase
- Compute the actual layout for the type with normal niche filling, using align and min_size to determine the exact niches for reference types.

Executing this algorithm on @programmerjake's example gives align=2, min_size=2, size=4 for both enums.

In general, this algorithm will pessimize the size niches for every enum containing a variant with a niche, but where niche optimization didn't apply because the niche wasn't big enough.

And the result is that for many benchmark, it if faster not to use the niche.

@ogoffart This is somewhat infortunate; the situation here may be different because the optimization only applies to enums with a single data-full variant, but again maybe not.
Anyways, even if the optimization isn't implemented for the moment, I believe adding Aligned<T>/WellFormed<T> is still useful, so that we can enable the optimization once the performance concerns are resolved.

Dec 13 '21 18:12 moulins

As far as I understand, the fact that not using niches is often faster is due to the complexity of the computation of discriminants, which can also be optimised. That's what I'm working on on my free time these days.

Dec 14 '21 09:12 nox

A simpler partial approach is to take advantage of the fact that all OSes for systems with an MMU don't map a page at 0 (so that null pointer accesses will generate a page fault), and thus values from 0 to page_size-1 (where page_size is usually at least 4KB) are not valid pointers and can be used as a niche.

On systems without an MMU this might not be the case, but if the system is 32-bit or more then usually the highest addresses are unused since they are past the end of RAM.

This is OS dependent, but Rust already has OS dependent code generation, so not a big issue.

Of course adding all unaligned values to this makes things even better (but it has no effect if the pointer is to a type with alignment 1), although the limited approach of adding only the unaligned values around 0 proposed by the RFC seems pointless, since it will take more work to implement and provide less benefit than taking advantage of the zero page since types usually have alignments lower than the page size.

Jan 15 '22 03:01 bill-myers

rfcs rfcs copied to clipboard

RFC: Alignment niches for references types.

rfcs
rfcs copied to clipboard