Add `core::ptr::assume_moved`
Add a helper for primitive pointer types to facilitate modifying the address of a pointer. This mechanism is intended to enable the use of architecture features such as AArch64 Top-Byte Ignore (TBI) for use cases such as high-bit pointer tagging. An example application of this mechanism would be writing a tagging memory allocator.
Is there a reason we can not just say changing the upper bits has no impact on a pointer if an appropriate tagging scheme is available, without need for additional methods?
Yes, the reason is that even if the hardware understands a particular tagging scheme, the memory model in Rust and LLVM does not. Setting a tag on a pointer, even though it has no impact on the hardware side, makes the memory model think the pointer has now been offset outside of its original allocation and thus any access to it is Undefined Behaviour.
To be able to do this we need a helper method that simulates a "realloc" from the untagged address to the tagged address to make the memory model happy. Specifically, we need the helper method to return a pointer that LLVM IR will annotate as noalias. Relevant section from the linked LLVM doc:
On function return values, the noalias attribute indicates that the function acts like a system memory allocation function, returning a pointer to allocated storage disjoint from the storage for any other object accessible to the caller.
Cc @rust-lang/opsem
these should probably be associated functions, not methods.
also, this seems to ignore another type of pointer tagging, often used by interpreters, where the bottom bits (otherwise always zero because of alignment) are used to tag the type of the object.
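To make the low-bit variant concrete, here is a minimal sketch using the strict-provenance pointer API (`addr` and `map_addr`, stable since Rust 1.84). It assumes an 8-byte-aligned pointee, so the low 3 address bits are always zero. `Object`, `tag_low`, `low_tag` and `untag_low` are hypothetical names for illustration, not part of any proposed API.

```rust
// Sketch: low-bit tagging as used by interpreters, built on the strict-provenance
// pointer API. `Object` and the helper names are hypothetical.
const TAG_MASK: usize = 0b111; // 3 tag bits, free because `Object` is 8-byte aligned

#[repr(align(8))]
struct Object {
    value: u64,
}

/// Stores `tag` (0..=7) in the low address bits, which alignment guarantees are zero.
fn tag_low<T>(ptr: *const T, tag: usize) -> *const T {
    assert!(tag <= TAG_MASK, "tag does not fit in 3 bits");
    assert!(std::mem::align_of::<T>() > TAG_MASK, "alignment leaves no free low bits");
    // `map_addr` keeps the original provenance, so the pointer stays usable once untagged.
    ptr.map_addr(|addr| addr | tag)
}

/// Reads the tag; only the address is inspected, so this needs no `unsafe`.
fn low_tag<T>(ptr: *const T) -> usize {
    ptr.addr() & TAG_MASK
}

/// Clears the tag bits, giving back a pointer to the start of the object.
fn untag_low<T>(ptr: *const T) -> *const T {
    ptr.map_addr(|addr| addr & !TAG_MASK)
}

fn main() {
    let obj = Object { value: 42 };
    let tagged = tag_low(&obj as *const Object, 5);
    // Dereference only the untagged pointer.
    let value = unsafe { (*untag_low(tagged)).value };
    println!("tag = {}, value = {}", low_tag(tagged), value);
}
```

Because the tag is cleared before the dereference, the access goes through a pointer with the original address and provenance, so Miri has nothing to object to.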
Is there a reason we can not just say changing the upper bits has no impact on a pointer if an appropriate tagging scheme is available, without need for additional methods?
This question should indeed be answered in the RFC text, not just in the discussion thread.
(This RFC could have benefited from a pre-RFC phase, posting it on the forum to get some feedback to ensure that it has all the expected details.)
Why can't this be done by the backend?
i.e., I write my code as *(ptr & mask), and then the backend optimizes that to *ptr if it's known that the CPU automatically ignores the bits that were masked out? (This has the obvious benefit that the code is automatically portable to other architectures without that feature...)
To be able to do this we need a helper method that simulates a "realloc" from the untagged address to the tagged address to make the memory model happy.
Is simulating a realloc correct, though?
Consider the following code:
void *original = /*...*/;
void *copy = original;
void *tagged = realloc(original, ...);
Now, according to the semantics of realloc, it is guaranteed that tagged has no alias, and indeed this is the reason why at LLVM level the noalias attribute is specified.
However, if my understanding of TBI is correct, this is not what happens here.
Specifically, copy and tagged are aliases of each other! And if codegen assumes that updates through copy do not modify what tagged points to (or vice versa), we'll have Undefined Behavior.
Am I misunderstanding TBI or noalias?
You have UB if you try to do any accesses through original or anything derived from it: a realloc essentially marks original as deallocated memory inside the compiler. So it is still noalias, since after the realloc the only valid pointer is tagged, even though you're just changing the pointer tag.
@matthieu-m this model definitely makes some code UB that would be correct when using TBI in an assembly program. However, we have to impose some restrictions to make TBI compatible with higher-level language models such as Rust (and the same goes for C and C++). realloc is the best plan we came up with so far -- and yes, this means that after choosing a new tag, all previous pointers to this memory are now invalid. Including the ones that used the same tag! This operation returns a fresh provenance, and all future accesses must be done with pointers that are derived from the pointer returned by with_tag.
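For comparison, the rule that `realloc` imposes can be seen with the real `std::alloc::realloc`: after it returns successfully, the pointer that was passed in (and every copy of it) must not be used again, even if the returned pointer happens to have the same address. This sketch is only an analogy; `grow_and_read` is a hypothetical helper, and the proposed tagging operation would not copy or resize anything.

```rust
use std::alloc::{alloc, dealloc, realloc, Layout};

/// Allocates room for 4 `u32`s, writes `initial`, grows the block with `realloc`,
/// and reads the value back through the pointer returned by `realloc`.
fn grow_and_read(initial: u32) -> u32 {
    let old_layout = Layout::array::<u32>(4).unwrap();
    let new_layout = Layout::array::<u32>(8).unwrap();
    unsafe {
        let original = alloc(old_layout) as *mut u32;
        assert!(!original.is_null(), "allocation failed");
        original.write(initial);

        let moved = realloc(original as *mut u8, old_layout, new_layout.size()) as *mut u32;
        assert!(!moved.is_null(), "reallocation failed");

        // `original` (and any copy of it) is now dangling for the abstract machine,
        // even if `moved` has the same address. Only `moved` may be used from here on.
        let value = moved.read();
        dealloc(moved as *mut u8, new_layout);
        value
    }
}

fn main() {
    println!("{}", grow_and_read(7));
}
```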
The discrepancy caused by LLVM (and Rust) not understanding the concept of TBI is fairly unfortunate.
I think it should be noted in Future Possibilities that the choice of using a realloc-like method for now is future-compatible with LLVM and Rust gaining an understanding that only the bottom 56 bits of the pointer matter, and that when they do the constraints could be relaxed -- if we so wish -- to allow original & copy to still be valid (and aliased).
That is, while overly restrictive today, the drawback of the selected model is not painting us into a corner as far as I can see.
realloc is the best plan we came up with so far
Why is this better than explicitly masking off the bits and then having that mask be optimized away?
gaining an understanding that only the bottom 56 bits of the pointer matter,
Well, sometimes they get ignored, and sometimes all bits matter. This seems highly non-trivial, but I am not an expert on the relevant LLVM passes.
Why is this better than explicitly masking off the bits and then having that mask be optimized away?
That also sounds like an option, if LLVM supports it.
That also sounds like an option, if LLVM supports it.
It seems like LLVM knows about it, but doesn't currently have a pass that optimizes for it: https://github.com/search?q=repo%3Allvm%2Fllvm-project%20UseAddressTopByteIgnored&type=code
Seems like it would make more sense to add this functionality to LLVM rather than Rust though.
(This RFC could have benefited from a pre-RFC phase, posting it on the forum to get some feedback to ensure that it has all the expected details.)
Indeed, I should have at least marked it as draft from the get-go, or started with the forum as you suggest. This was intended as a conversation starter, it's by no means a ready proposal. I'm fully expecting to re-write this with more information, just want to get some outside opinions and fresh eyes on the direction first.
Why can't this be done by the backend?
i.e., I write my code as *(ptr & mask), and then the backend optimizes that to *ptr if it's known that the CPU automatically ignores the bits that were masked out? (This has the obvious benefit that the code is automatically portable to other architectures without that feature...)
That does sound like something that could be a useful LLVM pass, especially for compatibility with different platforms. But I think that's a different aspect from the use-case that this PR is meant to support. What we want here is a "Rust-way" to do the following:
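let value: i32 = 42;
let addr = &value as *const i32 as usize;
let tag: usize = 60;
let tagged_addr = addr | (tag << 56);
let ptr = tagged_addr as *const i32;
// Runs on a TBI system, but the tagged address lies outside `value`'s allocation, so this access is UB.
let val = unsafe { *ptr };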
The snippet above will currently compile & work "fine" on a TBI system, except that Miri will rightly complain that the code has UB. The end goal of this proposal is to create an interface for top-byte tagging that does not break the memory model. This is separate from making those always safe to dereference, for which the LLVM pass would be helpful.
@mrkajetanp
But I think that's a different aspect from the use-case that this PR is meant to support.
It's not different. Methods already exist to do what you are trying to do:
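fn mask_addr(addr: usize) -> usize {
    // Keep the low 56 bits, i.e. clear the top byte.
    addr & 0x00FF_FFFF_FFFF_FFFF
}
let value: i32 = 42;
let tag: usize = 60;
// `map_addr` changes the address while keeping the provenance of `&value`.
let tagged_ptr = (&value as *const i32).map_addr(|addr| addr | (tag << 56));
// Masking before the access brings the pointer back inside `value`'s allocation.
let val = unsafe { *tagged_ptr.map_addr(mask_addr) };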
This is completely sound under Miri and doesn't require any extensions to the Rust abstract machine. The only bit that's missing is a compiler optimization that erases the .map_addr(mask_addr) on platforms where that is a no-op.
This is completely sound under Miri and doesn't require any extensions to the Rust abstract machine
If I'm understanding what you're suggesting correctly, under this model users would need to write out ptr.map_addr(mask_addr) for every single pointer access, no? Without that, even on a platform that ignores those bits, Miri will complain, because within the memory model ptr points outside of its allocation.
If this was done inside some allocator wrapper (as it is currently being used in Android for instance) then every single pointer returned to the user would be UB to access unless the user explicitly masked those bits out. Surely that can't be a good approach?
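A minimal, hypothetical sketch of such a wrapper under the masking model makes the cost visible. It assumes a 64-bit target with the tag in bits 56..64; `TaggedPtr` and `ADDR_MASK` are illustrative names, not an existing API. The tag travels with the pointer, but every access has to go through `untagged()` first.

```rust
// Hypothetical sketch of the "mask before every access" model.
// Assumes a 64-bit target with the tag stored in bits 56..64.
const ADDR_MASK: usize = 0x00FF_FFFF_FFFF_FFFF;

struct TaggedPtr<T> {
    // Carries the tag in its top byte and the provenance of the original pointer.
    ptr: *mut T,
}

impl<T> TaggedPtr<T> {
    fn new(ptr: *mut T, tag: u8) -> Self {
        // Safe: only the address changes; `map_addr` keeps the provenance.
        let ptr = ptr.map_addr(|a| (a & ADDR_MASK) | ((tag as usize) << 56));
        TaggedPtr { ptr }
    }

    /// Must be called before *every* access: the tagged pointer itself lies outside
    /// the allocation as far as the abstract machine (and Miri) is concerned.
    fn untagged(&self) -> *mut T {
        self.ptr.map_addr(|a| a & ADDR_MASK)
    }
}

fn main() {
    let mut value = 10i32;
    let p = TaggedPtr::new(&mut value as *mut i32, 0x3C);
    unsafe {
        *p.untagged() += 1;
        println!("value = {}", *p.untagged());
    }
}
```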
@RalfJung Should I then re-write this based on the already received comments and then post on Rust Internals?
If this was done inside some allocator wrapper (as it is currently being used in Android for instance) then every single pointer returned to the user would be UB to access unless the user explicitly masked those bits out.
Fair enough; the proposed API seems a bit too high-level for this allocator use case, though? Wouldn't the primitive operation be something like "realloc", but where you specify the target address? And there doesn't need to be an explicit tag() method, since you can always safely access the address of a pointer.
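As an illustration of that last point, reading (and even constructing) a top-byte tag needs only safe code. This sketch assumes a 64-bit target with the tag in bits 56..64, as with AArch64 TBI; `top_byte` is a hypothetical helper name.

```rust
// Sketch: no dedicated `tag()` API is needed to *read* a tag, since `addr()` is safe.
// Assumes a 64-bit target with the tag in bits 56..64 (as with AArch64 TBI).
const ADDR_MASK: usize = 0x00FF_FFFF_FFFF_FFFF;

fn top_byte<T>(ptr: *const T) -> u8 {
    // Only the address is inspected; nothing is dereferenced.
    (ptr.addr() >> 56) as u8
}

fn main() {
    let value = 0u32;
    let ptr = &value as *const u32;
    // Constructing the tagged pointer is safe as well; only dereferencing it is not.
    let tagged = ptr.map_addr(|a| (a & ADDR_MASK) | (0x2A << 56));
    println!("{:#x} -> {:#x}", top_byte(ptr), top_byte(tagged));
}
```

Only dereferencing `tagged` would need the realloc-like operation (or masking).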
Wouldn't the primitive operation be something like "realloc" but where you specify the target address?
It certainly could be if that's the community consensus; I don't have very strong views on what the exact API should look like, and my intention when posting this was to get opinions on that exact question. Next time around I'll go through Internals first; I suppose I took the "request for comments" term a bit too literally for how it's used here :)
And there doesn't need to be an explicit tag() method
True as well; the intent there is just convenience. Because different architectures can use different bits for tagging, it would make sense to have a corresponding tag() method so that the user can set and retrieve tags without writing architecture-specific code.
If we just want to support something like realloc(target_addr) but for tagging then the explicit tag() method is not needed as we're leaving it up to the user to work out the specific bits to get and set anyway.
I also think a lower-level API that focuses on the realloc-like operation is better, but I am not a t-libs-api member. I also sadly don't have the capacity to be much further involved in this. I think I gave some good starting points for how the RFC could be improved, and clarified what we generally expect from RFCs. Posting an improved version to IRLO sounds like a good plan. :)
The realloc approach would also support cases where virtual memory mappings are used for a similar purpose on platforms without hardware support for pointer tagging (i.e., where you map the same physical memory to two or more virtual address ranges).
The realloc approach would also support cases where virtual memory mappings are used for a similar purpose on platforms without hardware support for pointer tagging (i.e., where you map the same physical memory to two or more virtual address ranges).
I don't think Rust will have standard APIs for manipulating page tables? ;)
I was going to say, I don't think this is ready yet for a portable API. It makes little sense to try to sketch a portable API that has exactly one target implementation. The RFC should focus on providing APIs for platform-specific capabilities, e.g. in core::arch. A portable API can be experimented with as a user crate, since some experimentation will be required before it becomes clear what a good API looks like.
I don't think Rust will have standard APIs for manipulating page tables? ;)
that doesn't matter if you can still use mmap or similar to make two mappings for the same piece of memory -- it would be nice if rust can handle that case.
This is kinda similar to how Rust doesn't have a std thread API on some targets (because they're #![no_std]), but Rust still needs to properly handle running code in different threads that were started by some mechanism outside of the Rust standard library (unless the target is specifically single-threaded only, such as wasm32-unknown-unknown).
that doesn't matter if you can still use mmap or similar to make two mappings for the same piece of memory -- it would be nice if rust can handle that case.
mmap is an opaque operation to Rust. If you use it to relocate an allocation, you can already treat it like a realloc. You just have to make sure that you stop using the old pointer after the realloc, and instead use the one returned by mmap.
that doesn't matter if you can still use mmap or similar to make two mappings for the same piece of memory -- it would be nice if rust can handle that case.
mmap is an opaque operation to Rust. If you use it to relocate an allocation, you can already treat it like a realloc. You just have to make sure that you stop using the old pointer after the realloc, and instead use the one returned by mmap.
i meant that you'd mmap the exact same memory to two locations and then use the realloc intrinsic to access both of them without any further mmap calls needed:
https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=27b03de9ba18d73a1e60badc1e5b3267
The realloc-like interface could be neat in that it would let us avoid all the otherwise present difficulties with making the interface portable. With a signature like say (name TBC) fn simulate_realloc<T>(mut original: *mut T, new_address: usize) -> *mut T, it would work for any target from the get-go and we wouldn't have to worry about the particular platform's tagging scheme.
Then in core::arch::aarch64 we could put something like:
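fn ptr_with_top_byte<T>(ptr: *mut T, top_byte: u8) -> *mut T {
    // Widen to usize before shifting (a u8 cannot be shifted by 56) and replace any existing top byte.
    let new_addr = (ptr.addr() & 0x00FF_FFFF_FFFF_FFFF) | ((top_byte as usize) << 56);
    simulate_realloc(ptr, new_addr)
}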
And similar for other platforms as needed. How does that sound?
I've now posted a re-written version of this here: https://internals.rust-lang.org/t/pre-rfc-core-simulate-realloc/21745
i meant that you'd mmap the exact same memory to two locations and then use the realloc intrinsic to access both of them without any further mmap calls needed:
Yeah, that could be done with an intrinsic like the one backing pointer tagging.
The realloc-like interface could be neat in that it would let us avoid all the otherwise present difficulties with making the interface portable. With a signature like say (name TBC) fn simulate_realloc<T>(mut original: *mut T, new_address: usize) -> *mut T, it would work for any target from the get-go and we wouldn't have to worry about the particular platform's tagging scheme.
Yes that is roughly what I had in mind when suggesting a realloc-like interface. :)
Updated the RFC based on the discussion on internals: https://internals.rust-lang.org/t/pre-rfc-core-simulate-realloc/21745