Possible UB in the `pool::UnionNode<T>` and `pool::Node` types
As part of #562 I am working through the usage of unsafe code and I think I uncovered some UB in the pool module that affects Arc<T> and Box<T>. I would like it if someone could verify this as I am not entirely sure I understand the language documentation correctly.
There are two places which IMO produce UB, both relating to the usage of the UnionNode<T> type:
Obtaining an invalid reference to a union field in UnionNode<T>
First, this line right here in the implementation of the Node trait for UnionNode<T>:
impl<T> Node for UnionNode<T> {
type Data = T;
fn next(&self) -> &AtomicPtr<Self> {
unsafe { &self.next } // UB if `self.next` is not initialized, but we can't know from within this method
}
//...
}
As per the Rustonomicon, a reference pointing to invalid data is instant UB (see here). To get a valid reference to the self.next field, it must be properly initialized. Looking at e.g. the arc module, the nodes come from the ArcBlock<T> type, which is initialized like this:
impl<T> ArcBlock<T> {
/// Creates a new memory block
pub const fn new() -> Self {
Self {
node: UnionNode {
data: ManuallyDrop::new(MaybeUninit::uninit()), // TODO I think we want to initialize `next` instead of `data`!
},
}
}
}
Initializing data does not guarantee that next is also properly initialized, so I think the fix (as I put in the comments) would be to initialize self.next instead in ArcBlock::new(). As I see, we never need data to be initialized as its purpose is to be pushed into the stack of the ArcPoolImpl type, and once we pull it out of the stack we make sure data is initialized through a ptr::write.
The same should be done for BoxBlock::new() in the Box module. IMO it would also require to make the next() and next_mut() methods of the Node trait unsafe, as we otherwise cannot guarantee that obtaining a reference to the self.next field is sound.
Wrong assumptions on the memory layout of UnionNode<T>
The second problem is that there are several situations where a pointer to a UnionNode<T> is cast to T but we cannot be sure that it is valid to do so. An example is this line in ArcPool::alloc():
unsafe { node_ptr.as_ptr().cast::<ArcInner<T>>().write(inner) }
The Rust reference on union field layout states that "Fields might have a non-zero offset (except when the C representation is used)". Currently UnionNode<T> is not repr(C), so assuming that a *mut UnionNode<T> is also a valid *mut T is wrong I think. The fix here might be as simple as adding #[repr(C)] to UnionNode<T>.
I would be happy to implement these changes, maybe as part of the PR for #562, if it turns out that I am right. If I'm wrong I am also thankful for anyone pointing out why :)
Thanks reviewing the unsafe code!
Obtaining an invalid reference to a union field in UnionNode<T>
I don't know if this one has a bug. The specific API is is purely internal so I'm not sure it's possible to call that for a user of the library. It would still be good to make this function unsafe in the trait and make sure all calls to this function validate that it's properly initialised. Unless there is a specific case in the library where this is called incorrectly, I think it can be part of the PR for #562.
Wrong assumptions on the memory layout of UnionNode<T>
This one indeed looks like a bug. I would welcome a PR adding #[repr(C)] to the union.
You're welcome :) I submitted a PR for the #[repr(C)] issue.
As for the second issue, I will make the trait methods of Node unsafe, however I do think that as it stands, there is incorrect usage within the library itself. Take the typical usage of Arc<T>, which goes like this:
- You create an
ArcPoolImplusingarc_pool!, and allocate a newArcBlocksomewhere and initialize it withArcBlock::new - You pass that
ArcBlockto the pool usingArcPoolImpl::manage - This block is properly initialized for
UnionNode::data, as this is whatArcBlock::newdoes internally. This does not mean thatUnionNode::nextis also properly initialized ArcPoolImpl::managecallsStack::pushwith a pointer to thatUnionNode, which then callsUnionNode::next(), resulting in a reference toself.nextbeing created, which was never initialized
I think this is UB irregardless of the bit pattern of UnionNodes fields. I also found a statement on references to uninitialized data being UB in the section on UB in the Rust documentation. It feels like the exact situation that makes MaybeUninit::assume_init_ref UB if the underlying value was never initialized, even if I never do anything with the reference obtained this way.
Could you please confirm whether my observation is correct and comment on my proposed fix (changing the initialization within ArcBlock::new)?
Indeed that's a direct UB when creating a Arc block. I'm surprised that the miri tests don't catch it.
I think the proper fix is instead to have a method that returns a pointer through ptr::addr_of! instead of going through a reference. Looking at the usage it's to put it in a linked list so the reference is immediately converted to a pointer in the case that has UB. When it's popped, the correct assumption is always made.
So I would change the node trait to be the following (adding documentation of the lifetime the pointers are valid for).
pub trait Node: Sized {
type Data;
fn next(&self) -> * const AtomicPtr<Self>;
#[allow(dead_code)] // used conditionally
fn next_mut(&mut self) -> * mut AtomicPtr<Self>;
}
It deserves its own PR indeed.
I agree that this is probably the better API as it doesn't depend on correct initialization. I will implement this and create a PR for this as well.
@sgued Sorry to bother you again, but after looking at this some more, I think even your suggested change isn't enough unfortunately. The problem is the way the AtomicPtr is used inside of push in pool/treiber/cas.rs:
// line 192:
new_top
.non_null()
.as_ref()
.next() // <- This would return *const AtomicPtr<T> with the proposed change
.store(top, Ordering::Relaxed); //<- store takes &self (i.e. &AtomicPtr<T>), which we can't obtain without next being initialized properly
There is no way I see to call AtomicPtr::store in a way that is legal if the underlying pointer is potentially uninitialized. The current implementation of AtomicPtr::store takes a &self, defers to the inner atomic integer type which might come from the portable_atomic crate, where their implementation of store also takes a &self. But we can't get a &self from *const AtomicPtr unless the pointed-to value is properly initialized. ptr::write on the *const AtomicPtr is also out of the question, as that would bypass the actual atomic store operation (which, according to portable_atomic, might use locks).
This brings me back to thinking we just have to ensure that the UnionNode::next field is properly initialized when we pass the node to the stack.
I see. In that case always initialising it seems to make sense. Thanks for the analysis!