crossbeam Soundness of AtomicCell::compare

AtomicCell<T>::compare_exchange will, if I understand the code correctly, use AtomucU*::compare_exchange_weak if T has the right size. But this means that e.g. for (u8, u16), which has size 4, it will transmute these pairs into u32 and compare them as integers. However, such pairs contain a padding byte, meaning that the u32 contains some uninitialized memory. Having uninitialized memory in a u32 is uncharted territory at best, but performing actual operations on such a datum (as opposed to just load/store) brings us very close to UB. I am not sure what the rules are in LLVM for this case, but under the proposed poison semantics I think the comparison will return poison and hence using it in an if is UB. For Rust, I think the intention is to make any operation on uninitialized data UB (like it is in C), meaning the comparison would already return UB.

Strictly speaking, this affects not just the fast path, but also the arbitrarily-sized case: bytes_eq might compare uninitialized bytes.

Jan 30 '19 10:01 RalfJung

But this means that e.g. for (u8, u16), which has size 4, it will transmute these pairs into u32 and compare them as integers.

If you don't mind me asking, is the transmute itself UB as well? In particular, I had thought that the in-memory representation of tuples isn't guaranteed? Or is it just the ordering of fields that isn't guaranteed?

Jan 30 '19 10:01 BurntSushi

Well, the code checks that the sizes match. Ignoring complicated issues around pointers, the only way in which transmuting a 4-byte piece of data to u32 is UB is if there are bit patterns that are not valid for u32 -- which boils down to the question of whether uninitialized bytes are allowed in u32.

Jan 30 '19 10:01 RalfJung

It seems C/C++ standard committee members tend to agree that all bit patterns are valid for u32 and other unsigned integer types: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r4.html "For unsigned integer types, the bits of the object representation are ..." I think it's reasonable, because (once we agree to use two's complement) there's no other sane choice.

Jan 30 '19 11:01 jeehoonkang

@RalfJung That sounds overly restrictive to me.

IMO, it should be possible to transmute a value of any type T into [u8; size_of::<T>()], read those bytes, and transmute back into T without UB. I would bet a lot of C and Rust code has been written under that assumption.

Jan 30 '19 11:01 ghost

@stjepang To be very pedantic (sorry!), u8 is special in that everything can be legally cast to an array of u8 in Rust or char in C/C++. But u32 is not special in that sense, at least according to the C/C++ standard :(

And yet I agree with your sentiment: I believe representing an object of any type as u32 or any other unsigned integer types should not be UB. Edit: in other words, u8 should not be special at the first place.

Jan 30 '19 11:01 jeehoonkang

I am not aware of any precedent for u8 being special in Rust, and in fact all the discussion of the invariants of integer types I'm aware of treats all integer types alike. It's true that in C and C++, char and its signed/unsigned variants are very special, but that is precisely one of the thorny parts of writing unsafe code in those languages that Rust should try to smooth out.

Jan 30 '19 11:01 hanna-kruppe

Thanks for the clarification! I agree with you and would expect u8 to not get special treatment over other integer types. It's all just bytes to me.

Jan 30 '19 11:01 ghost

This is Rust, not C++. :)

"For unsigned integer types, the bits of the object representation are ..." I think it's reasonable, because (once we agree to use two's complement) there's no other sane choice.

Remember that every bit can be 0, 1 or uninitialized. Even the C++ standards committee acknowledges this, though just indirectly -- namely, it acknowledges that "indeterminate values" can be "unstable" in the sense that x == x on an integer can be false. This can only be the case if there are states beyond 0 and 1. I agree every sequence of 0 and 1 is fine in u32. But for uninitialized, it is far from reasonable to allow such bits in a u32. (And indeed I picked u32 as an arbitrary example here, the rules should be the same for all integer types.)

If we allow that, we have to answer other hard questions like what happens when you do arithmetic or comparison involving uninitialized bits. That's why I think that any uninitialized bits (outside of padding) should only exist in a MaybeUninit -- that is, a [MaybeUninit<u8>; N] can represent any data, but a [u8; N] cannot. There is a conflict between permitting arithmetic and other operations on a type, and permitting it to hold "strange bits" (uninitialized bits, but also transmuted pointers can be a problem here) -- and since u8 permits arithmetic, it should not permit "strange bits".

It's all just bytes to me.

But bytes are complicated.

Jan 30 '19 12:01 RalfJung

@RalfJung Fair enough, this is difficult. :(

It might be helpful to look at what C/C++ do on atomic compare-exchange with arbitrary (non-POD) data. This is interesting:

The comparison and copying are bitwise (similar to std::memcmp and std::memcpy); no constructor, assignment operator, or comparison operator are used.

When a weak compare-and-exchange would require a loop and a strong one would not, the strong one is preferable unless the object representation of T may include padding bits, (until C++20) trap bits, or offers multiple object representations for the same value (e.g. floating-point NaN). In those cases, weak compare-and-exchange typically works because it quickly converges on some stable object representation.

For a union with bits that participate in the value representations of some members but not the others, compare-and-exchange might always fail because such padding bits have indeterminate values when they do not participate in the value representation of the active member.

Padding bits that never participate in an object's value representation are ignored.

Source: https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange

Boost atomics will do reinterpret_cast just like AtomicCell (source). However, on MSVC-8 and MSVC-9, spinlocks are used because otherwise the compiler would miscompile code (source).

Jan 30 '19 13:01 ghost

Yeah, I figured this would be related to memcmp but unfortunately I was not able to find anything precise about memcmp on padding bytes. We have two kinds of problems here:

Is the LLVM IR we currently generate for this valid, or does it have UB? The atomic comparison and byte_eq are working with poisoned data, so I am not sure. We could argue that atomic reads are frozen (which is an assumption AtomicCell already makes for volatile reads), which would help for the atomic comparisons but not for the fallback case (the data compared by byte_eq is read with a normal, non-atomic non-volatile load). However, the C++ spec you are quoting makes it sound like it is not at all clear that atomic reads are frozen...
Beyond the LLVM rules, are there extra Rust rules this violates? This is where the part about having uninitialized bits in a u32 comes in. I'd prefer if this was all MaybeUninit<u32> but of course we don't have atomic operations on those...

Jan 30 '19 13:01 RalfJung

@Amanieu @stjepang with the newly proposed ptr::freeze and if you remove AtomicCell::get_mut, I think this can be made sound. You basically just have to make sure that you only ever write frozen stuff into the cell, and freeze things before you compare them. In contrast to the "zero padding"-based schemes @Amanieu suggested at the all-hands, this does not require a special function that knows where the padding bytes are.

Feb 11 '19 15:02 RalfJung

That doesn't exactly solve the problem though: compare_exchange returns a T with the current value in the atomic if the compare fails. In the most common use case (CAS loop) this will be passed back in as the current argument to compare_exchange in the next loop iteration.

However since the padding bytes in T are always undef, the freeze operation could freeze them into a different value than the one in the atomic, which would cause the next compare_exchange call to fail even though the value in the atomic was not changed since the last iteration. This could result in the CAS loop never converging to a stable state.

Feb 11 '19 16:02 Amanieu

@RalfJung Awesome news!

@Amanieu Are you sure? We have this if statement that should solve the issue you're describing. Note that compare_exchange will never fail spuriously due to the difference in undef bytes.

Feb 11 '19 16:02 ghost

@stjepang Ah right. So you won't get spurious failures, but you could in theory get stuck in an infinite loop.

Feb 11 '19 16:02 Amanieu

@Amanieu Hmm, not sure I understand how. If we do cell.compare_exchange(old, new) and we know cell.get() == old, the operation will always succeed, even if the two values differ in padding bytes.

Could you perhaps sketch out a simple example? I don't see the theoretical infinite loop here.

Feb 11 '19 16:02 ghost

Let us assume some 2-byte type (u8, __pad: u8). The first byte contains a valid value, the second byte is padding. The underlying atomic type used is AtomicU16.

The current value in the AtomicU16 is (7, 0).
The user calls cell.compare_exchange(old = (7, undef), new = (8, undef)).
The undefs get frozen as u16s with values: old = (7, 60), new = (8, 90).
The inner atomic_compare_exchange_weak fails because the padding bytes don't match.
The inner atomic_compare_exchange_weak returns Err((7, undef)) because it returns a T instead of a u16, the value is "unfrozen"
compare_exchange loops back to step 2 with old = (7, undef).

Now that I look at it this way, I think this is fixable if you keep the previous value as a u16 when passing it back up the loop instead of converting it back to a T.

Feb 11 '19 17:02 Amanieu

Thanks! I agree about keeping the previous value as a u16 rather than going u16 -> T -> u16. That should solve the problem.

Feb 11 '19 17:02 ghost

Ah good point, yes you need to do the entire loop without round-tripping through T.

It is a bit dissatisfying that we cannot also have get_mut, though... all we need to have that, too, is an "atomic freeze operation" -- a way to freeze memory as the first thing compare_exchange does, in a way that does not count as a data race in the memory model. That seems like a pretty powerful operation though, and I have no idea how to integrate it with the rest of the memory model.

Pragmatically speaking, an asm! block probably has this effect as far as LLVM is concerned. But that's not really good enough for me.^^

Feb 11 '19 19:02 RalfJung

Any thoughts on replacing get_mut with store_mut/load_mut which, IIUC, could still be implemented soundly?

Feb 21 '19 21:02 mtak-

These could freeze, so yes, they could be implemented in a sound way in this model.

Feb 21 '19 22:02 RalfJung

I believe (1) Rust should not blame padding bytes as invalidate values as well, and (2) we should de-deprecate AtomicCell::get_mut(). This idea is not just popping from my head, but is actually inherited from the precedence of C/C++.

Paddings are not invalidate but unspecified in C/C++. That means (1) you don't know what's the value of the paddings, but (2) computations on the paddings should not invoke undefined behavior. It is for a good reason. For example, padding bytes that do not introduce undefined behavior is essential in our implementation of AtomicCell::get_mut().

Mar 03 '19 13:03 jeehoonkang

AFAIK padding bytes are indeterminate in C/C++. That's the same status as uninitialized memory, and it is very different from unspecified. Whether or not computations on indeterminate values are UB is not clear from the standard, but:

In the case of an indeterminate value all bit-patterns are valid representations and the actual bit-pattern may change without direct action of the program.

Source: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1793.pdf

In particular, comparing indeterminate values with anything is meaningless, because they do not have a stable bit pattern. This is a problem for compare_exchange, which performs exactly this kind of comparison.

Also, another point not clear from the standard is what happens when you memcpy a 4-byte struct including a padding byte into a uint32_t -- does the entire integer value become indeterminate, or is this somehow tracked on a per-byte basis? From what I recall from the LLVM poison discussion, there are optimizations that are incompatible with either choice.

Mar 03 '19 13:03 RalfJung

According to C17 6.2.6p6, "padding bytes take unspecified values":

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

Mar 03 '19 14:03 jeehoonkang

Interesting. However, we also have C17 6.7.1 p9:

unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization.

and 6.7.2.1 p15 + p17:

There may be unnamed padding within a structure object, but not at its beginning.

There may be unnamed padding at the end of a structure or union.

Also, to my knowledge, making padding indeterminate is needed to make SROA (scalar replacement of aggregates) a valid optimization.

Mar 03 '19 14:03 RalfJung

Also note footnote 51 in your quote is

Thus, for example, structure assignment need not copy any padding bits.

But that doesn't square with just being unspecified: if I have an uninitialized variable and initialize it by assignment, and the assignment does not copy the padding bits, then those bits will remain indeterminate.

Mar 03 '19 14:03 RalfJung

For unnamed members/paddings, I think the standard intentionally distinguished members and paddings: paddings are not members, and unnamed members are indeterminate and unnamed paddings are unspecified. Also, I think the footnote 51 should be interpreted as informative, but not normative. It probably suggests an implementation strategy.

I think SROA will be much more interesting than language laws. Would you please give me a reference or example why paddings should be indeterminate for validating SROA?

Mar 03 '19 14:03 jeehoonkang

I think that's just an artifact of wording -- padding is a kind of unnamed member. What other unnamed members would there even be?

Would you please give me a reference or example why paddings should be indeterminate for validating SROA?

Well, one effect of SROA is that only the elements of a struct get copied, not its padding -- exactly what the footnote alludes to.

struct Foo { uint8_t f1, uint16_t f2 };

Foo bar(Foo foo) {
  Foo foo2 = foo; // assume this one gets SROA'd
  return foo2;
}

Foo bar2(Foo foo) {
  Foo foo2;
  foo2.f1 = foo.f1;
  foo2.f2 = foo.f2;
  return foo2;
}

bar is basically transformed to bar2: it will return a Foo where the two fields got copied from the argument, but the padding was left uninitialized -- and hence is indeterminate.

Mar 03 '19 14:03 RalfJung

Unnamed fields are anonymous ones: https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html I still think paddings are not unnamed members mentioned in the standard.
I believe paddings are never indeterminate in C, so your SROA example is sound even though the padding bytes are unspecified values.

Mar 03 '19 14:03 jeehoonkang

Unnamed fields are anonymous ones: gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

How would it make any sense that those members do not get initialized?

I still think paddings are not unnamed members mentioned in the standard.

Given that padding is explicitly called out as "unnamed", I disagree with that reading.

Notice that this interpretation is not just mine, I took it from https://wiki.sei.cmu.edu/confluence/display/c/EXP42-C.+Do+not+compare+padding+data which explicitly quotes the above parts of the standard when talking about padding.

I believe paddings are never indeterminate in C, so your SROA example is sound even though the padding bytes are unspecified values.

I don't follow. Uninitialized variables are indeterminate. Why would their padding not be...?

The standard is generally extremely vague about all this indeterminate/unspecified/trap representation business. I think it is much clearer to follow the approach proposed for LLVM, where "indeterminate" becomes poison and there's a proper formal definition. Unintiailized memory is poison. The only way to explain my example above is to say that copying a struct resets all padding to poison.

Mar 03 '19 15:03 RalfJung

On "unnamed":

I think distinguishing unnamed members and unnamed paddings, is the only way to make sense of C17 6.2.6p6 and [C17 6.7.1 p9 and 6.7.2.1 p15 + p17] at the same time. Even if they conflict with each other, 6.2.6p6 is more clearly describing the semantics of padding with unambiguous words: padding bytes are unspecified. I prefer to follow it.
On uninitialized struct/union values:

According to C18, struct and union values, even if uninitialized, should not be a trap representation. (Here, "trap representation" means what we called "indeterminate" in this discussion: C18 3.18.2 says an indeterminate value is "either an unspecified value or a trap representation".) C18 6.2.6p6 says: "The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation."

Mar 04 '19 03:03 jeehoonkang

Soundness of AtomicCell::compare_exchange is dubious