[Question] 'Strength' comparison of SeqCst fence and SeqCst load/stores
Opening
I'm trying to understand the reason why hazard pointers require exactly two `SeqCst` orderings. While thinking about this, I wanted to fully understand the guarantees of `SeqCst` atomic operations and fences. Intuitively, fences should provide a stronger guarantee, if only because of their name (the term "fence" gives a very intimidating impression!). Below are my conclusions:
Low-Level Perspective
The difference between a `SeqCst` `fence` and a `SeqCst` `atomic.store` from a compiler's point of view (on x86-64):
- `fence` is lowered to an `mfence` instruction. The `mfence` instruction is said to be a serializing instruction. This means that all stores prior to the `mfence` instruction are committed to main memory.
- `store` is lowered to an `xchg rax, [rbx]`-like instruction. This is interesting, because `rax` is discarded afterwards. For `Release` ordering, the store is not lowered to an `xchg` instruction but to a regular store instruction. This implies that there is something 'special' about `xchg`, which I don't know yet.
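To check this concretely, here is a minimal snippet of my own (not from the course code) that can be compiled with `rustc -O --emit asm` or pasted into a compiler explorer; the comments state what I expect the x86-64 lowering to be, based on the observations above:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// SeqCst store: expected to become an `xchg`-like instruction on x86-64.
pub fn store_seqcst(a: &AtomicUsize) {
    a.store(1, Ordering::SeqCst);
}

// Release store: expected to become a plain `mov` on x86-64.
pub fn store_release(a: &AtomicUsize) {
    a.store(1, Ordering::Release);
}

// SeqCst fence: expected to become an `mfence` (or equivalent) on x86-64.
pub fn seqcst_fence() {
    fence(Ordering::SeqCst);
}
```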
High-Level Perspective
- All stores with the `SeqCst` ordering contribute to the global ordering `S`. All loads with the `SeqCst` ordering see stores according to the global ordering `S`. The term "see" means the following: if a thread can see a store, it means that it can observe the effects of that store.
- However, this does not mean that a `SeqCst` load can see the last store within `S`. It only guarantees that if it sees a certain store `s` within `S`, it can always see any `t` such that `t < s`, where `<` represents the ordering within `S`. In other words, threads see a 'prefix' of `S`, which is at worst nothing (the empty sequence) and at best the entire sequence `S`. Let's consider the following example:
```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// `x`, `read`, and `free` are placeholders for some shared object,
// reading it, and freeing it (as in the hazard pointer scenario).
fn thread1_func(m: &Arc<AtomicUsize>, n: &Arc<AtomicUsize>) {
    m.store(1, Ordering::SeqCst); // [1]
    let val = n.load(Ordering::SeqCst);
    if val == 0 {
        read(x);
    }
}

fn thread2_func(m: &Arc<AtomicUsize>, n: &Arc<AtomicUsize>) {
    n.store(1, Ordering::SeqCst); // [2]
    let val = m.load(Ordering::SeqCst);
    if val == 0 {
        free(x);
    }
}
```
Let's assume that everything in thread2 was executed before thread1. Then, the global ordering `S` would be `[2] --> [1]`. However, the load of `n` in thread1 can yield 0 if it sees 'nothing' from `S`. Thus, this results in a use-after-free (UaF) and crashes the program.
- We can use fences to solve this problem:
```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::sync::Arc;

fn thread1_func(m: &Arc<AtomicUsize>, n: &Arc<AtomicUsize>) {
    m.store(1, Ordering::SeqCst); // [1]
    fence(Ordering::SeqCst);
    let val = n.load(Ordering::SeqCst); // [3]
    if val == 0 {
        read(x);
    }
}

fn thread2_func(m: &Arc<AtomicUsize>, n: &Arc<AtomicUsize>) {
    n.store(1, Ordering::SeqCst); // [2]
    fence(Ordering::SeqCst);
    let val = m.load(Ordering::SeqCst); // [4]
    if val == 0 {
        free(x);
    }
}
```
- I understood the semantics of a `SeqCst` fence as follows: For stores, nothing changes (assuming all stores had the `SeqCst` ordering). For loads, an additional condition is imposed: if a `SeqCst` fence is present before a load, the load must see a subsequence of the global ordering `S` that includes all of the stores before the fence.
- Let's see how the problem is solved in this case. Consider both cases: `S` is `[1] --> [2]` and `S` is `[2] --> [1]`.
- `S` = `[1] --> [2]`: The load of `n` in thread1 (`[3]`) either sees `[1]` or `[1] --> [2]`. If it sees only `[1]`, the load of `n` will return 0 and `read(x)` will be performed. In the other case, `read(x)` will not be performed. The load of `m` in thread2 (`[4]`) must see `[1] --> [2]` due to the fence. Thus, the load of `m` will return 1 and `free(x)` will never be performed. Because `read(x)` is 'maybe' done and `free(x)` is never done, the program is safe!
- `S` = `[2] --> [1]`: The load of `n` in thread1 (`[3]`) must see `[2] --> [1]` due to the fence. Thus, the load of `n` will return 1 and `read(x)` will not be performed. The load of `m` in thread2 (`[4]`) either sees `[2]` or `[2] --> [1]`. If it sees only `[2]`, the load of `m` will return 0 and `free(x)` will be performed. In the other case, `free(x)` will not be performed. In conclusion, `read(x)` will never be performed, so safety is guaranteed regardless of whether `free(x)` is called. Safe!
- So in conclusion, `fence(SeqCst)` imposes an additional condition on `load(SeqCst)` and limits how it 'sees' `S`.
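To sanity-check the fence version, here is a self-contained program I can actually run (my own sketch: `read(x)` and `free(x)` are replaced by flags, so the 'unsafe' outcome shows up as an assertion failure instead of a real use-after-free):

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
use std::thread;

fn main() {
    // Run many iterations so that interesting interleavings have a chance to occur.
    for _ in 0..100_000 {
        let m = AtomicUsize::new(0);
        let n = AtomicUsize::new(0);
        let read_done = AtomicBool::new(false); // stand-in for read(x)
        let free_done = AtomicBool::new(false); // stand-in for free(x)

        thread::scope(|s| {
            s.spawn(|| {
                m.store(1, Ordering::SeqCst); // [1]
                fence(Ordering::SeqCst);
                if n.load(Ordering::SeqCst) == 0 { // [3]
                    read_done.store(true, Ordering::Relaxed);
                }
            });
            s.spawn(|| {
                n.store(1, Ordering::SeqCst); // [2]
                fence(Ordering::SeqCst);
                if m.load(Ordering::SeqCst) == 0 { // [4]
                    free_done.store(true, Ordering::Relaxed);
                }
            });
        });

        // The claimed guarantee: read(x) and free(x) never both happen.
        assert!(!(read_done.load(Ordering::Relaxed) && free_done.load(Ordering::Relaxed)));
    }
}
```

Of course, not seeing the assertion fire (especially on x86-64) does not prove the reasoning; it is only a sanity check.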
Connecting the Dots
I think it is also important to think about why the usage of `mfence` provides the high-level conditions that I described. Let's see the previous code in pseudo-assembly.
```
thread1:
    mov  rax, 1
    xchg rax, [rbx]  // atomic store [1]
    mfence
    mov  rax, [rcx]  // atomic load [3]

thread2:
    mov  rax, 1
    xchg rax, [rcx]  // atomic store [2]
    mfence
    mov  rax, [rbx]  // atomic load [4]
```
The order between the two `xchg`s can be arbitrary. But, because of `mfence`, each load is done strictly after the preceding `xchg` commits to memory. Thus, because there exists an order between the `xchg`s PLUS an order between each `xchg` and the load that follows it, the high-level properties described previously are obtained.
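As a side note tied to the original hazard pointer question: if my understanding above is right, the same "not both loads return 0" guarantee should also hold when the stores and loads themselves are `Relaxed` and only the fences are `SeqCst`, since the two `SeqCst` fences are still totally ordered in `S`. A minimal sketch of my own (not taken from the cs431 code) of that variant:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Thread 1: publish to `m`, then check `n`.
fn thread1_func(m: &AtomicUsize, n: &AtomicUsize) -> usize {
    m.store(1, Ordering::Relaxed);
    fence(Ordering::SeqCst); // the two SeqCst fences are totally ordered in S
    n.load(Ordering::Relaxed)
}

// Thread 2: publish to `n`, then check `m`.
fn thread2_func(m: &AtomicUsize, n: &AtomicUsize) -> usize {
    n.store(1, Ordering::Relaxed);
    fence(Ordering::SeqCst);
    m.load(Ordering::Relaxed)
}

// If my reasoning is correct, whichever fence comes later in S forces the load
// after it to observe the other thread's store, so at most one of the two
// functions can return 0.
```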
Conclusions and Thoughts
If any of my discussions are wrong, please tell me where my thoughts go astray. These are rather unfounded thoughts of mine that came up while attempting to understand what `fence`s are.