[Question] 'Strength' comparison of SeqCst fence and SeqCst load/stores
Opening
I'm trying to understand the reason why hazard pointers require exactly two `SeqCst` orderings. While thinking about this, I wanted to fully understand the guarantees of `SeqCst` atomic operations and fences. Intuitively, fences should provide a stronger guarantee, if only because of their name (the term "fence" gives a very intimidating impression!). Below are my conclusions:
Low-Level Perspective
The difference between a `SeqCst` `fence` and a `SeqCst` `atomic.store` from a compiler's point of view (on x86-64):
- `fence` is lowered to an `mfence` instruction. The `mfence` instruction is said to be a serializing instruction. This means that all stores prior to the `mfence` instruction are committed to main memory.
- `store` is lowered to an `xchg rax, [rbx]`-like instruction. This is interesting, because `rax` is discarded afterwards. For `Release` ordering, the store is not lowered to an `xchg` instruction but to a regular store instruction. This implies that there is something 'special' about `xchg`, which I don't know yet.
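To check this concretely, here is a minimal snippet of my own (not from the course code) that can be compiled with `rustc -O --emit asm` or pasted into a compiler explorer; the comments state what I expect the x86-64 lowering to be, based on the observations above:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// SeqCst store: expected to become an `xchg`-like instruction on x86-64.
pub fn store_seqcst(a: &AtomicUsize) {
    a.store(1, Ordering::SeqCst);
}

// Release store: expected to become a plain `mov` on x86-64.
pub fn store_release(a: &AtomicUsize) {
    a.store(1, Ordering::Release);
}

// SeqCst fence: expected to become an `mfence` (or equivalent) on x86-64.
pub fn seqcst_fence() {
    fence(Ordering::SeqCst);
}
```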
High-Level Perspective
- All stores with the `SeqCst` ordering contribute to the global ordering `S`. All loads with the `SeqCst` ordering see stores according to the global ordering `S`. The term "see" means the following: if a thread can see a store, it means that it can observe the effects of that store.
- However, this does not mean that a `SeqCst` load can see the last store within `S`. It only guarantees that if it sees a certain store `s` within `S`, it can always see any `t` such that `t < s`, where `<` represents the ordering within `S`. In other words, threads see a 'prefix' of `S`, which is at worst nothing (the empty sequence) and at best the entire sequence `S`. Let's consider the following example:
```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// `x`, `read`, and `free` are placeholders for some shared object,
// reading it, and freeing it (as in the hazard pointer scenario).
fn thread1_func(m: &Arc<AtomicUsize>, n: &Arc<AtomicUsize>) {
    m.store(1, Ordering::SeqCst); // [1]
    let val = n.load(Ordering::SeqCst);
    if val == 0 {
        read(x);
    }
}

fn thread2_func(m: &Arc<AtomicUsize>, n: &Arc<AtomicUsize>) {
    n.store(1, Ordering::SeqCst); // [2]
    let val = m.load(Ordering::SeqCst);
    if val == 0 {
        free(x);
    }
}
```
Let's assume that everything in thread2 was executed before thread1. Then, the global ordering `S` would be `[2] --> [1]`. However, the load of `n` in thread1 can yield 0 if it sees 'nothing' from `S`. Thus, this results in a use-after-free (UaF) and crashes the program.
- We can use fences to solve this problem:
```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::sync::Arc;

fn thread1_func(m: &Arc<AtomicUsize>, n: &Arc<AtomicUsize>) {
    m.store(1, Ordering::SeqCst); // [1]
    fence(Ordering::SeqCst);
    let val = n.load(Ordering::SeqCst); // [3]
    if val == 0 {
        read(x);
    }
}

fn thread2_func(m: &Arc<AtomicUsize>, n: &Arc<AtomicUsize>) {
    n.store(1, Ordering::SeqCst); // [2]
    fence(Ordering::SeqCst);
    let val = m.load(Ordering::SeqCst); // [4]
    if val == 0 {
        free(x);
    }
}
```
- I understood the semantics of a `SeqCst` fence as follows: For stores, nothing changes (assuming all stores had the `SeqCst` ordering). For loads, an additional condition is imposed: if a `SeqCst` fence is present before a load, the load must see a subsequence of the global ordering `S` that includes all of the stores before the fence.
- Let's see how the problem is solved in this case. Consider both cases: `S` is `[1] --> [2]` and `S` is `[2] --> [1]`.
- `S` = `[1] --> [2]`: The load of `n` in thread1 (`[3]`) either sees `[1]` or `[1] --> [2]`. If it sees only `[1]`, the load of `n` will return 0 and `read(x)` will be performed. In the other case, `read(x)` will not be performed. The load of `m` in thread2 (`[4]`) must see `[1] --> [2]` due to the fence. Thus, the load of `m` will return 1 and `free(x)` will never be performed. Because `read(x)` is 'maybe' done and `free(x)` is never done, the program is safe!
- `S` = `[2] --> [1]`: The load of `n` in thread1 (`[3]`) must see `[2] --> [1]` due to the fence. Thus, the load of `n` will return 1 and `read(x)` will not be performed. The load of `m` in thread2 (`[4]`) either sees `[2]` or `[2] --> [1]`. If it sees only `[2]`, the load of `m` will return 0 and `free(x)` will be performed. In the other case, `free(x)` will not be performed. In conclusion, `read(x)` will never be performed, so safety is guaranteed regardless of whether `free(x)` is called. Safe!
- So in conclusion, `fence(SeqCst)` imposes an additional condition on `load(SeqCst)` and limits how it 'sees' `S`.
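To sanity-check the fence version, here is a self-contained program I can actually run (my own sketch: `read(x)` and `free(x)` are replaced by flags, so the 'unsafe' outcome shows up as an assertion failure instead of a real use-after-free):

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
use std::thread;

fn main() {
    // Run many iterations so that interesting interleavings have a chance to occur.
    for _ in 0..100_000 {
        let m = AtomicUsize::new(0);
        let n = AtomicUsize::new(0);
        let read_done = AtomicBool::new(false); // stand-in for read(x)
        let free_done = AtomicBool::new(false); // stand-in for free(x)

        thread::scope(|s| {
            s.spawn(|| {
                m.store(1, Ordering::SeqCst); // [1]
                fence(Ordering::SeqCst);
                if n.load(Ordering::SeqCst) == 0 { // [3]
                    read_done.store(true, Ordering::Relaxed);
                }
            });
            s.spawn(|| {
                n.store(1, Ordering::SeqCst); // [2]
                fence(Ordering::SeqCst);
                if m.load(Ordering::SeqCst) == 0 { // [4]
                    free_done.store(true, Ordering::Relaxed);
                }
            });
        });

        // The claimed guarantee: read(x) and free(x) never both happen.
        assert!(!(read_done.load(Ordering::Relaxed) && free_done.load(Ordering::Relaxed)));
    }
}
```

Of course, not seeing the assertion fire (especially on x86-64) does not prove the reasoning; it is only a sanity check.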
Connecting the Dots
I think it is also important to think about why the usage of `mfence` provides the high-level conditions that I described. Let's see the previous code in pseudo-assembly.
```
thread1:
    mov  rax, 1
    xchg rax, [rbx]  // atomic store [1]
    mfence
    mov  rax, [rcx]  // atomic load [3]

thread2:
    mov  rax, 1
    xchg rax, [rcx]  // atomic store [2]
    mfence
    mov  rax, [rbx]  // atomic load [4]
```
The order between the two `xchg`s can be arbitrary. But, because of `mfence`, each load is done strictly after the preceding `xchg` commits to memory. Thus, because there exists an order between the `xchg`s PLUS an order between each `xchg` and the load that follows it, the high-level properties described previously are obtained.
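As a side note tied to the original hazard pointer question: if my understanding above is right, the same "not both loads return 0" guarantee should also hold when the stores and loads themselves are `Relaxed` and only the fences are `SeqCst`, since the two `SeqCst` fences are still totally ordered in `S`. A minimal sketch of my own (not taken from the cs431 code) of that variant:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Thread 1: publish to `m`, then check `n`.
fn thread1_func(m: &AtomicUsize, n: &AtomicUsize) -> usize {
    m.store(1, Ordering::Relaxed);
    fence(Ordering::SeqCst); // the two SeqCst fences are totally ordered in S
    n.load(Ordering::Relaxed)
}

// Thread 2: publish to `n`, then check `m`.
fn thread2_func(m: &AtomicUsize, n: &AtomicUsize) -> usize {
    n.store(1, Ordering::Relaxed);
    fence(Ordering::SeqCst);
    m.load(Ordering::Relaxed)
}

// If my reasoning is correct, whichever fence comes later in S forces the load
// after it to observe the other thread's store, so at most one of the two
// functions can return 0.
```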
Conclusions and Thoughts
If any of my discussions are wrong, please tell me where my thoughts go astray. These are rather unfounded thoughts of mine that came up while attempting to understand what `fence`s are.