c11 atomics: Incorrect handling of boundary condition
At the end of loop, there is a possible sequence which can result myValue equal to hisRealValue but it is still a legal sequence.
// myId is current thread Id and hisId is it's partner thread Id
// myValue starts with zero
do{
myValue++;
atomic_store_explicit(&destMemory[myId], myValue); // scope and memory order doesn't matter
fence; // type of fence doesn't matter
hisValue = atomic_load_explicit(&destMemory[hisId]);
} while(myValue == hisValue && myValue < 500000);
oldValues[myId] = hisValue;
The issue actually shows up when the two threads reading each other's values reach 499999, and now at the last iteration, one thread completes its operations and exits the loop. Now the next thread starts its operations, completes the do-while block, and will exit the loop. Below is dry run of the sequence for the both threads
//thread 1: thread 2: both have myValue as 499999
myValue++; thread 1: 500000
atomic_store_explicit(myValue) // thread 1: stores value of 500000
fence // thread 1: let others know you had new changes
hisValue = atomic_load_explicit(...) // thread 1: reads other thread value 499999
// thread 1: exit the loop since both conditions failed
myValue++; thread 2: 500000
atomic_store_explicit(myValue) // thread 2: stores value of 500000
fence // thread 2: let others know you had new changes
hisValue = atomic_load_explicit(...) // thread 2: reads other thread value 500000
// thread 2: exit the loop because myValue !< 500000
// if this is not at boundary we would have looped one more time in case of thread2
After the above execution sequence, below are the values we see while verifying the thread 1 buffers
myValue=500000
hisValue=499999
hisReadValue=500000
myValReadByHim=500000
Since myValue is equal to myValReadByHim, we are resulting in an error that says the fence is not working and both threads read a stale value. That is correct, but not at the boundary condition; here we haven't read any stale value, and the fence is working correctly. The only problem is thread 2 forced out of the loop because myValue is not less than 500000; if it's not at the boundary, it would have taken one more iteration, and the values after this iteration would not have hit this condition.
This fix makes the condition more robust while checking if two threads have read stale value. If both threads read stale values, they would read the same old values, so checking whether hisValue == myValReadByHim will make sure both threads read the stale value even at the boundaries. Now with both conditions we can be sure that both threads have read stale values.
Raised the pull request for the same: https://github.com/KhronosGroup/OpenCL-CTS/pull/2485
Sporadic failures were observed on AMD and most likely to fail on other vendors as well, all occurring at the boundary (500000). Fences appear to be functioning correctly, and no thread seems to be reading stale values.
To reproduce the failure, run the test 500 to 1000 times in loop.
Per discussion on 9th Sept OpenCL Call, @kpet pointed out it might be same as https://www.khronos.org/members/login/bugzilla/show_bug.cgi?id=15814 Just documenting that for future reference.
For reproducing the Issue we can run the following steps on any android device, similar can be done on other OS also:
- adb root
- adb remount
- adb shell "setenforce 0"
- adb push test_c11_atomics /data/local/tmp/
- adb shell "chmod -R 777 /data/local/tmp/test_c11_atomics"
- adb shell "./data/local/tmp/test_c11_atomics atomic_fence" --> Run the atomic_fence cts test case
There are many subtest cases when we run test_c11_atomics the issue sub cases are seq_cst fence, global atomic_*, scope_device case
seq_cst fence, global atomic_int, scope_device seq_cst fence, global atomic_uint, scope_device seq_cst fence, global atomic_long, scope_device seq_cst fence, global atomic_ulong, scope_device seq_cst fence, global atomic_intptr_t, scope_device seq_cst fence, global atomic_uintptr_t, scope_device seq_cst fence, global atomic_size_t, scope_device seq_cst fence, global atomic_ptrdiff_t, scope_device
Attached a simple shell script through which we can run this test 100 times on a linux machine ` #!/bin/bash
Log file where all outputs will be saved
LOG_FILE="output.log"
Clear previous log file if it exists
"$LOG_FILE"
echo "Starting 100 runs of test_c11_atomics atomic_fence" echo "Output will be saved to: $LOG_FILE" echo "----------------------------------------" adb root adb remount adb shell "setenforce 0" adb push test_c11_atomics /data/local/tmp/ adb shell "chmod -R 777 /data/local/tmp/test_c11_atomics"
Loop 100 times
for i in {1..100}; do echo "Run $i started at $(date)" | tee -a "$LOG_FILE" adb shell "./data/local/tmp/test_c11_atomics atomic_fence" 2>&1 | tee -a "$LOG_FILE" echo "Run $i completed." | tee -a "$LOG_FILE" echo "----------------------------------------" | tee -a "$LOG_FILE" done
echo "All runs completed. Output saved to: $LOG_FILE"
`
Attached the error log which we got
CTS verification condition: https://github.com/KhronosGroup/OpenCL-CTS/blob/2e0f8036990e209f216a1def553cfce1b8a874dd/test_conformance/c11_atomics/test_atomics.cpp#L3131