Fix incorrect verification at boundary
At the end of the loop, a legal execution sequence can leave myValue equal to hisRealValue, and the verification wrongly reported it as a failure.
Add a check to handle this missed sequence.
// myValue starts at zero
do {
    myValue++;
    atomic_store_explicit(&destMemory[myId], myValue); // scope and memory order don't matter
    fence; // type of fence doesn't matter
    hisValue = atomic_load_explicit(&destMemory[hisId]);
} while (myValue == hisValue && myValue < 500000);
oldValues[myId] = hisValue;
The issue shows up when the two threads, reading each other's values, both reach 499999. In the last iteration, one thread completes all of its operations and exits the loop before the other thread starts. The second thread then runs its do-while body and also exits the loop. Below is a dry run of this sequence for both threads:
// thread 1 and thread 2 both have myValue == 499999
myValue++;                            // thread 1: 500000
atomic_store_explicit(myValue)        // thread 1: stores 500000
fence                                 // thread 1: lets the other thread know about the new value
hisValue = atomic_load_explicit(...)  // thread 1: reads the other thread's value, 499999
// thread 1: exits the loop because both conditions fail
myValue++;                            // thread 2: 500000
atomic_store_explicit(myValue)        // thread 2: stores 500000
fence                                 // thread 2: lets the other thread know about the new value
hisValue = atomic_load_explicit(...)  // thread 2: reads the other thread's value, 500000
// thread 2: exits the loop because myValue is not less than 500000
// away from the boundary, thread 2 would have looped one more time
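The dry run can be replayed deterministically on the host. The following is a minimal sketch, not the CTS kernel: it assumes plain int slots in place of the atomics, treats the fence as a no-op because the replay is sequential, and uses hypothetical helper names (ThreadState, run_iteration, replay_boundary).

```c
#include <assert.h>

#define LIMIT 500000 /* loop bound taken from the kernel snippet */

typedef struct {
    int myValue;  /* private counter of the thread */
    int hisValue; /* last value loaded from the other thread's slot */
} ThreadState;

/* One do-while iteration for the thread that owns slot myId.
   Returns nonzero if the loop condition asks for another iteration. */
static int run_iteration(int destMemory[2], ThreadState *t, int myId, int hisId)
{
    assert(myId != hisId);
    t->myValue++;
    destMemory[myId] = t->myValue;   /* stands in for atomic_store_explicit */
    /* the fence is a no-op here because the replay is sequential */
    t->hisValue = destMemory[hisId]; /* stands in for atomic_load_explicit */
    return t->myValue == t->hisValue && t->myValue < LIMIT;
}

/* Replays the boundary interleaving: thread 1 finishes its whole last
   iteration before thread 2 starts its own. */
static void replay_boundary(ThreadState *t1, ThreadState *t2,
                            int *t1Continues, int *t2Continues)
{
    int destMemory[2] = { LIMIT - 1, LIMIT - 1 };
    t1->myValue = t1->hisValue = LIMIT - 1;
    t2->myValue = t2->hisValue = LIMIT - 1;
    *t1Continues = run_iteration(destMemory, t1, 0, 1);
    *t2Continues = run_iteration(destMemory, t2, 1, 0);
}
```

After replay_boundary returns, neither thread asks for another iteration. Thread 1 holds myValue 500000 and hisValue 499999, while thread 2 holds 500000 for both, even though no stale value was ever read.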
After this execution sequence, these are the values seen while verifying the thread 1 buffers:
myValue=500000
hisValue=499999
hisRealValue=500000
myValReadByHim=500000
Since myValue is equal to hisRealValue, the verification reports an error saying that the fence is not working and both threads read a stale value. That conclusion is correct away from the boundary, but not here: no stale value was read, and the fence is working correctly. The only problem is that thread 2 was forced out of the loop because myValue is no longer less than 500000. Away from the boundary it would have taken one more iteration, and the values after that iteration would not have hit this condition.
This fix makes the condition more robust. If both threads had read stale values, they would have read the same old value, so additionally checking hisValue == myValReadByHim confirms a stale read even at the boundary. With both conditions satisfied, we can be sure that both threads read stale values.
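The effect of the extra condition can be sketched as two host-side predicates. This is an illustration only, not the CTS verification code: the function names are hypothetical, and it assumes the pre-fix check compares myValue with hisRealValue, as the summary at the top describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pre-fix check: equal final counters are treated as a stale read. */
static bool stale_before_fix(int myValue, int hisRealValue)
{
    return myValue == hisRealValue;
}

/* Check with the added condition: both threads must also have read the
   same old value from each other. */
static bool stale_after_fix(int myValue, int hisValue,
                            int hisRealValue, int myValReadByHim)
{
    return myValue == hisRealValue && hisValue == myValReadByHim;
}
```

With the boundary values (myValue 500000, hisValue 499999, hisRealValue 500000, myValReadByHim 500000), the first predicate is true and the second is false. For a hypothetical mid-run stale read where both threads store 1000 but each reads 999, both predicates are true, so a real failure is still reported.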
Issue discussion: https://github.com/KhronosGroup/OpenCL-CTS/issues/2496