sylvan
sylvan copied to clipboard
Concurrency Correctness (Cache)
Shouldn‘t it be memory_order_acquire
here:
https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L115 https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L176
I‘m not an expert about the C/C++ memory model, but as far as I understand it, the compiler or CPU is free to reorder the instructions roughly like this (i.e. move the load instructions of the bucket values before the load of the status):
uint64_t bucket_a = bucket->a, bucket_b = bucket->b, bucket_c = bucket->c;
const uint32_t s = atomic_load_explicit(s_bucket, memory_order_relaxed);
// abort if locked or if part of a 2-part cache entry
if (s & 0xc0000000) return 0;
// abort if different hash
if ((s ^ (hash>>32)) & 0x3fff0000) return 0;
// abort if key different
if (bucket_a != a || bucket_b != b || bucket_c != c) return 0;
*res = bucket->res;
// abort if status field changed after compiler_barrier()
return atomic_load_explicit(s_bucket, memory_order_acquire) == s ? 1 : 0;
memory_order_acquire
guarantees that “no reads or writes in the current thread can be reordered before this load,” which would fix the problem. In return one could probably change the memory_order_acquire
of the second atomic load into memory_order_relaxed
, at least I don’t see any need for memory_order_acquire
here.
Also, atomic_compare_exchange_weak
in the put
functions could probably be replaced by atomic_compare_exchange_weak_explicit
with memory_order_acq_rel
for the success case and memory_order_relaxed
for the fail case. This may have no effect on the generated assembly for strongly-ordered architectures such as x86-64 but may have a performance impact on others. Furthermore, it would be great if the compiler_barrier()
macro and comments could be removed, they are somewhat confusing when reading the code for the first time.
In addition, the loads and stores here (and similarly for cache_get6()
and cache_put6()
) need to be atomic operations:
https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L182-L183 https://github.com/trolando/sylvan/blob/0278500f05f44d3129cf2f5e60a2f25c255c0204/src/sylvan_cache.c#L209-L212
Section 5.1.2.4 of the C23 language specification draft states that if “there is ambiguity about which side effect to a non-atomic object is visible, then there is a data race and the behavior is undefined.” Assume that thread T1 executes cache_get()
up to the point immediately before the loads cited above (L182). Then thread T2 executes cache_put()
for the same bucket up to the point immediately before the stores cited above (L209). Now, there is no proper happens-before relationship between the cited loads and stores. Therefore, the values visible to T1 are ambiguous and we have undefined behavior. As I read the language specification, it does not matter that this ambiguity is finally “resolved” by checking the status on line 185 in cache_get()
.