SlabHash icon indicating copy to clipboard operation
SlabHash copied to clipboard

Race during insertion of key/values

Open AKKamath opened this issue 3 years ago • 8 comments

I noticed that there were races in the code. For example, in line 45 of src/concurrent_map/warp:

uint32_t src_unit_data = (next == SlabHashT::A_INDEX_POINTER) ? *(getPointerFromBucket(src_bucket, laneId)) : *(getPointerFromSlab(next, laneId));

Here a weak load operation is used to read data from the bucket. Later on (line 65/line 83), an atomicCAS is performed to the same location.

Different threads may read and atomicCAS on the same location, creating a race.

AKKamath avatar Aug 11 '21 20:08 AKKamath

@sashkiani @maawad

jowens avatar Aug 24 '21 12:08 jowens

@AKKamath did you see other races of this type? Thank you for this report!

jowens avatar Aug 24 '21 12:08 jowens

This seems repeated in many locations. For example src/concurrent_set/cset_warp_operations.cuh has a similar code snippet at lines 41 - 43.

AKKamath avatar Aug 24 '21 13:08 AKKamath

Hi @AKKamath, although the bucket read operation doesn't perform any synchronization, the atomicCAS operation (where we perform the write) will only succeed if the pointer points to an EMPTY_PAIR_64 (i.e., it will fail if we read an empty value, but before the atomicCAS operation a different thread modified that memory location). If different threads try to perform a CAS operation, only one thread will succeed.

Could you explain what scenario would produce a race condition?

I'm also curious about how you found these multiple race conditions.

maawad avatar Aug 24 '21 13:08 maawad

Since the read operation is "weak" it has no consistency guarantees. While I don't see any obvious race conditions, there has been prior work that shows data races in CUDA code breaks assumptions. In the same work, they change a weak store to an atomic operation as it was racing with an atomicCAS.

A simple fix would be changing the weak loads to a "strong" operation by typecasting them to volatile, or using an atomic operation instead, e.g. atomicAdd(getPointerFromBucket(src_bucket, laneId), 0)

AKKamath avatar Aug 24 '21 14:08 AKKamath

Thanks for sharing that paper. I read it before. But as I mentioned, even though the read operation is weak, the operation that matters is the atomic CAS operation.

maawad avatar Aug 24 '21 14:08 maawad

So by definition (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#conflict-and-data-races), there is a data race between the weak load and atomic. I guess it could be a benign data race in this case.

AKKamath avatar Aug 24 '21 15:08 AKKamath

I continue to be interested in getting to the bottom of this. @maawad is there a data race / is it benign, or is this not a race?

jowens avatar Aug 28 '21 18:08 jowens