Aditya K Kamath
This data race manifested when I ran the louvain application in the example folder with the provided inputs.
This data race manifested when I ran the sm application in the example folder with the provided inputs.
Yes, replacing these with relaxed/release/acquire seems like a good choice. You're completely correct about atomicCAS: it defeats the purpose of the barrier. Thank you for the atomicAdd recommendation.
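To make the release/acquire suggestion concrete, here is a minimal host-side sketch in standard C++ rather than CUDA. It is not the code under discussion: `OneShotBarrier` and `run_barrier_demo` are hypothetical names, and `std::atomic` stands in for the device atomics. The idea it illustrates is that an arrival counter incremented with a release fetch-add and polled with an acquire load is enough to order plain writes before the barrier against plain reads after it.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical one-shot barrier; names are illustrative, not from the thread.
struct OneShotBarrier {
    std::atomic<int> arrived{0};
    int expected;
    explicit OneShotBarrier(int n) : expected(n) {}

    void arrive_and_wait() {
        // Release: publishes this thread's earlier writes to any thread
        // that later observes the counter with an acquire load.
        arrived.fetch_add(1, std::memory_order_release);
        // Acquire: once the full count is observed, the writes each
        // arriving thread made before its fetch_add are visible here.
        while (arrived.load(std::memory_order_acquire) < expected) {
            std::this_thread::yield();
        }
    }
};

// Each thread writes its own slot, crosses the barrier, then sums all slots.
std::vector<int> run_barrier_demo(int num_threads) {
    std::vector<int> slots(num_threads, 0);
    std::vector<int> sums(num_threads, 0);
    OneShotBarrier barrier(num_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            slots[t] = t + 1;              // plain write before the barrier
            barrier.arrive_and_wait();
            int sum = 0;
            for (int v : slots) sum += v;  // plain reads after the barrier
            sums[t] = sum;
        });
    }
    for (auto& w : workers) w.join();
    return sums;
}
```

With four threads, every entry of the returned vector should be 1 + 2 + 3 + 4 = 10. On the GPU side, libcu++'s `cuda::atomic` accepts the same memory orders, so a similar structure should carry over, though the thread scope and the exact placement of fences would need to match the actual kernel.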
Hi Hundan, thanks for the feedback. The source files in the GPMBench folder call the library interface, for example DNN (https://github.com/csl-iisc/GPM-ASPLOS22/blob/master/GPMBench/checkpointing/DNN/src/lenet.cu). If you search for GPM on that page, you...
Sure. GitHub doesn't allow .py files, so I've attached it as a .txt here: [allgather.txt](https://github.com/sstsimulator/sst-elements/files/11773527/allgather.txt)
This seems to be repeated in many locations. For example, src/concurrent_set/cset_warp_operations.cuh has a similar code snippet at lines 41-43.
Since the read operation is "weak", it has no consistency guarantees. While I don't see any obvious race conditions, there has been [prior work](http://www0.cs.ucl.ac.uk/staff/j.alglave/papers/asplos15.pdf) that shows data races in CUDA...
So by definition (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#conflict-and-data-races), there is a data race between the weak load and atomic. I guess it could be a benign data race in this case.
Hi, thanks for your interest. We found that the bandwidth over PCIe between the GPU and PM scaled almost linearly with the number of channels used by NVM. That is,...
I believe they should. It's important to note that with 6 DIMMs in one socket, you can turn PM interleaving on in order to stripe the persistent file across the...