Davide Rossetti

Results 17 issues of Davide Rossetti

Optimized memcpy implementations should be chosen at run time during a tuning phase, possibly in gdr_open().
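A minimal sketch of what such a tuning phase could look like: time each candidate copy routine on a sample buffer and keep the fastest. Everything named here (`copy_fn`, `copy_libc`, `copy_bytes`, `tune_copy`) is hypothetical and not part of the library's API. The buffers are plain host memory standing in for the real mapping, and the byte loop is only a placeholder for a hand-optimized variant.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by all candidate copy routines. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Candidate 1: the C library memcpy. */
static void copy_libc(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Candidate 2: a plain byte loop, a placeholder for a hand-tuned variant. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run each candidate `iters` times on `n` bytes and return the index
 * of the one with the lowest total time. */
static size_t tune_copy(const copy_fn *cands, size_t ncands,
                        void *dst, const void *src, size_t n, int iters)
{
    assert(ncands > 0);
    size_t best = 0;
    double best_t = 0.0;
    for (size_t c = 0; c < ncands; c++) {
        double t0 = now_sec();
        for (int i = 0; i < iters; i++)
            cands[c](dst, src, n);
        double dt = now_sec() - t0;
        if (c == 0 || dt < best_t) {
            best_t = dt;
            best = c;
        }
    }
    return best;
}
```

The index returned by `tune_copy` could be stored once, for example in a per-handle function pointer, so later copies avoid any per-call dispatch logic. A real tuning phase would probably also repeat the measurement for several transfer sizes, since the fastest routine may differ between small and large copies.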

enhancement
help wanted

The problem is in run_iter_bw_infinitely(), where the call to pthread_sigmask happens too late, after the CUDA driver has already been initialized and its worker thread launched. The solution is to move...
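The ordering matters because a POSIX thread inherits the signal mask of its creator at the moment pthread_create() is called. The sketch below illustrates only that general rule and is not the perftest code. The helper names (`worker`, `worker_inherits_block`) are hypothetical, and the choice of SIGALRM is an assumption made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Worker: record whether SIGALRM is blocked in this thread's own mask. */
static void *worker(void *arg)
{
    int *blocked = arg;
    sigset_t cur;
    /* With a NULL new set, pthread_sigmask only reports the current mask. */
    pthread_sigmask(SIG_SETMASK, NULL, &cur);
    *blocked = sigismember(&cur, SIGALRM);
    return NULL;
}

/* Spawn a worker and return 1 if it started with SIGALRM blocked, else 0.
 * block_before_create selects whether the caller blocks the signal
 * before or after pthread_create(). */
static int worker_inherits_block(int block_before_create)
{
    sigset_t set, old;
    int blocked = -1;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);

    if (block_before_create)
        pthread_sigmask(SIG_BLOCK, &set, &old);

    int rc = pthread_create(&tid, NULL, worker, &blocked);
    assert(rc == 0);
    (void)rc;

    if (!block_before_create)
        pthread_sigmask(SIG_BLOCK, &set, &old); /* too late for the worker */

    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return blocked;
}
```

Assuming SIGALRM is not already blocked in the calling thread, `worker_inherits_block(1)` reports the signal as blocked in the worker, while `worker_inherits_block(0)` does not. Any thread that exists before the mask is changed keeps its old mask and can still receive the signal.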

See https://github.com/linux-rdma/perftest/blob/6369e620429197f7cc0b6bfcb9734fe70f0b92f0/src/perftest_resources.c#L4222

perftest should support benchmarking of these new kinds of memory. There are two basic variants of CUDA Unified Memory: 1. managed memory, as allocated via cudaMallocManaged(); 2. system-allocated memory,...

send_lat could easily support CUDA device memory as source/sink. write_lat cannot do the same as easily, as it relies on direct memory polling from the CPU.

This is bad because people believe they are running a build with GPU support when they are not, and then, for example, report unreasonable performance.