gdrcopy
add a producer-consumer benchmark
strawman design:
- allocate device memory buffer B
- launch CUDA kernel:
  - polling on B[0]
  - writing a zero-copy flag
- CPU:
  - wait for the kernel to really be polling
  - read tsc in t_start
  - write B[0]
  - wait for flag
  - read tsc in t_end
  - d_t = t_end - t_start should be lower than 1-2 msecs
  - repeat until result is stable
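
A rough sketch of the steps above, to make the handshake concrete. It is not a proposed implementation and has not been tuned. Assumptions that are mine rather than part of the design: the kernel signals "really polling" through a second zero-copy word (`ready`), `std::chrono::steady_clock` stands in for the TSC read, a fixed iteration count stands in for "repeat until result is stable", and all names (`consumer`, `B_map`, `h_flags`, `iters`) are hypothetical.

```cuda
// Sketch only. Assumed build line: nvcc pc_lat.cu -lgdrapi -lcuda
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cuda.h>
#include <cuda_runtime.h>
#include <gdrapi.h>

#define CHECK(cond) do { if (!(cond)) { \
    fprintf(stderr, "failed: %s\n", #cond); exit(1); } } while (0)

// Consumer: announce readiness, spin on B[0], then raise the zero-copy flag.
__global__ void consumer(volatile uint32_t *B, volatile uint32_t *ready,
                         volatile uint32_t *flag, uint32_t expected)
{
    *ready = expected;               // tell the CPU we are about to poll
    __threadfence_system();
    while (B[0] != expected) { }     // poll device memory
    *flag = expected;                // zero-copy write into host memory
    __threadfence_system();
}

int main()
{
    const uint32_t iters = 1000;     // assumption: fixed repeat count
    const size_t size = GPU_PAGE_SIZE;

    // Device buffer B, aligned to the GPU page size before pinning.
    void *d_raw = nullptr;
    CHECK(cudaMalloc(&d_raw, 2 * size) == cudaSuccess);
    uintptr_t d_B = ((uintptr_t)d_raw + size - 1) & ~(uintptr_t)(size - 1);
    unsigned int sync = 1;
    CHECK(cuPointerSetAttribute(&sync, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS,
                                (CUdeviceptr)d_B) == CUDA_SUCCESS);
    CHECK(cudaMemset((void *)d_B, 0, size) == cudaSuccess);

    // Pin and map B into the CPU address space through GDRCopy.
    gdr_t g = gdr_open();
    CHECK(g != nullptr);
    gdr_mh_t mh;
    CHECK(gdr_pin_buffer(g, d_B, size, 0, 0, &mh) == 0);
    void *map = nullptr;
    CHECK(gdr_map(g, mh, &map, size) == 0);
    gdr_info_t info;
    CHECK(gdr_get_info(g, mh, &info) == 0);
    uint32_t *B_map = (uint32_t *)((char *)map + (d_B - info.va));

    // Zero-copy words: [0] = kernel is polling, [1] = kernel saw the write.
    uint32_t *h_raw = nullptr, *d_flags = nullptr;
    CHECK(cudaHostAlloc((void **)&h_raw, 2 * sizeof(uint32_t),
                        cudaHostAllocMapped) == cudaSuccess);
    h_raw[0] = h_raw[1] = 0;
    CHECK(cudaHostGetDevicePointer((void **)&d_flags, h_raw, 0) == cudaSuccess);
    volatile uint32_t *h_flags = h_raw;

    double min_us = 1e30, sum_us = 0.0;
    for (uint32_t i = 1; i <= iters; ++i) {
        consumer<<<1, 1>>>((uint32_t *)d_B, d_flags, d_flags + 1, i);
        while (h_flags[0] != i) { }  // wait for the kernel to really be polling
        auto t_start = std::chrono::steady_clock::now();
        CHECK(gdr_copy_to_mapping(mh, B_map, &i, sizeof(i)) == 0); // write B[0]
        while (h_flags[1] != i) { }  // wait for flag
        auto t_end = std::chrono::steady_clock::now();
        CHECK(cudaDeviceSynchronize() == cudaSuccess);
        double us =
            std::chrono::duration<double, std::micro>(t_end - t_start).count();
        if (us < min_us) min_us = us;
        sum_us += us;
    }
    printf("d_t: min %.2f us, avg %.2f us over %u iterations\n",
           min_us, sum_us / iters, iters);

    CHECK(gdr_unmap(g, mh, map, size) == 0);
    CHECK(gdr_unpin_buffer(g, mh) == 0);
    CHECK(gdr_close(g) == 0);
    cudaFreeHost(h_raw);
    cudaFree(d_raw);
    return 0;
}
```

Note that d_t measured this way is a round trip: it includes the kernel observing B[0] and the zero-copy flag travelling back to host memory, not only the write through the mapping. Using a different `expected` value on each iteration avoids having to reset B[0] and the flags between runs.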
@pakmarkthub regarding #101, note that this issue is really about a different kind of latency test, i.e., not a copy latency. It is meant to measure the latency of writing to device memory through a GDRCopy mapping.