Davide Rossetti
@maddyscientist that is a good question. I am not expecting a dependency on the buffer size, but I might be wrong.
Can you copy the crash dump here?
@blizard-sis how does gdrcopy break for you?
@pakmarkthub this might be fun :)
@hongbilu any performance model would inherently be HW-dependent, so it would involve maintaining a database of FOMs for each platform. That is why I was proposing a run-time autotuning...
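To make the run-time autotuning idea concrete, here is a minimal sketch, not the actual proposal: it times each candidate setting on the current machine and keeps the fastest, so no per-platform database is needed. The names `autotune` and `copy_in_chunks`, the chunk-size candidates, and the host-side copy used as the workload are all hypothetical stand-ins for illustration.

```python
import time


def autotune(candidates, run, repeats=5, timer=time.perf_counter):
    """Time run(candidate) for every candidate and return (best, best_time).

    The minimum over `repeats` runs is used per candidate to reduce noise.
    """
    best = None
    best_time = float("inf")
    for cand in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = timer()
            run(cand)
            elapsed = min(elapsed, timer() - start)
        if elapsed < best_time:
            best, best_time = cand, elapsed
    if best is None:
        raise ValueError("no candidates to tune over")
    return best, best_time


def copy_in_chunks(chunk_size, src=bytes(1 << 20)):
    """Hypothetical workload: copy `src` into a new buffer chunk by chunk."""
    dst = bytearray(len(src))
    view = memoryview(src)
    for off in range(0, len(src), chunk_size):
        dst[off:off + chunk_size] = view[off:off + chunk_size]
    return dst


if __name__ == "__main__":
    # Candidate chunk sizes are arbitrary; a real tuner would pick them
    # from the parameters that actually matter on the target platform.
    sizes = [4 << 10, 64 << 10, 1 << 20]
    best, seconds = autotune(sizes, copy_in_chunks)
    print(f"fastest chunk size on this machine: {best} bytes ({seconds:.6f} s)")
```

The tuning result is only valid for the machine and conditions it was measured on, which is the point: the selection is redone at run time instead of being looked up.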
> It appears that the utilization doesn't reach its maximal possible value, getting about 20 GB/s out of the possible 32 GB/s, for buffers of sizes 32kB-8MB. This question has...
Confirming that it works, provided that the allocation has the GPUDirect RDMA flag set.
@tangrc99 this is expected, as the implementation of `ibv_reg_mr` in the Linux kernel requires the virtual address range to be backed by CPU memory pages. More exactly, `pin_user_pages` does not work...
It should. Are you using the openrm variant of the GPU kernel-mode driver (see https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/)?
In that case you can use the legacy RDMA memory registration path, i.e. `ibv_reg_mr`, which involves the peer-direct kernel infrastructure (for example provided by MLNX_OFED) and `nvidia-peermem`.
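Since the legacy path depends on `nvidia-peermem` being loaded, a quick pre-check before calling `ibv_reg_mr` on GPU memory can save some debugging. The sketch below is only an illustration: `module_loaded` and `read_proc_modules` are hypothetical helper names, and it assumes the module is listed in `/proc/modules` as `nvidia_peermem` (the kernel reports module names with underscores in place of dashes).

```python
def module_loaded(name, modules_text):
    """Return True if `name` appears as a module in /proc/modules-style text.

    Assumption: dashes in the module name show up as underscores in the
    listing, so "nvidia-peermem" is matched against "nvidia_peermem".
    """
    wanted = name.replace("-", "_")
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return True
    return False


def read_proc_modules(path="/proc/modules"):
    """Read the loaded-module list; return "" if it is not readable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    if module_loaded("nvidia-peermem", read_proc_modules()):
        print("nvidia-peermem is loaded")
    else:
        print("nvidia-peermem not found; ibv_reg_mr on GPU memory will likely fail")
```

This only tells you the module is present; it does not verify that the peer-direct infrastructure in the RDMA stack is functional.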