Davide Rossetti

32 comments by Davide Rossetti

@maddyscientist that is a good question. I am not expecting a dependency on the buffer size, but I might be wrong.

Can you copy the crash dump here?

@blizard-sis how does gdrcopy break for you?

@hongbilu any performance model would be inherently HW-dependent, so it would involve maintaining a database of FOMs (figures of merit) for each platform. That is why I was proposing a run-time autotuning...
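To illustrate the idea, here is a minimal sketch of run-time autotuning, assuming the tunable is a single parameter (for example a copy chunk size) and that timing a few trial runs on the target machine is acceptable. This is not gdrcopy code: `autotune`, `time_once` and `make_chunked_copy` are hypothetical names, and the host-memory copy is only a stand-in for the real transfer.

```python
import statistics
import time
from typing import Callable, Dict, Hashable, Iterable, Tuple


def time_once(task: Callable[[], None]) -> float:
    """Return the wall-clock seconds taken by one call of task()."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start


def autotune(
    candidates: Iterable[Hashable],
    make_task: Callable[[Hashable], Callable[[], None]],
    measure: Callable[[Callable[[], None]], float] = time_once,
    warmup: int = 1,
    repeats: int = 5,
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Measure every candidate on this machine and return the fastest.

    Returns (best_candidate, {candidate: median_cost}). Nothing is looked
    up in a per-platform table; the numbers come from the current run.
    """
    results: Dict[Hashable, float] = {}
    for cand in candidates:
        task = make_task(cand)
        for _ in range(warmup):  # untimed runs to warm caches
            task()
        results[cand] = statistics.median(measure(task) for _ in range(repeats))
    if not results:
        raise ValueError("autotune needs at least one candidate")
    best = min(results, key=results.get)
    return best, results


def make_chunked_copy(chunk_size: int) -> Callable[[], None]:
    """Hypothetical workload: copy 1 MiB of host memory in fixed-size chunks."""
    src = bytes(1 << 20)
    dst = bytearray(len(src))

    def task() -> None:
        for off in range(0, len(src), chunk_size):
            dst[off:off + chunk_size] = src[off:off + chunk_size]

    return task


if __name__ == "__main__":
    best, table = autotune([4096, 65536, 1 << 20], make_chunked_copy)
    print("fastest chunk size on this machine:", best)
```

The trade-off is a small start-up cost in exchange for not having to maintain per-platform numbers; the chosen value could be cached per process or per device.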

> It appears that the utilization doesn't reach its maximal possible value, getting about 20 GB/s out of the possible 32 GB/s, for buffers of sizes 32kB-8MB. This question has...

Confirming that it works, provided that the allocation has the GPUDirect RDMA flag set.

@tangrc99 this is expected, as the implementation of `ibv_reg_mr` in the Linux kernel requires the virtual address range to be backed by CPU memory pages. More precisely, `pin_user_pages` does not work...

It should. Are you using the openrm variant of the GPU kernel-mode driver (see https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/)?

In that case you can use the legacy RDMA memory registration path, i.e. `ibv_reg_mr`, which involves the peer-direct kernel infrastructure (for example provided by MLNX_OFED) and `nvidia-peermem`.