Davide Rossetti comments

Results: 32 comments of Davide Rossetti

add autotuning support

@maddyscientist that is a good question. I am not expecting a dependency on the buffer size, but I might be wrong.

ib_write_bw --cuda will lead to system deadlock

can you copy the crash dump here?

dependencies in gdrcopy.spec

@blizard-sis how does gdrcopy break for you?

benchmark and eventually support MOVDIRI/MOVDIR64B on Sapphire Rapids Xeon

@pakmarkthub this might be fun :)

add autotuning support

@hongbilu any performance model would be inherently HW-dependent, so it would involve maintaining a database of FOMs (figures of merit) for each platform. That is why I was proposing a run-time autotuning...
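
To illustrate what run-time autotuning could look like, here is a minimal sketch: time each candidate copy routine on the actual buffer size and keep the fastest, instead of consulting a per-platform database. This is not gdrcopy's implementation. The names `copy_fn`, `copy_memcpy`, `copy_words` and `autotune_copy` are hypothetical, the candidates are plain host-memory copies, and timing uses the standard C `clock()` (CPU time), which is only a rough proxy for a single-threaded copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by all candidate copy routines. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Candidate 1: defer to the C library. */
static void copy_memcpy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Candidate 2: copy in 8-byte words, then finish the tail byte by byte. */
static void copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, 8);   /* small fixed-size memcpy avoids unaligned access */
        memcpy(d + i, &w, 8);
    }
    for (; i < n; i++)
        d[i] = s[i];
}

/* Time every candidate on an n-byte copy and return the index of the
 * fastest one. The caller would cache the result per buffer size. */
static size_t autotune_copy(const copy_fn *cands, size_t ncands,
                            void *dst, const void *src, size_t n, int iters)
{
    size_t best = 0;
    clock_t best_t = 0;

    assert(ncands > 0 && iters > 0);

    for (size_t c = 0; c < ncands; c++) {
        cands[c](dst, src, n);              /* warm-up run, not timed */

        clock_t t0 = clock();
        for (int i = 0; i < iters; i++)
            cands[c](dst, src, n);
        clock_t t = clock() - t0;

        if (c == 0 || t < best_t) {
            best = c;
            best_t = t;
        }
    }
    return best;
}
```

A caller would build an array such as `{ copy_memcpy, copy_words }`, call `autotune_copy` once per buffer size of interest, and reuse the returned index for later copies of that size. The trade-off compared with a static performance model is a one-time measurement cost at start-up in exchange for not maintaining per-platform data.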

Increasing utilization - gdrcopy_copybw

> It appears that the utilization doesn't reach its maximal possible value, getting about 20 GB/s out of the possible 32 GB/s, for buffers of sizes 32kB-8MB. This question has...

Does gdrcopy work with CUDA Virtual Memory Management APIs?

Confirming that it works, provided that the allocation has the GPUDirect RDMA flag set.

call ibv_reg_mr failed using mapped memory

@tangrc99 this is expected, as the implementation of ibv_reg_mr in the Linux kernel requires the virtual address range to be backed by CPU memory pages. More precisely, pin_user_pages does not work...

call ibv_reg_mr failed using mapped memory

It should. Are you using the openrm variant of the GPU kernel-mode driver? See https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/

call ibv_reg_mr failed using mapped memory

In that case you can use the legacy RDMA memory registration path, i.e. `ibv_reg_mr`, which involves the peer-direct kernel infrastructure (for example provided by MLNX_OFED) and `nvidia-peermem`.