Pak Markthub
Hi @anaanimous, CPU and GPU clocks are usually, but not always, the main cause of performance fluctuation. Can you try the items below and rerun your test? 1. Fix...
Let's split this into two topics. The first one is the performance fluctuation, which seems to be resolved now. Depending on how your instance is allocated, I guess that you...
Hi @Zhaojp-Frank, I would like to know more about your setup before we dive deeper. Some questions are just to make sure that we have already eliminated external factors....
GDRCopy, by design, is for low-latency CPU-GPU communication at small message sizes. It uses the CPU to drive the communication -- as opposed to `cudaMemcpy`, which uses the GPU copy...
WC mapping is enabled in the gdrdrv driver. You can comment out these lines to disable it (https://github.com/NVIDIA/gdrcopy/blob/master/src/gdrdrv/gdrdrv.c#L1190-L1197). The default on x86 should be uncached (UC) mapping. You probably see...
Have you already measured the BW? If you are limited by the BW, there is not much we can do. As mentioned, the peak BW GDRCopy can achieve can be...
Hi @goooxu, We don't have an official document about installing GDRCopy in Docker, but I can provide some guidelines here. GDRCopy is composed of two important modules: 1) gdrdrv driver, and...
Thank you, @JieRen98. However, this does not solve the fundamental issue. On some clusters, the current directory is immutable inside compute nodes (i.e., it is read only). Introducing a variable...
Hi @tangrc99, Neither `nvidia-peermem` nor `nv_peer_mem` is involved in dmabuf. A10 should support dmabuf. Could you check if your SW stack is new enough to support dmabuf? - NVIDIA driver with...
Hi @YoonGi-AWS, Thank you for reporting this issue. The fix is under review.