njw1123
env
```
NVIDIA-SMI 535.183.01   Driver Version: 535.183.01   CUDA Version: 12.2
```
install
```
./autogen.sh
./contrib/configure-release --prefix=/opt/ucx --enable-shared --disable-static --disable-doxygen-doc --enable-optimizations --enable-cma --enable-devel-headers --with-cuda=/usr/local/cuda --with-verbs --with-dm --enable-mt
make -j 8
```
...
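A quick way to confirm that this build actually picked up CUDA and verbs support is to ask `ucx_info` which transports it sees. This is a sketch that assumes the `/opt/ucx` prefix from the configure line; `gdr_copy` will only be listed if GDRCopy support was found at build time.

```shell
#!/usr/bin/env bash
# Sketch: inspect the UCX build installed above.
# Assumption: UCX was installed with --prefix=/opt/ucx (from the configure line).
UCX_PREFIX=/opt/ucx
export PATH="$UCX_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$UCX_PREFIX/lib:${LD_LIBRARY_PATH:-}"

if command -v ucx_info >/dev/null 2>&1; then
    # Library version plus the configure flags it was built with.
    ucx_info -v
    # Transports visible at runtime: cuda_copy/cuda_ipc handle GPU memory,
    # gdr_copy appears only with GDRCopy support, rc/dc/ud are InfiniBand/RoCE verbs.
    ucx_info -d | grep -E 'Transport: (cuda_copy|cuda_ipc|gdr_copy|rc_verbs|rc_mlx5|dc_mlx5|ud_verbs|ud_mlx5)' \
        || echo "no CUDA or verbs transports reported" >&2
else
    echo "ucx_info not found under $UCX_PREFIX/bin" >&2
fi
```

For cross-node GPU traffic, both a CUDA transport and a verbs transport (`rc_*`, `dc_*` or `ud_*`) need to show up here before UCX can move GPU buffers over the network.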
My setup is a single server with 8 H20 GPUs connected via NVLink (NV18 topology, i.e. 18 NVLinks per GPU). Each link provides about 26 GB/s, so the theoretical aggregate bandwidth per GPU is roughly 18 × 26 ≈ 470 GB/s...
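Multiplying out the link figures gives the per-GPU ceiling. This sketch assumes NV18 means 18 links per GPU and that the ~26 GB/s figure is per link in one direction; neither assumption is stated explicitly above.

```shell
#!/usr/bin/env bash
# Assumptions: NV18 = 18 NVLinks per GPU; ~26 GB/s per link, one direction.
links=18
per_link_gbps=26
aggregate=$(( links * per_link_gbps ))
echo "Theoretical per-GPU NVLink bandwidth: ${aggregate} GB/s"
```

This is an upper bound; measured NVLink throughput is normally somewhat lower.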
For cross-node GPU communication, will UCX automatically use GPUDirect Async, or will it use at most GPUDirect RDMA?