ucx
UCT/CUDA_COPY: detect device transfers and report peak arch bandwidth
What
Detect whether the remote/local memory types passed to the perf estimate are cuda/cuda-managed. If so, report the peak device memory bandwidth.
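A minimal sketch of the selection logic described above, not the code in this patch. The enum, function names and bandwidth constants are all hypothetical placeholders rather than UCX identifiers or measured values. The sketch also assumes a "device transfer" means both endpoints are cuda/cuda-managed.

```c
/* Hypothetical memory types for illustration; not the UCX enum. */
typedef enum {
    MEM_TYPE_HOST,
    MEM_TYPE_CUDA,
    MEM_TYPE_CUDA_MANAGED
} mem_type_t;

/* Placeholder bandwidths in bytes/second; illustrative values only. */
#define HOST_CUDA_BW   (10.0 * 1e9)   /* assumed host<->cuda estimate */
#define DEVICE_PEAK_BW (1000.0 * 1e9) /* assumed peak device memory bandwidth */

/* Returns nonzero if the memory type is cuda or cuda-managed. */
static int is_cuda_mem(mem_type_t type)
{
    return (type == MEM_TYPE_CUDA) || (type == MEM_TYPE_CUDA_MANAGED);
}

/*
 * Assumption: a transfer counts as a device transfer only when both the
 * local and the remote memory types are cuda/cuda-managed. In that case
 * report the peak device bandwidth, otherwise keep the host<->cuda estimate.
 */
static double estimate_bandwidth(mem_type_t local, mem_type_t remote)
{
    if (is_cuda_mem(local) && is_cuda_mem(remote)) {
        return DEVICE_PEAK_BW;
    }
    return HOST_CUDA_BW;
}
```

With these placeholder values, a cuda to cuda-managed estimate returns the device figure, while any estimate involving host memory keeps the host<->cuda figure.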
Why?
Preparation for device staging pipeline protocols. Without this patch, only the estimated peak host<->cuda bandwidth is reported, which may prevent device bounce buffers from being selected.
cc @yosefe @bureddy