Marko Kabić

42 comments by Marko Kabić

bors try-

bors try-

bors try-

The NCCL backend requires exactly one rank per GPU device (1 rank : 1 GPU), even if CRAY_CUDA_MPS=1. Could this be the reason for the failing tests? Also, `cudaSetDevice(rank)` should be called...
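A minimal sketch of the "1 rank : 1 GPU" mapping logic, kept free of MPI and CUDA so it runs without a GPU. The helper names `select_device` and `assign_devices` are hypothetical. It assumes a node-local rank is used as the device index. In a real MPI+CUDA program that value would typically come from a node-local communicator created with `MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...)`, the device count from `cudaGetDeviceCount`, and the result would be passed to `cudaSetDevice`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: choose the GPU a rank binds to under the
// "1 rank : 1 GPU" rule. `local_rank` is the rank's index among the ranks
// on its node; `num_devices` is the number of visible GPUs on that node.
// Returns the device index to pass to cudaSetDevice, or -1 if the rule
// cannot be satisfied (more ranks on the node than GPUs).
int select_device(int local_rank, int num_devices) {
    if (local_rank < 0 || num_devices <= 0) {
        return -1;
    }
    return local_rank < num_devices ? local_rank : -1;
}

// Hypothetical helper: assign a distinct device to every rank on a node.
// Returns an empty vector if any rank would be left without its own GPU.
std::vector<int> assign_devices(int ranks_per_node, int num_devices) {
    std::vector<int> devices;
    for (int r = 0; r < ranks_per_node; ++r) {
        int d = select_device(r, num_devices);
        if (d < 0) {
            return {};
        }
        devices.push_back(d);
    }
    return devices;
}
```

For example, `assign_devices(4, 4)` yields devices 0 to 3, while `assign_devices(8, 4)` fails: packing 8 ranks onto 4 GPUs (as MPS would allow) breaks the one-to-one requirement.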

Thanks, Ole, for checking. This seems to be a limitation of NCCL that applies even when MPS is enabled, as stated in https://github.com/NVIDIA/nccl/issues/418, more specifically in [this comment](https://github.com/NVIDIA/nccl/issues/418#issuecomment-725537187): >This is not allowed...
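With CRAY_CUDA_MPS=1, several ranks on a node can end up on the same GPU (for instance, all of them selecting device 0), which NCCL rejects. A minimal sanity-check sketch follows. It assumes the device index of each node-local rank has already been collected into a vector (e.g. via an allgather on a node-local communicator, not shown). The function name `ranks_share_a_device` is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical check: entry i holds the CUDA device chosen by node-local
// rank i. Returns true if two or more ranks share a GPU, which violates
// NCCL's "1 rank : 1 GPU" requirement even when MPS is enabled.
bool ranks_share_a_device(const std::vector<int>& device_of_rank) {
    std::set<int> seen;
    for (int dev : device_of_rank) {
        // insert(...).second is false when `dev` was already recorded
        if (!seen.insert(dev).second) {
            return true;
        }
    }
    return false;
}
```

Running such a check before creating the NCCL communicator would turn a failing test into an explicit configuration error.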

I see the problem, although I still think that cp2k+cosma+nccl could outperform cp2k+cosma, at least on architectures with multiple GPUs per node, especially for GEMM-dominated simulations. Another option is to use GPU-aware...