NCCL backend of UCC installed with the NVHPC SDK
Hello,
Is the NCCL backend of UCC available in the HPC-X MPI installation shipped with the NVHPC SDK?
According to ucc_info, the NCCL TL is available. I load the libraries with:
module load /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
source /leonardo/prod/opt/compilers/nvhpc/25.3/binary/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/hpcx-init.sh
hpcx_load
This installation uses UCC/1.4.3. I checked the availability of the TL with
- ucc_info -s
Loading /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
Loading requirement: hpcx
Default CLs scores: basic=10 hier=50
Default TLs scores: cuda=40 mlx5=1 nccl=20 self=50 sharp=30 shm=100 ucp=10
- ucc_info -b | grep "nccl"
#define UCC_CONFIGURE_FLAGS "--with-ucx=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucx --with-sharp=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/sharp --with-rdmacm --with-tlcp=alltoall_block --with-cuda=/hpc/local/oss/cuda12.6.3/redhat8 --with-nccl --with-tls=cuda,nccl,self,sharp,shm,ucp,mlx5 --prefix=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucc"
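The two `ucc_info` checks above can be scripted so they are easy to repeat after each module or environment change. A minimal sketch with two hypothetical shell helpers (`tl_default_score` and `tl_compiled_in`, not part of UCC or HPC-X) that only parse the text `ucc_info -s` and `ucc_info -b` print. They confirm that NCCL was built in and that its default score (20) is above UCP's (10); they say nothing about whether the NCCL TL actually initializes at runtime.

```shell
# Hypothetical helpers (not part of UCC or HPC-X) that parse the text
# printed by `ucc_info`; they do not query UCC themselves.

# Print the default score of one TL, taken from the
# "Default TLs scores:" line of `ucc_info -s` (read on stdin).
tl_default_score() {
    tl="$1"
    grep 'Default TLs scores:' |
        tr ' ' '\n' |
        awk -F= -v tl="$tl" '$1 == tl { print $2 }'
}

# Exit with status 0 if the TL is listed in --with-tls=... of the
# UCC_CONFIGURE_FLAGS line printed by `ucc_info -b` (read on stdin).
tl_compiled_in() {
    tl="$1"
    sed -n 's/.*--with-tls=\([^ "]*\).*/\1/p' |
        tr ',' '\n' |
        grep -qx "$tl"
}
```

For the output above, `ucc_info -s | tl_default_score nccl` prints `20`, and `ucc_info -b | tl_compiled_in nccl && echo built-in` succeeds because `nccl` is in the `--with-tls` list.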
At runtime I set
export OMPI_MCA_coll_ucc_enable=1
export OMPI_MCA_coll_ucc_priority=100
export UCC_TL_NCCL_TUNE=allreduce:cuda:inf
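To make sure every launch sees exactly these settings, they can be scoped to a single command instead of being exported into the shell. A minimal sketch; `run_with_ucc_nccl` is a hypothetical wrapper name, and it sets only the three variables listed above.

```shell
# Hypothetical wrapper: run one command with the Open MPI / UCC settings
# from this post. The prefix assignments apply only to the command in "$@"
# and do not persist in the calling shell.
run_with_ucc_nccl() {
    OMPI_MCA_coll_ucc_enable=1 \
    OMPI_MCA_coll_ucc_priority=100 \
    UCC_TL_NCCL_TUNE=allreduce:cuda:inf \
        "$@"
}
```

For example, `run_with_ucc_nccl srun ./my_app` (where `my_app` is a placeholder). With `mpirun` rather than `srun`, the `UCC_*` variable may need to be forwarded to remote ranks explicitly, e.g. with `-x UCC_TL_NCCL_TUNE`.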
However, the TL used for allreduce does not change. With --mca coll_ucc_verbose, the output always shows TL_UCP for the CUDA memory type:
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203 UCC INFO Allreduce:
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203 UCC INFO Host: {0..4095}:TL_SHM:10 {4K..8K}:TL_SHM:10 {8193..inf}:TL_UCP:10
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203 UCC INFO Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203 UCC INFO CudaManaged: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
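To check quickly whether a settings change moves CUDA allreduce onto TL_NCCL, the score-map lines can be filtered per collective and memory type. A sketch, assuming the `UCC INFO` layout shown above (a header line `UCC INFO <Collective>:` followed by one line per memory type); `tls_for` is a hypothetical helper, not a UCC tool.

```shell
# Hypothetical helper (not a UCC tool): print the distinct TLs that the
# UCC score map assigns to one collective and memory type, reading the
# "UCC INFO" score-map lines on stdin.
tls_for() {
    coll="$1"
    mem="$2"
    awk -v coll="$coll" -v mem="$mem" '
        # Header lines end right after the colon, e.g. "UCC INFO Allreduce:".
        / UCC INFO [A-Za-z]+:[ \t]*$/ { cur = $NF; sub(/:$/, "", cur); next }
        # Memory-type lines, e.g. "UCC INFO Cuda: {0..4095}:TL_UCP:10 ...".
        cur == coll && $0 ~ (" UCC INFO " mem ": ") {
            for (i = 1; i <= NF; i++)
                if (match($i, /:TL_[A-Z0-9_]+:/))
                    print substr($i, RSTART + 1, RLENGTH - 2)
        }' | sort -u
}
```

With the output above saved to a file (`ucc.log` is a placeholder), `tls_for Allreduce Cuda < ucc.log` prints `TL_UCP`; once NCCL is selected it would list `TL_NCCL` instead.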
I also see some failures during initialization related to the CUDA TL:
[1766412746.460863] [lrdn0259:811666:0] mc_cuda.c:78 cuda mc DEBUG cuCtxGetDevice() failed: invalid device context
...
[1766412746.461583] [lrdn0259:811667:0] tl_cuda_context.c:43 TL_CUDA DEBUG cannot create CUDA TL context without active CUDA context
[1766412746.461589] [lrdn0259:811667:0] ucc_context.c:412 UCC DEBUG failed to create tl context for cuda
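Since the question is whether the NCCL TL is affected as well, the debug log can be scanned for every TL whose context failed to initialize. A minimal sketch, assuming the message format shown above (`failed to create tl context for <tl>`); `failed_tl_contexts` is a hypothetical name. An empty result for `nccl` does not by itself prove that the NCCL TL is active.

```shell
# Hypothetical helper (not a UCC tool): print each TL named in a
# "failed to create tl context for <tl>" message, read from a UCC
# debug log on stdin, once per TL.
failed_tl_contexts() {
    sed -n 's/.*failed to create tl context for \([A-Za-z0-9_]*\).*/\1/p' |
        sort -u
}
```

For the lines above it prints `cuda`; if `nccl` also appeared, its context would have failed in the same way.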
Could you give me more information about this error? Should I expect it to be related to the NCCL TL not being available?
Thank you for your time,
Laura