NCCL backend from UCC installed in NVHPCSDK

Open bellenlau opened this issue 1 week ago • 0 comments

Hello,

Is the NCCL backend of UCC available in the hpcx-mpi installation from nvhpcsdk?

The TL is available according to ucc_info; I load the libraries with

module load /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
source /leonardo/prod/opt/compilers/nvhpc/25.3/binary/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/hpcx-init.sh
hpcx_load

This installation uses UCC/1.4.3. I checked the availability of the TL with

  • ucc_info -s
Loading /leonardo/prod/opt/compilers/nvhpc/25.3/binary/modulefiles/nvhpc-hpcx-cuda12/25.3
  Loading requirement: hpcx
Default CLs scores: basic=10 hier=50
Default TLs scores: cuda=40 mlx5=1 nccl=20 self=50 sharp=30 shm=100 ucp=10
  • ucc_info -b | grep "nccl"
#define UCC_CONFIGURE_FLAGS       "--with-ucx=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucx --with-sharp=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/sharp --with-rdmacm --with-tlcp=alltoall_block --with-cuda=/hpc/local/oss/cuda12.6.3/redhat8 --with-nccl --with-tls=cuda,nccl,self,sharp,shm,ucp,mlx5 --prefix=/build-result/hpcx-v2.22.1-gcc-doca_ofed-redhat8-cuda12-x86_64/ucc"
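
The two checks above can be condensed into small filters. This is a minimal sketch that assumes the `ucc_info -s` and `ucc_info -b` output layouts shown in this post; `tl_default_score` and `nccl_configure_flags` are hypothetical helper names, not part of UCC.

```shell
# Hypothetical helper: print the default score of one TL.
# Reads `ucc_info -s` output on stdin; $1 is the TL name, e.g. nccl.
tl_default_score() {
    tl="$1"
    # Keep only the TL score line, put each name=score pair on its own
    # line, then print the value that follows "<tl>=".
    grep '^Default TLs scores:' | tr ' ' '\n' | sed -n "s/^${tl}=//p"
}

# Hypothetical helper: list the NCCL-related configure options.
# Reads `ucc_info -b` output on stdin.
nccl_configure_flags() {
    # Split on spaces and double quotes so each option is on its own line,
    # then keep --with-nccl and --with-tls=... only.
    tr -s ' "' '\n\n' | grep -E '^--with-(nccl|tls)(=|$)'
}
```

With the output shown above, `ucc_info -s | tl_default_score nccl` prints `20`, and `ucc_info -b | nccl_configure_flags` prints the `--with-nccl` and `--with-tls=...` options. Both only reflect how the library was built and configured by default; they do not show whether the NCCL TL is actually instantiated at runtime.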

At runtime I set

export OMPI_MCA_coll_ucc_enable=1
export OMPI_MCA_coll_ucc_priority=100
export UCC_TL_NCCL_TUNE=allreduce:cuda:inf

However, the TL selected for allreduce does not change. With --mca coll_ucc_verbose, UCP is always reported as the TL for the CUDA memory type:

[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203  UCC  INFO  Allreduce:
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203  UCC  INFO     Host: {0..4095}:TL_SHM:10 {4K..8K}:TL_SHM:10 {8193..inf}:TL_UCP:10
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203  UCC  INFO     Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
[1766411628.231717] [lrdn1487:319887:0] ucc_coll_score_map.c:203  UCC  INFO     CudaManaged: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
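
The TLs selected for one memory type can be pulled out of a score-map dump like the one above with a small filter. This is a minimal sketch that assumes the log layout shown in this post; `score_map_tls` is a hypothetical helper name, not part of UCC or Open MPI.

```shell
# Hypothetical helper: print the TLs that a UCC score-map log assigns to one
# memory type of one collective. Reads the log on stdin.
#   $1: collective name as printed in the log, e.g. Allreduce
#   $2: memory type as printed in the log, e.g. Cuda
score_map_tls() {
    coll="$1"
    mem="$2"
    awk -v coll="$coll" -v mem="$mem" '
        # A line ending in "UCC  INFO  <Name>:" starts a new collective section.
        /UCC +INFO +[A-Za-z_]+:$/ {
            current = $NF
            sub(/:$/, "", current)
            next
        }
        current == coll {
            for (i = 1; i <= NF; i++) {
                if ($i == mem ":") {
                    # Remaining fields look like {0..4095}:TL_UCP:10,
                    # so the TL name is the second colon-separated part.
                    for (j = i + 1; j <= NF; j++) {
                        n = split($j, parts, ":")
                        if (n >= 2) print parts[2]
                    }
                }
            }
        }
    ' | sort -u
}
```

Fed the log above, `score_map_tls Allreduce Cuda` prints `TL_UCP` only. If the tuning variable had taken effect, an NCCL entry would be expected in that list instead.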

I also see some failures during initialization, related to the CUDA TL:

[1766412746.460863] [lrdn0259:811666:0]         mc_cuda.c:78   cuda mc DEBUG cuCtxGetDevice() failed: invalid device context
...
[1766412746.461583] [lrdn0259:811667:0] tl_cuda_context.c:43   TL_CUDA DEBUG cannot create CUDA TL context without active CUDA context
[1766412746.461589] [lrdn0259:811667:0]     ucc_context.c:412  UCC  DEBUG failed to create tl context for cuda

Could you please give me more information about this error? Could it be the reason the NCCL TL is not available?

Thank you for your time,

Laura

bellenlau, Dec 22 '25 14:12