UCT/CUDA: remove cuda_runtime dependency

Open Akshay-Venkatesh opened this issue 3 years ago • 22 comments

What

Removes uct/cuda dependency on cuda runtime

Why ?

  • In general, a minimum CUDA driver version covers all the functionality that cuda_runtime provides, so the additional dependency is not needed

TODO

  • Need to check whether this dependency can also be removed from the memory interception layer (UCM)
    • If all runtime memory calls necessarily go through the driver API, it should be possible to remove the UCM dependency on cudart as well

Akshay-Venkatesh avatar Jan 11 '22 18:01 Akshay-Venkatesh

cc @yosefe @bureddy @jirikraus

Akshay-Venkatesh avatar Jan 11 '22 18:01 Akshay-Venkatesh

per offline discussion, need to also remove it from build (link) otherwise ok

@yosefe the following changes to remove cudart from the build cause the gtest build to fail, as it depends on cudart for cudaMalloc/cudaFree calls and also depends on cudart_static for static hook tests.

diff --git a/config/m4/cuda.m4 b/config/m4/cuda.m4
index bd3308765..31bf3b7c7 100644
--- a/config/m4/cuda.m4
+++ b/config/m4/cuda.m4
@@ -48,9 +48,6 @@ AS_IF([test "x$cuda_checked" != "xyes"],
          AS_IF([test "x$cuda_happy" = "xyes"],
                [AC_CHECK_LIB([cuda], [cuDeviceGetUuid],
                              [CUDA_LIBS="$CUDA_LIBS -lcuda"], [cuda_happy="no"])])
-         AS_IF([test "x$cuda_happy" = "xyes"],
-               [AC_CHECK_LIB([cudart], [cudaGetDeviceCount],
-                             [CUDA_LIBS="$CUDA_LIBS -lcudart"], [cuda_happy="no"])])

          # Check nvml header files
          AC_CHECK_HEADERS([nvml.h],
@@ -68,15 +65,6 @@ AS_IF([test "x$cuda_checked" != "xyes"],
                               cuda_happy="no"])])

          LDFLAGS="$save_LDFLAGS"
-
-         # Check for cuda static library
-         have_cuda_static="no"
-         AS_IF([test "x$cuda_happy" = "xyes"],
-               [AC_CHECK_LIB([cudart_static], [cudaGetDeviceCount],
-                             [CUDA_STATIC_LIBS="$CUDA_STATIC_LIBS -lcudart_static"
-                              have_cuda_static="yes"],
-                             [], [-ldl -lrt -lpthread])])
-
          CPPFLAGS="$save_CPPFLAGS"
          LDFLAGS="$save_LDFLAGS"
          LIBS="$save_LIBS"

Before this PR, ldd libuct_cuda.so looks as follows:

$ ldd lib/ucx/libuct_cuda.so
        linux-vdso.so.1 (0x00007ffcc7ff2000)
        libucs.so.0 => $UCX_HOME/lib/libucs.so.0 (0x00007feb00edd000)
        libuct.so.0 => $UCX_HOME/lib/libuct.so.0 (0x00007feb00c81000)
        libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007feaff55f000)
        libcudart.so.11.0 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcudart.so.11.0 (0x00007feaff2c6000)
        libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007feafec42000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007feafea23000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007feafe632000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007feafe42e000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007feafe223000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007feafde85000)
        libucm.so.0 => $UCX_HOME/lib/libucm.so.0 (0x00007feafdc60000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007feafda58000)
        /lib64/ld-linux-x86-64.so.2 (0x00007feb0137e000)

and after this PR, it looks as follows:

$ ldd lib/ucx/libuct_cuda.so
        linux-vdso.so.1 (0x00007ffc0d48a000)
        libucs.so.0 => $UCX_HOME/lib/libucs.so.0 (0x00007fd62246c000)
        libuct.so.0 => $UCX_HOME/lib/libuct.so.0 (0x00007fd622210000)
        libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fd620b28000)
        libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007fd6204a4000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd620285000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd61fe94000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd61fc90000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fd61fa85000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd61f6e7000)
        libucm.so.0 => $UCX_HOME/lib/libucm.so.0 (0x00007fd61f4c2000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd61f2ba000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fd62290c000)
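
The before/after comparison above can also be checked mechanically rather than by eye. The following is a minimal sketch, not part of the PR: `linksAgainst` is a hypothetical helper that scans ldd-style text for a shared object whose name starts with a given prefix. The sample lines are copied from the two listings above.

```go
package main

import (
	"fmt"
	"strings"
)

// linksAgainst reports whether any line of ldd-style output names a
// shared object whose file name starts with libPrefix.
// Hypothetical helper for illustration; it is not part of UCX.
func linksAgainst(lddOutput, libPrefix string) bool {
	for _, line := range strings.Split(lddOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		// The first field is the library name, e.g. "libcudart.so.11.0".
		if strings.HasPrefix(fields[0], libPrefix) {
			return true
		}
	}
	return false
}

func main() {
	// Two lines taken from the "before" listing.
	before := `
        libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007feaff55f000)
        libcudart.so.11.0 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcudart.so.11.0 (0x00007feaff2c6000)
`
	// Two lines taken from the "after" listing.
	after := `
        libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fd620b28000)
        libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007fd6204a4000)
`
	fmt.Println("before links cudart:", linksAgainst(before, "libcudart.so"))
	fmt.Println("after links cudart:", linksAgainst(after, "libcudart.so"))
}
```

Matching on the `libcudart.so` prefix keeps `libcuda.so.1` from being counted as a runtime dependency, which is the distinction this PR cares about.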

Do we really need to remove anything?

Akshay-Venkatesh avatar Feb 01 '22 16:02 Akshay-Venkatesh

Can we remove cudart only from UCT and UCM, but not from gtest?

yosefe avatar Feb 01 '22 16:02 yosefe

Can we remove cudart only from UCT and UCM, but not from gtest?

@yosefe with this PR, that's already the case, as UCT/UCM no longer depend on cudart:

$ ldd lib/ucx/libucm_cuda.so
        linux-vdso.so.1 (0x00007fff6f472000)
        libucm.so.0 => $UCX_HOME/lib/libucm.so.0 (0x00007f21c5ead000)
        libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007f21c478b000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f21c456c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f21c417b000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f21c3f77000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f21c3bd9000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f21c39d1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f21c62da000)

$ ldd lib/ucx/libuct_cuda.so
        linux-vdso.so.1 (0x00007fffec1db000)
        libucs.so.0 => $UCX_HOME/lib/libucs.so.0 (0x00007fd62cbd5000)
        libuct.so.0 => $UCX_HOME/lib/libuct.so.0 (0x00007fd62c979000)
        libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007fd62b257000)
        libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007fd62abd3000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd62a9b4000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd62a5c3000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd62a3bf000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fd62a1b4000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd629e16000)
        libucm.so.0 => $UCX_HOME/lib/libucm.so.0 (0x00007fd629bf1000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd6299e9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fd62d075000)

$ ldd test/gtest/gtest 
        linux-vdso.so.1 (0x00007ffd8b308000)
       ...
        libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007f56b3283000)
        libcudart.so.11.0 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcudart.so.11.0 (0x00007f56b2fea000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f56b2dcb000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f56b2a42000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f56b26a4000)
        libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f56b2475000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f56b225d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f56b1e6c000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f56b1c61000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f56b1a59000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f56b6fd1000)
        libnl-route-3.so.200 => /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200 (0x00007f56b17e4000)
        libnl-3.so.200 => /lib/x86_64-linux-gnu/libnl-3.so.200 (0x00007f56b15c4000)
        libmlx5.so.1 => /usr/lib/x86_64-linux-gnu/libmlx5.so.1 (0x00007f56b136c000)

Are there further changes needed? I most likely misunderstood your question.

Akshay-Venkatesh avatar Feb 01 '22 17:02 Akshay-Venkatesh

@Akshay-Venkatesh maybe this dependency is removed by the linker because we don't call it. But I think it would be good to remove it from the Makefile as well: split CUDA_LIBS into CUDA_LIBS and CUDART_LIBS.

yosefe avatar Feb 01 '22 18:02 yosefe

@petro-rudenko How do I find out which compilation flags were used to build goperftest? I'm not sure whether it was built with -DHAVE_CUDA or -DHAVE_CUDART here: https://dev.azure.com/ucfconsort/ucx/_build/results?buildId=36177&view=logs&j=3326af28-725b-5a76-d9b2-a6afcb2c442d&t=325b69bb-50d4-51fd-759e-eb1ff0fb9743&l=370 I'm also trying to figure out which compilation flags were used to build the UCX version that is queried in functions like the one below, since it looks like memtypesMask doesn't include CUDA for the failing test:

// This routine fetches information about the context.
func (c *UcpContext) Query(attrs ...UcpContextAttr) (*C.ucp_context_attr_t, error) {
        var ucp_attrs C.ucp_context_attr_t

        for _, attr := range attrs {
                ucp_attrs.field_mask |= C.ulong(attr)
        }

        if status := C.ucp_context_query(c.context, &ucp_attrs); status != C.UCS_OK {
                return nil, newUcxError(status)
        }

        return &ucp_attrs, nil
}
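
For readers unfamiliar with the mask mentioned above, here is a minimal sketch of the bit test involved. It assumes the memory-types mask is a bit field indexed by memory type and that the numeric values mirror `ucs_memory_type_t` in the UCX C headers (host = 0, CUDA = 1, CUDA managed = 2). The Go constant names and `hasMemType` are illustrative and are not the bindings' own identifiers.

```go
package main

import "fmt"

// Assumed to mirror ucs_memory_type_t from the UCX C headers.
// The names are illustrative, not taken from the Go bindings.
const (
	memTypeHost        = 0
	memTypeCuda        = 1
	memTypeCudaManaged = 2
)

// hasMemType reports whether the bit for memType is set in mask.
func hasMemType(mask uint64, memType uint) bool {
	return mask&(uint64(1)<<memType) != 0
}

func main() {
	// A context that only reports host memory.
	hostOnly := uint64(1) << memTypeHost
	// A context that also reports CUDA memory.
	withCuda := hostOnly | uint64(1)<<memTypeCuda

	fmt.Println("host-only mask has CUDA:", hasMemType(hostOnly, memTypeCuda))
	fmt.Println("CUDA mask has CUDA:", hasMemType(withCuda, memTypeCuda))
}
```

If the CUDA bit is missing from the mask returned by a query like the one above, the CUDA memory domain was most likely never opened in that process, which is a different problem from the test binary's own compile flags.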

Akshay-Venkatesh avatar Feb 02 '22 20:02 Akshay-Venkatesh

Hi @Akshay-Venkatesh. Go dynamically links only to ucp and ucs, since it doesn't use the CUDA API directly, only through ucp_mem_map, etc.:

https://github.com/openucx/ucx/blob/master/bindings/go/Makefile.am#L10-L11

petro-rudenko avatar Feb 03 '22 09:02 petro-rudenko

So probably you would need to add to that file something like this:

if HAVE_CUDART
CGOCFLAGS=$(CGOCFLAGS) $(CUDART_CPPFLAGS)
CGOLDFLAGS=$(CGOLDFLAGS) $(CUDART_LDFLAGS)
UCX_SOPATH=$(UCX_SOPATH) $(CUDART_LIBS) -l $(top_builddir)/src/uct/cuda/libuct_cuda.la
endif

petro-rudenko avatar Feb 03 '22 09:02 petro-rudenko

Hi @Akshay-Venkatesh. Go dynamically links only to ucp and ucs, since it doesn't use the CUDA API directly, only through ucp_mem_map, etc.:

https://github.com/openucx/ucx/blob/master/bindings/go/Makefile.am#L10-L11

Hi @petro-rudenko. Thanks for the info. I'm probably missing something, but if the Go test is checking perf with CUDA memory, and Go dynamically links to UCX libraries already compiled with CUDA enabled (so HAVE_CUDA and HAVE_CUDART are set for the appropriate compilation units), I'm not sure why memtypesMask doesn't have CUDA memory in it. These tests were passing before, so the failures are coming from changes in this PR. I'm probably missing some places where HAVE_CUDA needs to be changed to HAVE_CUDART. I'll look into it.

BTW, I tried the change you suggested just to see if the build passes, but it seems that if we went with this approach, we'd also have to add this, right? (in case some other part depends on libcuda symbols)

if HAVE_CUDA
...
endif

Akshay-Venkatesh avatar Feb 03 '22 18:02 Akshay-Venkatesh

$ UCX_TLS=cuda UCX_LOG_LEVEL=trace LD_LIBRARY_PATH=/hpc/local/oss/gdrcopy2.3_cuda11.4/lib:/hpc/local/oss/cuda11.4/lib64:/hpc/local/oss/cuda11.4/lib64/stubs:/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/:/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/ /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/bindings/go/.libs/tmp/goperftest -m=cuda


[1643915511.186673] [vulcan02:15421:0]           stats.c:861  UCX  TRACE statistics disabled
[1643915511.186698] [vulcan02:15421:0]        memtrack.c:409  UCX  TRACE memtrack disabled
[1643915511.186716] [vulcan02:15421:0]           debug.c:1211 UCX  DEBUG using signal stack 0x7f19a963a000 size 141824
[1643915511.187337] [vulcan02:15421:0]            init.c:116  UCX  DEBUG /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/libucs.so.0 loaded at 0x7f19a8ca9000
[1643915511.187364] [vulcan02:15421:0]            init.c:117  UCX  DEBUG cmd line: /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/bindings/go/.libs/tmp/goperftest -m=cuda
[1643915511.187377] [vulcan02:15421:0]          module.c:69   UCX  DEBUG ucs library path: /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/libucs.so.0
[1643915511.187385] [vulcan02:15421:0]          module.c:273  UCX  DEBUG loading modules for ucs
[1643915511.190368] [vulcan02:15421:0]     ucp_context.c:1776 UCX  INFO  UCP version is 1.13 (release 0)
[1643915511.191009] [vulcan02:15421:0]            time.c:22   UCX  DEBUG measured arch clock speed: 2200000000.00 Hz
[1643915511.191044] [vulcan02:15421:0]     ucp_context.c:1564 UCX  DEBUG estimated number of endpoints is 1
[1643915511.191048] [vulcan02:15421:0]     ucp_context.c:1571 UCX  DEBUG estimated number of endpoints per node is 1
[1643915511.191055] [vulcan02:15421:0]     ucp_context.c:1578 UCX  DEBUG estimated bcopy bandwidth is 6081740800.000000
[1643915511.191065] [vulcan02:15421:0]     ucp_context.c:1644 UCX  DEBUG allocation method[0] is md 'sysv'
[1643915511.191069] [vulcan02:15421:0]     ucp_context.c:1644 UCX  DEBUG allocation method[1] is md 'posix'
[1643915511.191075] [vulcan02:15421:0]     ucp_context.c:1656 UCX  DEBUG allocation method[2] is 'huge'
[1643915511.191078] [vulcan02:15421:0]     ucp_context.c:1656 UCX  DEBUG allocation method[3] is 'thp'
[1643915511.191081] [vulcan02:15421:0]     ucp_context.c:1644 UCX  DEBUG allocation method[4] is md '*'
[1643915511.191085] [vulcan02:15421:0]     ucp_context.c:1656 UCX  DEBUG allocation method[5] is 'mmap'
[1643915511.191088] [vulcan02:15421:0]     ucp_context.c:1656 UCX  DEBUG allocation method[6] is 'heap'
[1643915511.191106] [vulcan02:15421:0]          module.c:273  UCX  DEBUG loading modules for uct
[1643915511.191110] [vulcan02:15421:0]          module.c:239  UCX  TRACE loading module 'cuda' with mode 0x1
[1643915511.193231] [vulcan02:15421:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda.so.0.0.0 [0x2698720]
[1643915511.193247] [vulcan02:15421:0]          module.c:189  UCX  TRACE calling 'ucs_module_global_init' in '/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda.so.0.0.0': [0x7f197df916f1]
[1643915511.193253] [vulcan02:15421:0]          module.c:273  UCX  DEBUG loading modules for uct_cuda
[1643915511.193257] [vulcan02:15421:0]          module.c:239  UCX  TRACE loading module 'gdrcopy' with mode 0x1
[1643915511.194687] [vulcan02:15421:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda_gdrcopy.so.0.0.0 [0x2699e50]
[1643915511.194700] [vulcan02:15421:0]          module.c:162  UCX  DEBUG ignoring 'ucs_module_global_init' (0x7f197df916f1) from libuct_cuda.so.0 (0x7f197df8b000), expected in libuct_cuda_gdrcopy.so.0 (7f197c540000)
[1643915511.194705] [vulcan02:15421:0]          module.c:239  UCX  TRACE loading module 'ib' with mode 0x1
[1643915511.196042] [vulcan02:15421:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_ib.so.0.0.0 [0x269ac00]
[1643915511.196058] [vulcan02:15421:0]          module.c:239  UCX  TRACE loading module 'rdmacm' with mode 0x1
[1643915511.196890] [vulcan02:15421:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_rdmacm.so.0.0.0 [0x269d9a0]
[1643915511.196904] [vulcan02:15421:0]          module.c:239  UCX  TRACE loading module 'cma' with mode 0x1
[1643915511.197524] [vulcan02:15421:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cma.so.0.0.0 [0x269e7d0]
[1643915511.197537] [vulcan02:15421:0]          module.c:239  UCX  TRACE loading module 'knem' with mode 0x1
[1643915511.198332] [vulcan02:15421:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_knem.so.0.0.0 [0x269ee70]
[1643915511.198353] [vulcan02:15421:0]          module.c:239  UCX  TRACE loading module 'xpmem' with mode 0x1
[1643915511.199171] [vulcan02:15421:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_xpmem.so.0.0.0 [0x269f570]
[1643915511.199268] [vulcan02:15421:0]          module.c:273  UCX  DEBUG loading modules for uct_ib
[1643915511.199577] [vulcan02:15421:0]          cma_md.c:115  UCX  TRACE ptrace_scope is 0, CMA is supported
[1643915511.199690] [vulcan02:15421:0]        mm_xpmem.c:116  UCX  DEBUG xpmem version: 155653
[1643915511.199822] [vulcan02:15421:0]     ucp_context.c:908  UCX  TRACE allowed transport 0 : 'cu
[1643915511.213509] [vulcan02:15421:0]     ucp_context.c:683  UCX  TRACE enabling tl 'ud_verbs' for alias 'ud_v'
[1643915511.213519] [vulcan02:15421:0]     ucp_context.c:683  UCX  TRACE enabling tl 'ud_verbs' for alias 'ud'
[1643915511.213540] [vulcan02:15421:0]     ucp_context.c:690  UCX  TRACE enabling auxiliary tl 'ud_verbs' for alias 'rc_v'
[1643915511.213544] [vulcan02:15421:0]     ucp_context.c:690  UCX  TRACE enabling auxiliary tl 'ud_verbs' for alias 'rc'
[1643915511.213567] [vulcan02:15421:0]     ucp_context.c:820  UCX  TRACE ud_verbs/mlx5_0:1 is disabled
[1643915511.213573] [vulcan02:15421:0]     ucp_context.c:683  UCX  TRACE enabling tl 'ud_mlx5' for alias 'ib'
[1643915511.213578] [vulcan02:15421:0]     ucp_context.c:683  UCX  TRACE enabling tl 'ud_mlx5' for alias 'ud_x'
[1643915511.213582] [vulcan02:15421:0]     ucp_context.c:683  UCX  TRACE enabling tl 'ud_mlx5' for alias 'ud'
[1643915511.213586] [vulcan02:15421:0]     ucp_context.c:690  UCX  TRACE enabling auxiliary tl 'ud_mlx5' for alias 'rc_x'
[1643915511.213591] [vulcan02:15421:0]     ucp_context.c:690  UCX  TRACE enabling auxiliary tl 'ud_mlx5' for alias 'rc'
[1643915511.213596] [vulcan02:15421:0]     ucp_context.c:820  UCX  TRACE ud_mlx5/mlx5_0:1 is disabled
[1643915511.213600] [vulcan02:15421:0]     ucp_context.c:1306 UCX  DEBUG closing md mlx5_0 because it has no selected transport resources
[1643915511.213714] [vulcan02:15421:0]           mpool.c:154  UCX  DEBUG mpool devx dbrec destroyed
[1643915511.213736] [vulcan02:15421:0]           async.c:156  UCX  DEBUG removed async handler 0x2697a50 [id=9 ref 1] ucs_rcache_invalidate_handler() from hash
[1643915511.213741] [vulcan02:15421:0]           async.c:562  UCX  DEBUG removing async handler 0x2697a50 [id=9 ref 1] ucs_rcache_invalidate_handler()
[1643915511.213751] [vulcan02:15421:0]           async.c:582  UCX  TRACE waiting for 0x2697a50 [id=9 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643915511.213756] [vulcan02:15421:0]           async.c:171  UCX  DEBUG release async handler 0x2697a50 [id=9 ref 0] ucs_rcache_invalidate_handler()
[1643915511.213773] [vulcan02:15421:0]           mpool.c:154  UCX  DEBUG mpool rcache_mp destroyed
[1643915511.213848] [vulcan02:15421:0]       ib_device.c:686  UCX  DEBUG destroying ib device mlx5_0
[1643915511.213868] [vulcan02:15421:0]           async.c:156  UCX  DEBUG removed async handler 0x26a2ec0 [id=5 ref 1] uct_ib_async_event_handler() from hash
[1643915511.213872] [vulcan02:15421:0]           async.c:562  UCX  DEBUG removing async handler 0x26a2ec0 [id=5 ref 1] uct_ib_async_event_handler()
[1643915511.214001] [vulcan02:15421:0]           async.c:582  UCX  TRACE waiting for 0x26a2ec0 [id=5 ref 1] uct_ib_async_event_handler() completion (called=0)
[1643915511.214008] [vulcan02:15421:0]           async.c:171  UCX  DEBUG release async handler 0x26a2ec0 [id=5 ref 0] uct_ib_async_event_handler()
[1643915511.214439] [vulcan02:15421:0]           ib_md.c:1570 UCX  TRACE opening IB device mlx5_1
[1643915511.217886] [vulcan02:15421:0]       ib_device.c:554  UCX  DEBUG PF: mlx5_1 vendor_id: 0x15b3 device_id: 4123
[1643915511.218102] [vulcan02:15421:0]    ib_mlx5dv_md.c:491  UCX  DEBUG mlx5_1: disable ODP because it's not supported for DevX QP
[1643915511.218268] [vulcan02:15421:0]           async.c:231  UCX  DEBUG added async handler 0x2697a00 [id=5 ref 1] uct_ib_async_event_handler() to hash
[1643915511.218330] [vulcan02:15421:0]           async.c:509  UCX  DEBUG listening to async event fd 5 events 0x1 mode thread_spinlock
[1643915511.218338] [vulcan02:15421:0]       ib_device.c:668  UCX  DEBUG initialized device 'mlx5_1' (InfiniBand channel adapter) with 1 ports
[1643915511.218431] [vulcan02:15421:0]           ib_md.c:1675 UCX  DEBUG mlx5_1: cuda GPUDirect RDMA is enabled
[1643915511.218442] [vulcan02:15421:0]           ib_md.c:1675 UCX  DEBUG mlx5_1: rocm GPUDirect RDMA is disabled
[1643915511.218450] [vulcan02:15421:0]           mpool.c:100  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643915511.218462] [vulcan02:15421:0]           async.c:231  UCX  DEBUG added async handler 0x26a0a70 [id=9 ref 1] ucs_rcache_invalidate_handler() to hash
[1643915511.218473] [vulcan02:15421:0]           async.c:509  UCX  DEBUG listening to async event fd 9 events 0x1 mode thread_spinlock
[1643915511.218564] [vulcan02:15421:0]           ib_md.c:1332 UCX  DEBUG mlx5_1: using registration cache
[1643915511.218586] [vulcan02:15421:0]           ib_md.c:1494 UCX  DEBUG failed to read file: /sys/class/infiniband/mlx5_1/device/current_link_width
[1643915511.218592] [vulcan02:15421:0]           mpool.c:100  UCX  DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40
[1643915511.218785] [vulcan02:15421:0]           ib_md.c:1623 UCX  DEBUG mlx5_1: md open by 'uct_ib_mlx5_devx_md_ops' is successful
[1643915511.219727] [vulcan02:15421:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643915511.219739] [vulcan02:15421:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0x0: Destination is unreachable
[1643915511.219743] [vulcan02:15421:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0x0
[1643915511.219749] [vulcan02:15421:0]          uct_md.c:113  UCX  DEBUG failed to query rc_verbs resources: No such device
[1643915511.219753] [vulcan02:15421:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643915511.219757] [vulcan02:15421:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0x4: Destination is unreachable
[1643915511.219761] [vulcan02:15421:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0x4
[1643915511.219764] [vulcan02:15421:0]          uct_md.c:113  UCX  DEBUG failed to query rc_mlx5 resources: No such device
[1643915511.219768] [vulcan02:15421:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643915511.219772] [vulcan02:15421:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0xc4: Destination is unreachable
[1643915511.219776] [vulcan02:15421:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0xc4
[1643915511.219779] [vulcan02:15421:0]          uct_md.c:113  UCX  DEBUG failed to query dc_mlx5 resources: No such device
[1643915511.219783] [vulcan02:15421:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643915511.219787] [vulcan02:15421:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0x0: Destination is unreachable

[1643915436.022653] [vulcan02:15318:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0x4
[1643915436.022657] [vulcan02:15318:0]          uct_md.c:113  UCX  DEBUG failed to query ud_mlx5 resources: No such device
[1643915436.022661] [vulcan02:15318:0]     ucp_context.c:892  UCX  DEBUG No tl resources found for md mlx5_1
[1643915436.022664] [vulcan02:15318:0]     ucp_context.c:1306 UCX  DEBUG closing md mlx5_1 because it has no selected transport resources
[1643915436.022759] [vulcan02:15318:0]           mpool.c:154  UCX  DEBUG mpool devx dbrec destroyed
[1643915436.022773] [vulcan02:15318:0]           async.c:156  UCX  DEBUG removed async handler 0x2769550 [id=9 ref 1] ucs_rcache_invalidate_handler() from hash
[1643915436.022778] [vulcan02:15318:0]           async.c:562  UCX  DEBUG removing async handler 0x2769550 [id=9 ref 1] ucs_rcache_invalidate_handler()
[1643915436.022784] [vulcan02:15318:0]           async.c:582  UCX  TRACE waiting for 0x2769550 [id=9 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643915436.022789] [vulcan02:15318:0]           async.c:171  UCX  DEBUG release async handler 0x2769550 [id=9 ref 0] ucs_rcache_invalidate_handler()
[1643915436.022800] [vulcan02:15318:0]           mpool.c:154  UCX  DEBUG mpool rcache_mp destroyed
[1643915436.022866] [vulcan02:15318:0]       ib_device.c:686  UCX  DEBUG destroying ib device mlx5_1
[1643915436.022874] [vulcan02:15318:0]           async.c:156  UCX  DEBUG removed async handler 0x276c4b0 [id=5 ref 1] uct_ib_async_event_handler() from hash
[1643915436.022878] [vulcan02:15318:0]           async.c:562  UCX  DEBUG removing async handler 0x276c4b0 [id=5 ref 1] uct_ib_async_event_handler()
[1643915436.022965] [vulcan02:15318:0]           async.c:582  UCX  TRACE waiting for 0x276c4b0 [id=5 ref 1] uct_ib_async_event_handler() completion (called=0)
[1643915436.022972] [vulcan02:15318:0]           async.c:171  UCX  DEBUG release async handler 0x276c4b0 [id=5 ref 0] uct_ib_async_event_handler()
[1643915436.023331] [vulcan02:15318:0]          cma_md.c:115  UCX  TRACE ptrace_scope is 0, CMA is supported
[1643915436.023374] [vulcan02:15318:0]     ucp_context.c:908  UCX  TRACE allowed transport 0 : 'cuda'
[1643915436.023382] [vulcan02:15318:0]     ucp_context.c:683  UCX  TRACE enabling tl 'cma' for alias 'sm'
[1643915436.023386] [vulcan02:15318:0]     ucp_context.c:683  UCX  TRACE enabling tl 'cma' for alias 'shm'
[1643915436.023393] [vulcan02:15318:0]     ucp_context.c:820  UCX  TRACE cma/memory is disabled
[1643915436.023397] [vulcan02:15318:0]     ucp_context.c:1306 UCX  DEBUG closing md cma because it has no selected transport resources
[1643915436.023474] [vulcan02:15318:0]           mpool.c:100  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643915436.023485] [vulcan02:15318:0]           async.c:231  UCX  DEBUG added async handler 0x2769f30 [id=5 ref 1] ucs_rcache_invalidate_handler() to hash
[1643915436.023562] [vulcan02:15318:0]           async.c:509  UCX  DEBUG listening to async event fd 5 events 0x1 mode thread_spinlock
[1643915436.023677] [vulcan02:15318:0]     ucp_context.c:908  UCX  TRACE allowed transport 0 : 'cuda'
[1643915436.023689] [vulcan02:15318:0]     ucp_context.c:683  UCX  TRACE enabling tl 'knem' for alias 'sm'
[1643915436.023693] [vulcan02:15318:0]     ucp_context.c:683  UCX  TRACE enabling tl 'knem' for alias 'shm'
[1643915436.023700] [vulcan02:15318:0]     ucp_context.c:820  UCX  TRACE knem/memory is disabled
[1643915436.023704] [vulcan02:15318:0]     ucp_context.c:1306 UCX  DEBUG closing md knem because it has no selected transport resources
[1643915436.023719] [vulcan02:15318:0]           async.c:156  UCX  DEBUG removed async handler 0x2769f30 [id=5 ref 1] ucs_rcache_invalidate_handler() from hash
[1643915436.023724] [vulcan02:15318:0]           async.c:562  UCX  DEBUG removing async handler 0x2769f30 [id=5 ref 1] ucs_rcache_invalidate_handler()
[1643915436.023811] [vulcan02:15318:0]           async.c:582  UCX  TRACE waiting for 0x2769f30 [id=5 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643915436.023819] [vulcan02:15318:0]           async.c:171  UCX  DEBUG release async handler 0x2769f30 [id=5 ref 0] ucs_rcache_invalidate_handler()
[1643915436.023827] [vulcan02:15318:0]           mpool.c:154  UCX  DEBUG mpool rcache_mp destroyed
[1643915436.023853] [vulcan02:15318:0]        mm_xpmem.c:116  UCX  DEBUG xpmem version: 155653
[1643915436.023893] [vulcan02:15318:0]     ucp_context.c:908  UCX  TRACE allowed transport 0 : 'cuda'
[1643915436.023900] [vulcan02:15318:0]     ucp_context.c:683  UCX  TRACE enabling tl 'xpmem' for alias 'mm'
[1643915436.023903] [vulcan02:15318:0]     ucp_context.c:683  UCX  TRACE enabling tl 'xpmem' for alias 'sm'
[1643915436.023907] [vulcan02:15318:0]     ucp_context.c:683  UCX  TRACE enabling tl 'xpmem' for alias 'shm'
[1643915436.023914] [vulcan02:15318:0]     ucp_context.c:820  UCX  TRACE xpmem/memory is disabled
[1643915436.023917] [vulcan02:15318:0]     ucp_context.c:1306 UCX  DEBUG closing md xpmem because it has no selected transport resources
[1643915436.023946] [vulcan02:15318:0]     ucp_context.c:975  UCX  WARN  transport 'cuda' is not available, please use one or more of: cma, dc, dc_mlx5, dc_x, ib, knem, mm, posix, rc, rc_mlx5, rc_v, rc_verbs, rc_x, self, shm, sm, sysv, tcp, ud, ud_mlx5, ud_v, ud_verbs, ud_x, xpmem
[1643915436.023956] [vulcan02:15318:0]     ucp_context.c:1230 UCX  ERROR no usable transports/devices (asked cuda on all devices)

petro-rudenko avatar Feb 03 '22 19:02 petro-rudenko

Strange: ucx_info -d | grep cuda is also empty. The build is done like this: https://github.com/openucx/ucx/blob/master/buildlib/pr/go/go-test.yml#L35-L45

$ module show dev/cuda11.4
-------------------------------------------------------------------
/hpc/local/etc/modulefiles/dev/cuda11.4:

module-whatis    add CUDA to your environment
setenv           CUDA_HOME /hpc/local/oss/cuda11.4
prepend-path     PATH /hpc/local/oss/cuda11.4/bin
prepend-path     CPATH /hpc/local/oss/cuda11.4/include
prepend-path     FPATH /hpc/local/oss/cuda11.4/include
prepend-path     INCLUDE /hpc/local/oss/cuda11.4/include
prepend-path     LIBRARY_PATH /hpc/local/oss/cuda11.4/lib64:/hpc/local/oss/cuda11.4/lib64/stubs
prepend-path     LD_LIBRARY_PATH /hpc/local/oss/cuda11.4/lib64:/hpc/local/oss/cuda11.4/lib64/stubs
-------------------------------------------------------------------

petro-rudenko avatar Feb 03 '22 19:02 petro-rudenko

@Akshay-Venkatesh maybe the issue is with the out-of-source build. Try:

mkdir build
cd build
../contrib/configure-devel --enable-debug --enable-debug-data --with-java=no --with-go --prefix=$PWD --with-cuda
make install
bin/ucx_info -d | grep cuda

petro-rudenko avatar Feb 03 '22 19:02 petro-rudenko

It seems the cuda module is loaded correctly; maybe cuDeviceGetCount() returns 0?

yosefe avatar Feb 03 '22 19:02 yosefe

UCX_LOG_LEVEL=trace LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib:/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx bin/ucx_info -d:

[1643916642.815533] [vulcan02:17018:0]           stats.c:861  UCX  TRACE statistics disabled
[1643916642.815554] [vulcan02:17018:0]        memtrack.c:409  UCX  TRACE memtrack disabled
[1643916642.815570] [vulcan02:17018:0]           debug.c:1211 UCX  DEBUG using signal stack 0x7fa189830000 size 141824
[1643916642.816208] [vulcan02:17018:0]            init.c:116  UCX  DEBUG /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/libucs.so.0 loaded at 0x7fa188c43000
[1643916642.816229] [vulcan02:17018:0]            init.c:117  UCX  DEBUG cmd line: bin/ucx_info -d 
[1643916642.816239] [vulcan02:17018:0]          module.c:69   UCX  DEBUG ucs library path: /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/libucs.so.0
[1643916642.816246] [vulcan02:17018:0]          module.c:273  UCX  DEBUG loading modules for ucs
[1643916642.816353] [vulcan02:17018:0]          module.c:273  UCX  DEBUG loading modules for uct
[1643916642.816356] [vulcan02:17018:0]          module.c:239  UCX  TRACE loading module 'cuda' with mode 0x1
[1643916642.818109] [vulcan02:17018:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda.so.0.0.0 [0x2076be0]
[1643916642.818116] [vulcan02:17018:0]          module.c:189  UCX  TRACE calling 'ucs_module_global_init' in '/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda.so.0.0.0': [0x7fa1874db6f1]
[1643916642.818119] [vulcan02:17018:0]          module.c:273  UCX  DEBUG loading modules for uct_cuda
[1643916642.818121] [vulcan02:17018:0]          module.c:239  UCX  TRACE loading module 'gdrcopy' with mode 0x1
[1643916642.819847] [vulcan02:17018:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda_gdrcopy.so.0.0.0 [0x2078280]
[1643916642.819853] [vulcan02:17018:0]          module.c:162  UCX  DEBUG ignoring 'ucs_module_global_init' (0x7fa1874db6f1) from libuct_cuda.so.0 (0x7fa1874d5000), expected in libuct_cuda_gdrcopy.so.0 (7fa184aec000)
[1643916642.819856] [vulcan02:17018:0]          module.c:239  UCX  TRACE loading module 'ib' with mode 0x1
[1643916642.821606] [vulcan02:17018:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_ib.so.0.0.0 [0x2079030]
[1643916642.821613] [vulcan02:17018:0]          module.c:239  UCX  TRACE loading module 'rdmacm' with mode 0x1
[1643916642.822534] [vulcan02:17018:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_rdmacm.so.0.0.0 [0x207bdd0]
[1643916642.822541] [vulcan02:17018:0]          module.c:239  UCX  TRACE loading module 'cma' with mode 0x1
[1643916642.823883] [vulcan02:17018:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cma.so.0.0.0 [0x207cc00]
[1643916642.823893] [vulcan02:17018:0]          module.c:239  UCX  TRACE loading module 'knem' with mode 0x1
[1643916642.824701] [vulcan02:17018:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_knem.so.0.0.0 [0x207d2a0]
[1643916642.824706] [vulcan02:17018:0]          module.c:239  UCX  TRACE loading module 'xpmem' with mode 0x1
[1643916642.825667] [vulcan02:17018:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_xpmem.so.0.0.0 [0x207d9a0]
#
# Memory domain: posix
#     Component: posix
#             allocate: <= 132039668K
#           remote key: 24 bytes
#           rkey_ptr is supported
#
#      Transport: posix
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
[1643916642.825882] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating mm_recv_fifo: host memory length 8447 flags 0x3e0
[1643916642.825885] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method md
[1643916642.826020] [vulcan02:17018:0]             sys.c:653  UCX  TRACE   detected huge page size: 2097152
[1643916642.826028] [vulcan02:17018:0]        mm_posix.c:531  UCX  DEBUG   allocated posix shared memory at 0x7fa189868000 length 12288
[1643916642.826032] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 12288 bytes at 0x7fa189868000 using posix
[1643916642.826056] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8288
[1643916642.826063] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating mm_recv_desc: host memory length 4259952 flags 0x3e0
[1643916642.826065] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method md
[1643916642.827961] [vulcan02:17018:0]        mm_posix.c:326  UCX  DEBUG   shared memory mmap(addr=(nil), length=6291456, flags= HUGETLB, fd=5) failed: Invalid argument
[1643916642.827968] [vulcan02:17018:0]        mm_posix.c:531  UCX  DEBUG   allocated posix shared memory at 0x7fa182bb8000 length 4263936
[1643916642.827970] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 4263936 bytes at 0x7fa182bb8000 using posix
[1643916642.827978] [vulcan02:17018:0]           mpool.c:237  UCX  DEBUG mpool mm_recv_desc: allocated chunk 0x7fa182bb8018 of 4263912 bytes with 512 elements
[1643916642.828441] [vulcan02:17018:0]        mm_iface.c:674  UCX  DEBUG created mm iface 0x2082a00 FIFO id 0xc0000000c000427a va 0x7fa189868000 size 12288 (128 x 64 elems)
#
#      capabilities:
#            bandwidth: 0.00/ppn + 12179.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: ep_check
[1643916642.829082] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool mm_recv_desc destroyed
#
#
# Memory domain: sysv
#     Component: sysv
#             allocate: unlimited
#           remote key: 12 bytes
#           rkey_ptr is supported
#
#      Transport: sysv
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
[1643916642.829191] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating mm_recv_fifo: host memory length 8447 flags 0x3e0
[1643916642.829194] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method md
[1643916642.829199] [vulcan02:17018:0]         mm_sysv.c:94   UCX  DEBUG   mm failed to allocate 8447 bytes with hugetlb
[1643916642.829218] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 12288 bytes at 0x7fa189868000 using sysv
[1643916642.829234] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8288
[1643916642.829236] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating mm_recv_desc: host memory length 4259952 flags 0x3e0
[1643916642.829238] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method md
[1643916642.829255] [vulcan02:17018:0]         mm_sysv.c:94   UCX  DEBUG   mm failed to allocate 4259952 bytes with hugetlb
[1643916642.829264] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 4263936 bytes at 0x7fa182bb8000 using sysv
[1643916642.829274] [vulcan02:17018:0]           mpool.c:237  UCX  DEBUG mpool mm_recv_desc: allocated chunk 0x7fa182bb8018 of 4263912 bytes with 512 elements
[1643916642.830101] [vulcan02:17018:0]        mm_iface.c:674  UCX  DEBUG created mm iface 0x2083060 FIFO id 0x3c768000 va 0x7fa189868000 size 12288 (128 x 64 elems)
#
#      capabilities:
#            bandwidth: 0.00/ppn + 12179.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 8 bytes
#       error handling: ep_check
[1643916642.830434] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool mm_recv_desc destroyed
#
#
# Memory domain: self
#     Component: self
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#
#      Transport: self
#         Device: memory0
#           Type: loopback
#  System device: <unknown>
[1643916642.830525] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool self_msg_desc: align 64, maxelems 4294967295, elemsize 8200
[1643916642.830529] [vulcan02:17018:0]            self.c:222  UCX  DEBUG created self iface id 0xb191fcf4ddda9707 send_size 8192
#
#      capabilities:
#            bandwidth: 0.00/ppn + 6911.00 MB/sec
#              latency: 0 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 8K
#             am_bcopy: <= 8K
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 0 bytes
#        iface address: 8 bytes
#       error handling: ep_check
[1643916642.830548] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool self_msg_desc destroyed
#
#
# Memory domain: tcp
#     Component: tcp
#             register: unlimited, cost: 0 nsec
#           remote key: 0 bytes
#
[1643916642.832428] [vulcan02:17018:0]            time.c:22   UCX  DEBUG measured arch clock speed: 2200000000.00 Hz
#      Transport: tcp
#         Device: enp4s0f0
#           Type: network
#  System device: <unknown>
[1643916642.832446] [vulcan02:17018:0]       tcp_iface.c:587  UCX  DEBUG using TCP port range: 0-0
[1643916642.832450] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool uct_tcp_iface_tx_buf_mp: align 64, maxelems 4294967295, elemsize 8205
[1643916642.832452] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool uct_tcp_iface_rx_buf_mp: align 64, maxelems 4294967295, elemsize 131090
[1643916642.834165] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() to hash
[1643916642.834245] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 4 events 0x5 mode thread_spinlock
[1643916642.834256] [vulcan02:17018:0]       tcp_iface.c:537  UCX  DEBUG tcp_iface 0x20830e0: listening for connections (fd=4) on 10.210.0.167:33267
#
#      capabilities:
#            bandwidth: 113.16/ppn + 0.00 MB/sec
#              latency: 5776 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 0
#     device num paths: 1
#              max eps: 256
#       device address: 6 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
[1643916642.834381] [vulcan02:17018:0]       tcp_iface.c:823  UCX  DEBUG tcp_iface 0x20830e0: destroying
[1643916642.834390] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() from hash
[1643916642.834392] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler()
[1643916642.834475] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() completion (called=0)
[1643916642.834478] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x207ef00 [id=4 ref 0] uct_tcp_iface_connect_handler()
[1643916642.834481] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool uct_tcp_iface_rx_buf_mp destroyed
[1643916642.834483] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool uct_tcp_iface_tx_buf_mp destroyed
#
#      Transport: tcp
#         Device: lo
#           Type: network
#  System device: <unknown>
[1643916642.834532] [vulcan02:17018:0]       tcp_iface.c:587  UCX  DEBUG using TCP port range: 0-0
[1643916642.834535] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool uct_tcp_iface_tx_buf_mp: align 64, maxelems 4294967295, elemsize 8205
[1643916642.834537] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool uct_tcp_iface_rx_buf_mp: align 64, maxelems 4294967295, elemsize 131090
[1643916642.834747] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() to hash
[1643916642.834798] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 4 events 0x5 mode thread_spinlock
[1643916642.834803] [vulcan02:17018:0]       tcp_iface.c:537  UCX  DEBUG tcp_iface 0x20830e0: listening for connections (fd=4) on 127.0.0.1:60704
#
#      capabilities:
[1643916642.834822] [vulcan02:17018:0]            sock.c:90   UCX  DEBUG ioctl(req=35142, ifr_name=lo) failed: Operation not supported
[1643916642.834829] [vulcan02:17018:0]         tcp_net.c:61   UCX  DEBUG speed of lo is UNKNOWN, assuming 100 Mbps
#            bandwidth: 11.91/ppn + 0.00 MB/sec
#              latency: 10960 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 1
#     device num paths: 1
#              max eps: 256
#       device address: 18 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
[1643916642.834903] [vulcan02:17018:0]       tcp_iface.c:823  UCX  DEBUG tcp_iface 0x20830e0: destroying
[1643916642.834906] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() from hash
[1643916642.834909] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler()
[1643916642.834960] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() completion (called=0)
[1643916642.834963] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x207ef00 [id=4 ref 0] uct_tcp_iface_connect_handler()
[1643916642.834966] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool uct_tcp_iface_rx_buf_mp destroyed
[1643916642.834967] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool uct_tcp_iface_tx_buf_mp destroyed
#
#      Transport: tcp
#         Device: ib0
#           Type: network
#  System device: <unknown>
[1643916642.835011] [vulcan02:17018:0]       tcp_iface.c:587  UCX  DEBUG using TCP port range: 0-0
[1643916642.835014] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool uct_tcp_iface_tx_buf_mp: align 64, maxelems 4294967295, elemsize 8205
[1643916642.835016] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool uct_tcp_iface_rx_buf_mp: align 64, maxelems 4294967295, elemsize 131090
[1643916642.835223] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() to hash
[1643916642.835270] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 4 events 0x5 mode thread_spinlock
[1643916642.835274] [vulcan02:17018:0]       tcp_iface.c:537  UCX  DEBUG tcp_iface 0x20830e0: listening for connections (fd=4) on 1.1.10.2:56988
#
#      capabilities:
#            bandwidth: 11142.51/ppn + 0.00 MB/sec
#              latency: 5206 nsec
#             overhead: 50000 nsec
#            put_zcopy: <= 18446744073709551590, up to 6 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 0
#             am_short: <= 8K
#             am_bcopy: <= 8K
#             am_zcopy: <= 64K, up to 6 iov
#   am_opt_zcopy_align: <= 1
#         am_align_mtu: <= 0
#            am header: <= 8037
#           connection: to ep, to iface
#      device priority: 1
#     device num paths: 1
#              max eps: 256
#       device address: 6 bytes
#        iface address: 2 bytes
#           ep address: 10 bytes
#       error handling: peer failure, ep_check, keepalive
[1643916642.835718] [vulcan02:17018:0]       tcp_iface.c:823  UCX  DEBUG tcp_iface 0x20830e0: destroying
[1643916642.835734] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() from hash
[1643916642.835741] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler()
[1643916642.835806] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() completion (called=0)
[1643916642.835809] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x207ef00 [id=4 ref 0] uct_tcp_iface_connect_handler()
[1643916642.835814] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool uct_tcp_iface_rx_buf_mp destroyed
[1643916642.835816] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool uct_tcp_iface_tx_buf_mp destroyed
#
[1643916642.835918] [vulcan02:17018:0]      tcp_sockcm.c:221  UCX  DEBUG created tcp_sockcm 0x2082160
#
# Connection manager: tcp
#      max_conn_priv: 2064 bytes
[1643916642.835999] [vulcan02:17018:0]          module.c:273  UCX  DEBUG loading modules for uct_ib
[1643916642.836370] [vulcan02:17018:0]           ib_md.c:1570 UCX  TRACE opening IB device mlx5_0
[1643916642.839854] [vulcan02:17018:0]       ib_device.c:554  UCX  DEBUG PF: mlx5_0 vendor_id: 0x15b3 device_id: 4123
[1643916642.840074] [vulcan02:17018:0]    ib_mlx5dv_md.c:491  UCX  DEBUG mlx5_0: disable ODP because it's not supported for DevX QP
[1643916642.842874] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x2075b10 [id=4 ref 1] uct_ib_async_event_handler() to hash
[1643916642.842935] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 4 events 0x1 mode thread_spinlock
[1643916642.842949] [vulcan02:17018:0]       ib_device.c:668  UCX  DEBUG initialized device 'mlx5_0' (InfiniBand channel adapter) with 1 ports
[1643916642.843128] [vulcan02:17018:0]           ib_md.c:1675 UCX  DEBUG mlx5_0: cuda GPUDirect RDMA is enabled
[1643916642.843136] [vulcan02:17018:0]           ib_md.c:1675 UCX  DEBUG mlx5_0: rocm GPUDirect RDMA is disabled
[1643916642.843162] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643916642.844683] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x2082090 [id=8 ref 1] ucs_rcache_invalidate_handler() to hash
[1643916642.844697] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 8 events 0x1 mode thread_spinlock
[1643916642.844811] [vulcan02:17018:0]          module.c:273  UCX  DEBUG loading modules for ucm
[1643916642.844831] [vulcan02:17018:0]          module.c:239  UCX  TRACE loading module 'cuda' with mode 0x1001
[1643916642.845699] [vulcan02:17018:0]          module.c:180  UCX  TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libucm_cuda.so.0.0.0 [0x20c46f0]
[1643916642.845855] [vulcan02:17018:0]           ib_md.c:1332 UCX  DEBUG mlx5_0: using registration cache
[1643916642.845891] [vulcan02:17018:0]           ib_md.c:1494 UCX  DEBUG failed to read file: /sys/class/infiniband/mlx5_0/device/current_link_width
[1643916642.845900] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40
[1643916642.846149] [vulcan02:17018:0]           ib_md.c:1623 UCX  DEBUG mlx5_0: md open by 'uct_ib_mlx5_devx_md_ops' is successful
[1643916642.847289] [vulcan02:17018:0]            topo.c:141  UCX  DEBUG added sys_dev 0 for bus id 02:00.0
[1643916642.847296] [vulcan02:17018:0]       ib_device.c:1140 UCX  DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
[1643916642.847335] [vulcan02:17018:0]       ib_device.c:1140 UCX  DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
[1643916642.847366] [vulcan02:17018:0]       ib_device.c:1140 UCX  DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
[1643916642.847410] [vulcan02:17018:0]       ib_device.c:1140 UCX  DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
[1643916642.847437] [vulcan02:17018:0]       ib_device.c:1140 UCX  DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
#
# Memory domain: mlx5_0
#     Component: ib
#             register: unlimited, cost: 180 nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#           memory invalidation is supported
#
#      Transport: rc_verbs
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
[1643916642.847684] [vulcan02:17018:0]        ib_iface.c:866  UCX  DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.848388] [vulcan02:17018:0]        ib_iface.c:1473 UCX  DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 12 hdr_ofs 11 data_sz 8256
[1643916642.848422] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8276
[1643916642.848425] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8336
[1643916642.848471] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 64
[1643916642.848848] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64
[1643916642.848854] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 208
[1643916642.849209] [vulcan02:17018:0]        ib_iface.c:1008 UCX  DEBUG iface=0x20c8f10: created RC QP 0x1a611 on mlx5_0:1 TX wr:409 sge:5 inl:124 resp:64 RX wr:0 sge:0 resp:64
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 75 nsec
#            put_short: <= 124
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 5 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 5 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 123
#             am_bcopy: <= 8255
#             am_zcopy: <= 8255, up to 4 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 127
#               domain: device
#           atomic_add: 64 bit
#          atomic_fadd: 64 bit
#         atomic_cswap: 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 3 bytes
#           ep address: 5 bytes
#       error handling: peer failure, ep_check
[1643916642.849762] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_verbs_short_desc destroyed
[1643916642.850174] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool send-ops-mpool destroyed
[1643916642.850177] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_send_desc destroyed
[1643916642.850179] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_recv_desc destroyed
[1643916642.850180] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool pending-ops destroyed
#
#
#      Transport: rc_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
[1643916642.850767] [vulcan02:17018:0]        ib_iface.c:866  UCX  DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.850800] [vulcan02:17018:0]       ib_device.c:1409 UCX  DEBUG max IB CQE size is 128
[1643916642.851939] [vulcan02:17018:0]        ib_iface.c:1473 UCX  DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 12 hdr_ofs 10 data_sz 8256
[1643916642.851948] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8276
[1643916642.851951] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8336
[1643916642.852001] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 64
[1643916642.852363] [vulcan02:17018:0]           mpool.c:237  UCX  DEBUG mpool devx dbrec: allocated chunk 0x21ae010 of 8176 bytes with 127 elements
[1643916642.852570] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64
[1643916642.852698] [vulcan02:17018:0]         ib_mlx5.c:889  UCX  DEBUG SL=0 (AR support - no) was selected on mlx5_0:1, SLs with AR support = { <none> }, SLs without AR support = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
[1643916642.853296] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool mlx5_dm_desc: align 64, maxelems 1, elemsize 80
[1643916642.853304] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 88
[1643916642.856148] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x2081c00 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash
[1643916642.856162] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 11 events 0x1 mode thread_spinlock
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 600 + 1.000 * N nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 14 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 14 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 186
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to ep
#      device priority: 50
#     device num paths: 1
#              max eps: 256
#       device address: 3 bytes
#           ep address: 7 bytes
#       error handling: buffer (zcopy), remote access, peer failure, ep_check
[1643916642.856216] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x2081c00 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() from hash
[1643916642.856219] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x2081c00 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler()
[1643916642.856224] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x2081c00 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() completion (called=0)
[1643916642.856226] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x2081c00 [id=11 ref 0] uct_rc_mlx5_devx_iface_event_handler()
[1643916642.856232] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_mlx5_atomic_desc destroyed
[1643916642.856235] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool mlx5_dm_desc destroyed
[1643916642.856885] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool send-ops-mpool destroyed
[1643916642.856890] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_send_desc destroyed
[1643916642.856892] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_recv_desc destroyed
[1643916642.856894] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool pending-ops destroyed
#
#
#      Transport: dc_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
[1643916642.857682] [vulcan02:17018:0]        ib_iface.c:866  UCX  DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.858804] [vulcan02:17018:0]        ib_iface.c:1473 UCX  DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 12 hdr_ofs 10 data_sz 8256
[1643916642.858829] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8276
[1643916642.858832] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8336
[1643916642.858887] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 64
[1643916642.859376] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112
[1643916642.859477] [vulcan02:17018:0]         ib_mlx5.c:889  UCX  DEBUG SL=0 (AR support - no) was selected on mlx5_0:1, SLs with AR support = { <none> }, SLs without AR support = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
[1643916642.860009] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool mlx5_dm_desc: align 64, maxelems 1, elemsize 80
[1643916642.860016] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 88
[1643916642.860036] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x20d1f10 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash
[1643916642.860054] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 11 events 0x1 mode thread_spinlock
[1643916642.860367] [vulcan02:17018:0]         dc_mlx5.c:836  UCX  DEBUG creating dci pool 0 with 8 QPs
[1643916642.864991] [vulcan02:17018:0]         dc_mlx5.c:1386 UCX  DEBUG dc iface 0x218c640: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x1b74a
[1643916642.865015] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating rc_recv_desc: host memory length 37481712 flags 0x3e0
[1643916642.865018] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method huge
[1643916642.865039] [vulcan02:17018:0]         uct_mem.c:283  UCX  TRACE   failed to allocate 37481712 bytes from hugetlb: Out of memory
[1643916642.865041] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method thp
[1643916642.865083] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 37748736 bytes at 0x7fa180000000 using thp
[1643916642.865108] [vulcan02:17018:0]           mpool.c:237  UCX  DEBUG mpool rcache_mp: allocated chunk 0x7fa18980b008 of 151544 bytes with 1052 elements
[1643916642.877550] [vulcan02:17018:0]           ib_md.c:545  UCX  TRACE ibv_reg_mr(pd=0x20822b0 addr=0x7fa180000000 length=37748736): mr=0x20c9cc0 took 12.351 msec
[1643916642.877571] [vulcan02:17018:0]           ib_md.c:788  UCX  TRACE registered memory 0x7fa180000000..0x7fa182400000 on mlx5_0 lkey 0xf31f4 rkey 0xf31f4 access 0xf flags 0x3e4
[1643916642.877630] [vulcan02:17018:0]          rcache.c:955  UCX  TRACE mlx5_0: created region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] gt rw ref 2 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.877636] [vulcan02:17018:0]           mpool.c:237  UCX  DEBUG mpool rc_recv_desc: allocated chunk 0x7fa180000018 of 37748712 bytes with 4537 elements
[1643916642.878184] [vulcan02:17018:0]         dc_mlx5.c:1402 UCX  DEBUG created dc iface 0x218c640
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 660 nsec
#             overhead: 40 nsec
#            put_short: <= 2K
#            put_bcopy: <= 8256
#            put_zcopy: <= 1G, up to 11 iov
#  put_opt_zcopy_align: <= 512
#        put_align_mtu: <= 4K
#            get_bcopy: <= 8256
#            get_zcopy: 65..1G, up to 11 iov
#  get_opt_zcopy_align: <= 512
#        get_align_mtu: <= 4K
#             am_short: <= 2046
#             am_bcopy: <= 8254
#             am_zcopy: <= 8254, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 138
#               domain: device
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 5 bytes
#       error handling: buffer (zcopy), remote access, peer failure, ep_check
[1643916642.882469] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x20d1f10 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() from hash
[1643916642.882477] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x20d1f10 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler()
[1643916642.882489] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x20d1f10 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() completion (called=0)
[1643916642.882494] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x20d1f10 [id=11 ref 0] uct_rc_mlx5_devx_iface_event_handler()
[1643916642.882509] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_mlx5_atomic_desc destroyed
[1643916642.882522] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool mlx5_dm_desc destroyed
[1643916642.883304] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool send-ops-mpool destroyed
[1643916642.883310] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_send_desc destroyed
[1643916642.883344] [vulcan02:17018:0]          rcache.c:337  UCX  TRACE mlx5_0: lru add region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] gt rw ref 2 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883353] [vulcan02:17018:0]          rcache.c:423  UCX  TRACE mlx5_0: put region, flags 0x1 region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] gt rw ref 2 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883365] [vulcan02:17018:0]          rcache.c:462  UCX  TRACE mlx5_0: invalidate region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] gt rw ref 1 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883379] [vulcan02:17018:0]          rcache.c:423  UCX  TRACE mlx5_0: put region, flags 0xa region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] g- rw ref 1 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883386] [vulcan02:17018:0]          rcache.c:436  UCX  TRACE mlx5_0: put on GC list, flags 0xa region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] g- rw ref 0 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883432] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rc_recv_desc destroyed
[1643916642.883437] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool pending-ops destroyed
#
#
#      Transport: ud_verbs
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
[1643916642.884151] [vulcan02:17018:0]        ib_iface.c:866  UCX  DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.884754] [vulcan02:17018:0]        ib_iface.c:1473 UCX  DEBUG created uct_ib_iface_t headroom_ofs 88 payload_ofs 88 hdr_ofs 40 data_sz 4096
[1643916642.885182] [vulcan02:17018:0]        ib_iface.c:1008 UCX  DEBUG iface=0x2080fc0: created UD QP 0x1b753 on mlx5_0:1 TX wr:341 sge:6 inl:124 resp:0 RX wr:4096 sge:1 resp:0
[1643916642.885564] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4192
[1643916642.885570] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating ud_recv_skb: host memory length 540784 flags 0x3e0
[1643916642.885572] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method huge
[1643916642.885576] [vulcan02:17018:0]         uct_mem.c:283  UCX  TRACE   failed to allocate 540784 bytes from hugetlb: User-defined limit was reached
[1643916642.885578] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method thp
[1643916642.885598] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method md
[1643916642.885605] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method mmap
[1643916642.885618] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 544768 bytes at 0x7fa189786000 using mmap
[1643916642.885627] [vulcan02:17018:0]          rcache.c:379  UCX  TRACE mlx5_0: destroy region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] g- rw ref 0 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.885631] [vulcan02:17018:0]           ib_md.c:558  UCX  TRACE ibv_dereg_mr(mr=0x20c9cc0 addr=0x7fa180000000 length=37748736)
[1643916642.890270] [vulcan02:17018:0]          rcache.c:350  UCX  TRACE mlx5_0: lru remove region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] g- rw ref 0 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.890595] [vulcan02:17018:0]           ib_md.c:545  UCX  TRACE ibv_reg_mr(pd=0x20822b0 addr=0x7fa189786000 length=544768): mr=0x20c9cc0 took 0.312 msec
[1643916642.890598] [vulcan02:17018:0]           ib_md.c:788  UCX  TRACE registered memory 0x7fa189786000..0x7fa18980b000 on mlx5_0 lkey 0x265907 rkey 0x265907 access 0xf flags 0x3e4
[1643916642.890602] [vulcan02:17018:0]          rcache.c:955  UCX  TRACE mlx5_0: created region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890604] [vulcan02:17018:0]           mpool.c:237  UCX  DEBUG mpool ud_recv_skb: allocated chunk 0x7fa189786018 of 544744 bytes with 128 elements
[1643916642.890613] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168
[1643916642.890656] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80::9803:9b03:67:a59c to hash on device mlx5_0 port 1 index 0)
[1643916642.890677] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 1)
[1643916642.890690] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 2)
[1643916642.890701] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 3)
[1643916642.890713] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 4)
[1643916642.890725] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 5)
[1643916642.890737] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 6)
[1643916642.890749] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 7)
[1643916642.890920] [vulcan02:17018:0]     timer_wheel.c:41   UCX  DEBUG high res timer created log=23 resolution=3813.003636 usec wanted: 2500.000000 usec
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 105 nsec
#             am_short: <= 116
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 5 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 3952
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure, ep_check
[1643916642.890959] [vulcan02:17018:0]        ud_iface.c:638  UCX  DEBUG iface(0x2080fc0): cep cleanup
[1643916642.890964] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool ud_tx_skb destroyed
[1643916642.890968] [vulcan02:17018:0]          rcache.c:337  UCX  TRACE mlx5_0: lru add region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890971] [vulcan02:17018:0]          rcache.c:423  UCX  TRACE mlx5_0: put region, flags 0x1 region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890979] [vulcan02:17018:0]          rcache.c:462  UCX  TRACE mlx5_0: invalidate region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 1 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890985] [vulcan02:17018:0]          rcache.c:423  UCX  TRACE mlx5_0: put region, flags 0xa region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 1 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890988] [vulcan02:17018:0]          rcache.c:436  UCX  TRACE mlx5_0: put on GC list, flags 0xa region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.891015] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool ud_recv_skb destroyed
[1643916642.891480] [vulcan02:17018:0]        ud_iface.c:645  UCX  DEBUG iface(0x2080fc0): ptr_array cleanup
#
#
#      Transport: ud_mlx5
#         Device: mlx5_0:1
#           Type: network
#  System device: mlx5_0 (0)
[1643916642.891836] [vulcan02:17018:0]        ib_iface.c:866  UCX  DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.892390] [vulcan02:17018:0]        ib_iface.c:1473 UCX  DEBUG created uct_ib_iface_t headroom_ofs 88 payload_ofs 88 hdr_ofs 40 data_sz 4096
[1643916642.892747] [vulcan02:17018:0]        ib_iface.c:1008 UCX  DEBUG iface=0x2080fc0: created UD QP 0x1b754 on mlx5_0:1 TX wr:341 sge:6 inl:124 resp:0 RX wr:4096 sge:1 resp:0
[1643916642.892758] [vulcan02:17018:0]         ib_mlx5.c:568  UCX  DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post
[1643916642.893100] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4192
[1643916642.893103] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating ud_recv_skb: host memory length 540784 flags 0x3e0
[1643916642.893105] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method huge
[1643916642.893108] [vulcan02:17018:0]         uct_mem.c:283  UCX  TRACE   failed to allocate 540784 bytes from hugetlb: User-defined limit was reached
[1643916642.893109] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method thp
[1643916642.893123] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method md
[1643916642.893129] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method mmap
[1643916642.893136] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 544768 bytes at 0x7fa189786000 using mmap
[1643916642.893144] [vulcan02:17018:0]          rcache.c:379  UCX  TRACE mlx5_0: destroy region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.893146] [vulcan02:17018:0]           ib_md.c:558  UCX  TRACE ibv_dereg_mr(mr=0x20c9cc0 addr=0x7fa189786000 length=544768)
[1643916642.893216] [vulcan02:17018:0]          rcache.c:350  UCX  TRACE mlx5_0: lru remove region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.893372] [vulcan02:17018:0]           ib_md.c:545  UCX  TRACE ibv_reg_mr(pd=0x20822b0 addr=0x7fa189786000 length=544768): mr=0x20c9cc0 took 0.145 msec
[1643916642.893375] [vulcan02:17018:0]           ib_md.c:788  UCX  TRACE registered memory 0x7fa189786000..0x7fa18980b000 on mlx5_0 lkey 0x1ecb98 rkey 0x1ecb98 access 0xf flags 0x3e4
[1643916642.893378] [vulcan02:17018:0]          rcache.c:955  UCX  TRACE mlx5_0: created region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893380] [vulcan02:17018:0]           mpool.c:237  UCX  DEBUG mpool ud_recv_skb: allocated chunk 0x7fa189786018 of 544744 bytes with 128 elements
[1643916642.893389] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168
[1643916642.893413] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80::9803:9b03:67:a59c to hash on device mlx5_0 port 1 index 0)
[1643916642.893426] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 1)
[1643916642.893437] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 2)
[1643916642.893448] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 3)
[1643916642.893459] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 4)
[1643916642.893470] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 5)
[1643916642.893480] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 6)
[1643916642.893491] [vulcan02:17018:0]        ud_iface.c:421  UCX  DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 7)
[1643916642.893632] [vulcan02:17018:0]         ib_mlx5.c:889  UCX  DEBUG SL=0 (AR support - no) was selected on mlx5_0:1, SLs with AR support = { <none> }, SLs without AR support = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
[1643916642.893734] [vulcan02:17018:0]     timer_wheel.c:41   UCX  DEBUG high res timer created log=23 resolution=3813.003636 usec wanted: 2500.000000 usec
#
#      capabilities:
#            bandwidth: 11794.23/ppn + 0.00 MB/sec
#              latency: 630 nsec
#             overhead: 80 nsec
#             am_short: <= 180
#             am_bcopy: <= 4088
#             am_zcopy: <= 4088, up to 3 iov
#   am_opt_zcopy_align: <= 512
#         am_align_mtu: <= 4K
#            am header: <= 132
#           connection: to ep, to iface
#      device priority: 50
#     device num paths: 1
#              max eps: inf
#       device address: 3 bytes
#        iface address: 3 bytes
#           ep address: 6 bytes
#       error handling: peer failure, ep_check
[1643916642.893764] [vulcan02:17018:0]        ud_iface.c:638  UCX  DEBUG iface(0x2080fc0): cep cleanup
[1643916642.893766] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool ud_tx_skb destroyed
[1643916642.893770] [vulcan02:17018:0]          rcache.c:337  UCX  TRACE mlx5_0: lru add region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893772] [vulcan02:17018:0]          rcache.c:423  UCX  TRACE mlx5_0: put region, flags 0x1 region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893791] [vulcan02:17018:0]          rcache.c:462  UCX  TRACE mlx5_0: invalidate region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 1 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893813] [vulcan02:17018:0]          rcache.c:423  UCX  TRACE mlx5_0: put region, flags 0xa region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 1 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893815] [vulcan02:17018:0]          rcache.c:436  UCX  TRACE mlx5_0: put on GC list, flags 0xa region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893840] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool ud_recv_skb destroyed
[1643916642.894331] [vulcan02:17018:0]        ud_iface.c:645  UCX  DEBUG iface(0x2080fc0): ptr_array cleanup
#
[1643916642.894717] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool devx dbrec destroyed
[1643916642.894734] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x2082090 [id=8 ref 1] ucs_rcache_invalidate_handler() from hash
[1643916642.894737] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x2082090 [id=8 ref 1] ucs_rcache_invalidate_handler()
[1643916642.894742] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x2082090 [id=8 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643916642.894744] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x2082090 [id=8 ref 0] ucs_rcache_invalidate_handler()
[1643916642.894753] [vulcan02:17018:0]          rcache.c:379  UCX  TRACE mlx5_0: destroy region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.894756] [vulcan02:17018:0]           ib_md.c:558  UCX  TRACE ibv_dereg_mr(mr=0x20c9cc0 addr=0x7fa189786000 length=544768)
[1643916642.894873] [vulcan02:17018:0]          rcache.c:350  UCX  TRACE mlx5_0: lru remove region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.894979] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rcache_mp destroyed
[1643916642.895086] [vulcan02:17018:0]       ib_device.c:686  UCX  DEBUG destroying ib device mlx5_0
[1643916642.895090] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x2075b10 [id=4 ref 1] uct_ib_async_event_handler() from hash
[1643916642.895092] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x2075b10 [id=4 ref 1] uct_ib_async_event_handler()
[1643916642.895200] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x2075b10 [id=4 ref 1] uct_ib_async_event_handler() completion (called=0)
[1643916642.895202] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x2075b10 [id=4 ref 0] uct_ib_async_event_handler()
[1643916642.895631] [vulcan02:17018:0]           ib_md.c:1570 UCX  TRACE opening IB device mlx5_1
[1643916642.899069] [vulcan02:17018:0]       ib_device.c:554  UCX  DEBUG PF: mlx5_1 vendor_id: 0x15b3 device_id: 4123
[1643916642.899274] [vulcan02:17018:0]    ib_mlx5dv_md.c:491  UCX  DEBUG mlx5_1: disable ODP because it's not supported for DevX QP
[1643916642.899431] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x2076680 [id=4 ref 1] uct_ib_async_event_handler() to hash
[1643916642.899497] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 4 events 0x1 mode thread_spinlock
[1643916642.899501] [vulcan02:17018:0]       ib_device.c:668  UCX  DEBUG initialized device 'mlx5_1' (InfiniBand channel adapter) with 1 ports
[1643916642.899599] [vulcan02:17018:0]           ib_md.c:1675 UCX  DEBUG mlx5_1: cuda GPUDirect RDMA is enabled
[1643916642.899604] [vulcan02:17018:0]           ib_md.c:1675 UCX  DEBUG mlx5_1: rocm GPUDirect RDMA is disabled
[1643916642.899610] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643916642.899619] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x20c45e0 [id=8 ref 1] ucs_rcache_invalidate_handler() to hash
[1643916642.899628] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 8 events 0x1 mode thread_spinlock
[1643916642.899706] [vulcan02:17018:0]           ib_md.c:1332 UCX  DEBUG mlx5_1: using registration cache
[1643916642.899722] [vulcan02:17018:0]           ib_md.c:1494 UCX  DEBUG failed to read file: /sys/class/infiniband/mlx5_1/device/current_link_width
[1643916642.899725] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40
[1643916642.899877] [vulcan02:17018:0]           ib_md.c:1623 UCX  DEBUG mlx5_1: md open by 'uct_ib_mlx5_devx_md_ops' is successful
[1643916642.900695] [vulcan02:17018:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900700] [vulcan02:17018:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0x0: Destination is unreachable
[1643916642.900702] [vulcan02:17018:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0x0
[1643916642.900705] [vulcan02:17018:0]          uct_md.c:113  UCX  DEBUG failed to query rc_verbs resources: No such device
[1643916642.900707] [vulcan02:17018:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900709] [vulcan02:17018:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0x4: Destination is unreachable
[1643916642.900711] [vulcan02:17018:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0x4
[1643916642.900713] [vulcan02:17018:0]          uct_md.c:113  UCX  DEBUG failed to query rc_mlx5 resources: No such device
[1643916642.900714] [vulcan02:17018:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900716] [vulcan02:17018:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0xc4: Destination is unreachable
[1643916642.900718] [vulcan02:17018:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0xc4
[1643916642.900719] [vulcan02:17018:0]          uct_md.c:113  UCX  DEBUG failed to query dc_mlx5 resources: No such device
[1643916642.900721] [vulcan02:17018:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900723] [vulcan02:17018:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0x0: Destination is unreachable
[1643916642.900724] [vulcan02:17018:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0x0
[1643916642.900726] [vulcan02:17018:0]          uct_md.c:113  UCX  DEBUG failed to query ud_verbs resources: No such device
[1643916642.900728] [vulcan02:17018:0]       ib_device.c:768  UCX  TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900729] [vulcan02:17018:0]       ib_device.c:1171 UCX  TRACE mlx5_1:1 does not support flags 0x4: Destination is unreachable
[1643916642.900731] [vulcan02:17018:0]       ib_device.c:1185 UCX  DEBUG no compatible IB ports found for flags 0x4
[1643916642.900733] [vulcan02:17018:0]          uct_md.c:113  UCX  DEBUG failed to query ud_mlx5 resources: No such device
#
# Memory domain: mlx5_1
#     Component: ib
#             register: unlimited, cost: 180 nsec
#           remote key: 8 bytes
#           local memory handle is required for zcopy
#           memory invalidation is supported
#   < no supported devices found >
[1643916642.900827] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool devx dbrec destroyed
[1643916642.900836] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x20c45e0 [id=8 ref 1] ucs_rcache_invalidate_handler() from hash
[1643916642.900838] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x20c45e0 [id=8 ref 1] ucs_rcache_invalidate_handler()
[1643916642.900842] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x20c45e0 [id=8 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643916642.900844] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x20c45e0 [id=8 ref 0] ucs_rcache_invalidate_handler()
[1643916642.900863] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rcache_mp destroyed
[1643916642.900922] [vulcan02:17018:0]       ib_device.c:686  UCX  DEBUG destroying ib device mlx5_1
[1643916642.900925] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x2076680 [id=4 ref 1] uct_ib_async_event_handler() from hash
[1643916642.900927] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x2076680 [id=4 ref 1] uct_ib_async_event_handler()
[1643916642.901032] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x2076680 [id=4 ref 1] uct_ib_async_event_handler() completion (called=0)
[1643916642.901035] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x2076680 [id=4 ref 0] uct_ib_async_event_handler()
[1643916642.902601] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x2076680 [id=3 ref 1] uct_rdmacm_cm_event_handler() to hash
[1643916642.902681] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 3 events 0x1 mode thread_spinlock
[1643916642.902691] [vulcan02:17018:0]       rdmacm_cm.c:959  UCX  DEBUG created rdmacm_cm 0x20822b0 with event_channel 0x207f930 (fd=3)
#
# Connection manager: rdmacm
#      max_conn_priv: 54 bytes
[1643916642.902703] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x2076680 [id=3 ref 1] uct_rdmacm_cm_event_handler() from hash
[1643916642.902705] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x2076680 [id=3 ref 1] uct_rdmacm_cm_event_handler()
[1643916642.902749] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x2076680 [id=3 ref 1] uct_rdmacm_cm_event_handler() completion (called=0)
[1643916642.902752] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x2076680 [id=3 ref 0] uct_rdmacm_cm_event_handler()
[1643916642.902754] [vulcan02:17018:0]       rdmacm_cm.c:983  UCX  TRACE destroying event_channel 0x207f930 on cm 0x20822b0
[1643916642.902814] [vulcan02:17018:0]          cma_md.c:115  UCX  TRACE ptrace_scope is 0, CMA is supported
[1643916642.902823] [vulcan02:17018:0]          cma_md.c:115  UCX  TRACE ptrace_scope is 0, CMA is supported
#
# Memory domain: cma
#     Component: cma
#             register: unlimited, cost: 9 nsec
#
#      Transport: cma
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
[1643916642.902895] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool uct_scopy_iface_tx_mp: align 64, maxelems 4294967295, elemsize 736
#
#      capabilities:
#            bandwidth: 0.00/ppn + 11145.00 MB/sec
#              latency: 80 nsec
#             overhead: 2000 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 4 bytes
#       error handling: peer failure, ep_check
[1643916642.902917] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool uct_scopy_iface_tx_mp destroyed
#
[1643916642.903009] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643916642.903018] [vulcan02:17018:0]           async.c:231  UCX  DEBUG added async handler 0x207eb50 [id=4 ref 1] ucs_rcache_invalidate_handler() to hash
[1643916642.903060] [vulcan02:17018:0]           async.c:509  UCX  DEBUG listening to async event fd 4 events 0x1 mode thread_spinlock
#
# Memory domain: knem
#     Component: knem
#             register: unlimited, cost: 180 nsec
#           remote key: 16 bytes
#
#      Transport: knem
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
[1643916642.903181] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool uct_scopy_iface_tx_mp: align 64, maxelems 4294967295, elemsize 736
#
#      capabilities:
#            bandwidth: 13862.00/ppn + 0.00 MB/sec
#              latency: 80 nsec
#             overhead: 2000 nsec
#            put_zcopy: unlimited, up to 16 iov
#  put_opt_zcopy_align: <= 1
#        put_align_mtu: <= 1
#            get_zcopy: unlimited, up to 16 iov
#  get_opt_zcopy_align: <= 1
#        get_align_mtu: <= 1
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 0 bytes
#       error handling: none
[1643916642.903212] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool uct_scopy_iface_tx_mp destroyed
#
[1643916642.903226] [vulcan02:17018:0]           async.c:156  UCX  DEBUG removed async handler 0x207eb50 [id=4 ref 1] ucs_rcache_invalidate_handler() from hash
[1643916642.903228] [vulcan02:17018:0]           async.c:562  UCX  DEBUG removing async handler 0x207eb50 [id=4 ref 1] ucs_rcache_invalidate_handler()
[1643916642.903284] [vulcan02:17018:0]           async.c:582  UCX  TRACE waiting for 0x207eb50 [id=4 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643916642.903286] [vulcan02:17018:0]           async.c:171  UCX  DEBUG release async handler 0x207eb50 [id=4 ref 0] ucs_rcache_invalidate_handler()
[1643916642.903291] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool rcache_mp destroyed
[1643916642.903349] [vulcan02:17018:0]        mm_xpmem.c:116  UCX  DEBUG xpmem version: 155653
[1643916642.903352] [vulcan02:17018:0]        mm_xpmem.c:116  UCX  DEBUG xpmem version: 155653
#
# Memory domain: xpmem
#     Component: xpmem
#             register: unlimited, cost: 60 nsec
#           remote key: 24 bytes
#           rkey_ptr is supported
#
#      Transport: xpmem
#         Device: memory
#           Type: intra-node
#  System device: <unknown>
[1643916642.903418] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating mm_recv_fifo: host memory length 8447 flags 0x3e0
[1643916642.903420] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method md
[1643916642.903422] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method mmap
[1643916642.903428] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 12288 bytes at 0x7fa189868000 using mmap
[1643916642.903448] [vulcan02:17018:0]           mpool.c:100  UCX  DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8288
[1643916642.903450] [vulcan02:17018:0]         uct_mem.c:106  UCX  TRACE allocating mm_recv_desc: host memory length 4259952 flags 0x3e0
[1643916642.903452] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method md
[1643916642.903453] [vulcan02:17018:0]         uct_mem.c:110  UCX  TRACE   trying allocation method mmap
[1643916642.903458] [vulcan02:17018:0]         uct_mem.c:304  UCX  TRACE   allocated 4263936 bytes at 0x7fa1821af000 using mmap
[1643916642.903464] [vulcan02:17018:0]           mpool.c:237  UCX  DEBUG mpool mm_recv_desc: allocated chunk 0x7fa1821af018 of 4263912 bytes with 512 elements
[1643916642.904243] [vulcan02:17018:0]        mm_iface.c:674  UCX  DEBUG created mm iface 0x20c8d90 FIFO id 0x7fa189868000 va 0x7fa189868000 size 12288 (128 x 64 elems)
#
#      capabilities:
[1643916642.904254] [vulcan02:17018:0]        mm_xpmem.c:116  UCX  DEBUG xpmem version: 155653
#            bandwidth: 0.00/ppn + 12179.00 MB/sec
#              latency: 80 nsec
#             overhead: 10 nsec
#            put_short: <= 4294967295
#            put_bcopy: unlimited
#            get_bcopy: unlimited
#             am_short: <= 100
#             am_bcopy: <= 8256
#               domain: cpu
#           atomic_add: 32, 64 bit
#           atomic_and: 32, 64 bit
#            atomic_or: 32, 64 bit
#           atomic_xor: 32, 64 bit
#          atomic_fadd: 32, 64 bit
#          atomic_fand: 32, 64 bit
#           atomic_for: 32, 64 bit
#          atomic_fxor: 32, 64 bit
#          atomic_swap: 32, 64 bit
#         atomic_cswap: 32, 64 bit
#           connection: to iface
#      device priority: 0
#     device num paths: 1
#              max eps: inf
#       device address: 8 bytes
#        iface address: 16 bytes
#       error handling: none
[1643916642.904382] [vulcan02:17018:0]           mpool.c:154  UCX  DEBUG mpool mm_recv_desc destroyed
#

petro-rudenko avatar Feb 03 '22 19:02 petro-rudenko

The issue is that ucx_info -d doesn't show the cuda transport.

petro-rudenko avatar Feb 03 '22 19:02 petro-rudenko

@petro-rudenko Sorry for the delay. I seem to run into a build error when configuring with --with-go.

$ ./autogen.sh && cd build-own && echo $CUDA_HOME && ../contrib/configure-devel --enable-debug --enable-debug-data --with-java=no  --with-go --prefix=$PWD --with-cuda=$CUDA_HOME && make clean && make -j install

... 

make[3]: Entering directory '$UCX_HOME/build-own/bindings/go'
/usr/bin/install -c $UCX_HOME/build-own/bindings/go/.libs/tmp/goperftest $UCX_HOME/build-own/bin
/usr/bin/install: cannot stat '$UCX_HOME/build-own/bindings/go/.libs/tmp/goperftest': No such file or directory
Makefile:648: recipe for target 'install-exec-hook' failed
make[3]: *** [install-exec-hook] Error 1
make[3]: Leaving directory '$UCX_HOME/build-own/bindings/go'
Makefile:555: recipe for target 'install-exec-am' failed
make[2]: *** [install-exec-am] Error 2
make[2]: Leaving directory '$UCX_HOME/build-own/bindings/go'
Makefile:502: recipe for target 'install-am' failed
make[1]: *** [install-am] Error 2
make[1]: Leaving directory '$UCX_HOME/build-own/bindings/go'
Makefile:761: recipe for target 'install-recursive' failed
make: *** [install-recursive] Error 1

Akshay-Venkatesh avatar Feb 04 '22 00:02 Akshay-Venkatesh

@Akshay-Venkatesh can you rerun make distclean && make -j $(nproc) && make install?

petro-rudenko avatar Feb 04 '22 07:02 petro-rudenko

@petro-rudenko @yosefe The goperftest issue should be resolved now. Is it possible to restart the tests?

Akshay-Venkatesh avatar Feb 05 '22 23:02 Akshay-Venkatesh

Hi @petro-rudenko

Looks like the Java and Go tests are failing with the same type of error:

=== RUN   TestUcpMmap
[1644105177.739786] [swx-rdmz-ucx-gpu-02:1963 :1]    cuda_copy_md.c:164  UCX  ERROR   attempt to allocate cuda memory without active context
[1644105177.739799] [swx-rdmz-ucx-gpu-02:1963 :1]         uct_mem.c:157  UCX  ERROR   failed to allocate 1024 bytes using md cuda_cpy for user memory: No such device
    memory_test.go:120: Failed to allocate GPU memory <nil>

2022-02-05T23:53:18.6617474Z Running testActiveMessages with memType: 1
2022-02-05T23:53:18.9173634Z [1644105198.906710] [swx-rdmz-ucx-gpu-02:2015 :0]    cuda_copy_md.c:164  UCX  ERROR   attempt to allocate cuda memory without active context
2022-02-05T23:53:18.9181877Z [1644105198.906723] [swx-rdmz-ucx-gpu-02:2015 :0]         uct_mem.c:157  UCX  ERROR   failed to allocate 4096 bytes using md cuda_cpy for user memory: No such device

We had a similar issue in the gtest path, where an active context had to be set up in test/gtest/common/mem_buffer.cc and src/tools/perf/cuda/cuda_alloc.c:

#if HAVE_CUDA
     if (is_cuda_supported()) {
         cudaSetDevice(0);
         /* need to call cudaFree because the context may be lazily initialized
          * by cudaSetDevice(0), whereas cudaFree(0) should guarantee context
          * creation upon return */
         cudaFree(0);
     }
#endif

Is it possible to do something similar in the binding tests as well?

Also, I tried to find where GPU selection occurs in the bindings tests (as we do in perf/gtest by calling cudaSetDevice(0)), but I couldn't find it. How is this done?

Akshay-Venkatesh avatar Feb 06 '22 02:02 Akshay-Venkatesh

Here's a workaround we did in goperftest: https://github.com/openucx/ucx/blob/master/bindings/go/src/examples/perftest/perftest.go#L102

We would probably need to do something similar in Java.

petro-rudenko avatar Feb 06 '22 19:02 petro-rudenko

Maybe we could do some checks in ucp/uct, since calling cudaSetDevice would require CUDA dependencies for the bindings. Or at least we would need to mention in the ucp_mem_map API documentation that, when using memType cuda, the CUDA device must be initialized first.
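
For illustration only, here is a minimal sketch of how a test could make a CUDA context current without a build-time CUDA dependency. It loads the driver library at run time with dlopen and calls the driver API functions cuInit, cuDeviceGet, cuDevicePrimaryCtxRetain and cuCtxSetCurrent. The helper name ensure_cuda_context, the local typedefs, and the return codes are assumptions made for this example; none of them are part of UCX, and this is not necessarily how the bindings should solve the problem.

```c
/* Sketch only: ensure_cuda_context() is a hypothetical helper, not a UCX API.
 * On older glibc, link with -ldl. */
#include <dlfcn.h>
#include <stddef.h>

/* Local stand-ins for the driver API types, so cuda.h is not required. */
typedef int cu_result_t;               /* CUresult; CUDA_SUCCESS is 0 */
typedef int cu_device_t;               /* CUdevice is an int handle */
typedef struct CUctx_st *cu_context_t; /* CUcontext is an opaque pointer */

typedef cu_result_t (*cu_init_fn)(unsigned int flags);
typedef cu_result_t (*cu_device_get_fn)(cu_device_t *dev, int ordinal);
typedef cu_result_t (*cu_primary_retain_fn)(cu_context_t *ctx, cu_device_t dev);
typedef cu_result_t (*cu_set_current_fn)(cu_context_t ctx);

/* Makes the primary context of device `ordinal` current on the calling thread.
 * Returns 0 on success, -1 if the driver library cannot be loaded,
 * -2 if a symbol is missing, -3 if a driver call fails. */
int ensure_cuda_context(int ordinal)
{
    cu_device_t dev;
    cu_context_t ctx;
    void *lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_GLOBAL);

    if (lib == NULL) {
        return -1; /* no CUDA driver installed on this machine */
    }

    cu_init_fn init = (cu_init_fn)dlsym(lib, "cuInit");
    cu_device_get_fn device_get = (cu_device_get_fn)dlsym(lib, "cuDeviceGet");
    cu_primary_retain_fn retain =
        (cu_primary_retain_fn)dlsym(lib, "cuDevicePrimaryCtxRetain");
    cu_set_current_fn set_current =
        (cu_set_current_fn)dlsym(lib, "cuCtxSetCurrent");

    if (init == NULL || device_get == NULL || retain == NULL ||
        set_current == NULL) {
        dlclose(lib);
        return -2;
    }

    /* The library is intentionally left loaded from here on: the context
     * created below lives inside it. This sketch also never releases the
     * primary context; real code would pair the retain with a release. */
    if (init(0) != 0 || device_get(&dev, ordinal) != 0 ||
        retain(&ctx, dev) != 0 || set_current(ctx) != 0) {
        return -3;
    }

    return 0;
}
```

A test would call ensure_cuda_context(0) once per thread before allocating CUDA memory, and skip the GPU cases when it returns a negative value. The current context is per-thread state in the driver API, so the call has to happen on the thread that later performs the allocation.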

petro-rudenko avatar Feb 06 '22 19:02 petro-rudenko

I just stumbled upon this. The added flexibility of removing the CUDA toolkit as a dependency is indeed quite interesting; it would be a pity to see this PR stall and not make it into future releases!

damianam avatar Mar 10 '22 14:03 damianam