ucx
UCT/CUDA: remove cuda_runtime dependency
What
Removes the uct/cuda dependency on the CUDA runtime (cudart).
Why?
- A minimum CUDA driver version generally covers all the functionality that cuda_runtime provides, so the additional dependency is not needed.
TODO
- need to check if this dependency can be removed from memory interception layer as well
- if all runtime memory calls necessarily go through driver API, it should be possible to remove ucm dependency on cudart as well
cc @yosefe @bureddy @jirikraus
Per offline discussion, it also needs to be removed from the build (link); otherwise OK.
@yosefe the following changes, which remove cudart from the build, cause the gtest build to fail, because gtest depends on cudart for cudaMalloc/cudaFree calls and on cudart_static for the static hook tests.
diff --git a/config/m4/cuda.m4 b/config/m4/cuda.m4
index bd3308765..31bf3b7c7 100644
--- a/config/m4/cuda.m4
+++ b/config/m4/cuda.m4
@@ -48,9 +48,6 @@ AS_IF([test "x$cuda_checked" != "xyes"],
AS_IF([test "x$cuda_happy" = "xyes"],
[AC_CHECK_LIB([cuda], [cuDeviceGetUuid],
[CUDA_LIBS="$CUDA_LIBS -lcuda"], [cuda_happy="no"])])
- AS_IF([test "x$cuda_happy" = "xyes"],
- [AC_CHECK_LIB([cudart], [cudaGetDeviceCount],
- [CUDA_LIBS="$CUDA_LIBS -lcudart"], [cuda_happy="no"])])
# Check nvml header files
AC_CHECK_HEADERS([nvml.h],
@@ -68,15 +65,6 @@ AS_IF([test "x$cuda_checked" != "xyes"],
cuda_happy="no"])])
LDFLAGS="$save_LDFLAGS"
-
- # Check for cuda static library
- have_cuda_static="no"
- AS_IF([test "x$cuda_happy" = "xyes"],
- [AC_CHECK_LIB([cudart_static], [cudaGetDeviceCount],
- [CUDA_STATIC_LIBS="$CUDA_STATIC_LIBS -lcudart_static"
- have_cuda_static="yes"],
- [], [-ldl -lrt -lpthread])])
-
CPPFLAGS="$save_CPPFLAGS"
LDFLAGS="$save_LDFLAGS"
LIBS="$save_LIBS"
Before this PR, the output of ldd libuct_cuda.so looks as follows:
$ ldd lib/ucx/libuct_cuda.so
linux-vdso.so.1 (0x00007ffcc7ff2000)
libucs.so.0 => $UCX_HOME/lib/libucs.so.0 (0x00007feb00edd000)
libuct.so.0 => $UCX_HOME/lib/libuct.so.0 (0x00007feb00c81000)
libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007feaff55f000)
libcudart.so.11.0 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcudart.so.11.0 (0x00007feaff2c6000)
libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007feafec42000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007feafea23000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007feafe632000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007feafe42e000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007feafe223000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007feafde85000)
libucm.so.0 => $UCX_HOME/lib/libucm.so.0 (0x00007feafdc60000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007feafda58000)
/lib64/ld-linux-x86-64.so.2 (0x00007feb0137e000)
and after this PR, it looks as follows:
$ ldd lib/ucx/libuct_cuda.so
linux-vdso.so.1 (0x00007ffc0d48a000)
libucs.so.0 => $UCX_HOME/lib/libucs.so.0 (0x00007fd62246c000)
libuct.so.0 => $UCX_HOME/lib/libuct.so.0 (0x00007fd622210000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fd620b28000)
libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007fd6204a4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd620285000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd61fe94000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd61fc90000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fd61fa85000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd61f6e7000)
libucm.so.0 => $UCX_HOME/lib/libucm.so.0 (0x00007fd61f4c2000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd61f2ba000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd62290c000)
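The before/after comparison above can be turned into a repeatable check. A minimal sketch, assuming the `ldd` output is piped in on stdin; the function name `links_cudart` is hypothetical and not part of UCX:

```shell
# links_cudart: read `ldd` output on stdin and exit 0 if any line names
# the shared CUDA runtime (libcudart.so.*), non-zero otherwise.
# Hypothetical helper, not part of UCX.
links_cudart() {
    grep -q 'libcudart\.so'
}

# Usage:
#   ldd lib/ucx/libuct_cuda.so | links_cudart && echo "still depends on cudart"
```

This only detects the shared runtime. A statically linked cudart_static does not appear in `ldd` output, so it needs a separate check.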
Do we really need to remove anything?
Can we remove cudart only from UCT and UCM, but not from gtest?
@yosefe with this PR, that's already the case, as UCT/UCM no longer depend on cudart:
$ ldd lib/ucx/libucm_cuda.so
linux-vdso.so.1 (0x00007fff6f472000)
libucm.so.0 => $UCX_HOME/lib/libucm.so.0 (0x00007f21c5ead000)
libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007f21c478b000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f21c456c000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f21c417b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f21c3f77000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f21c3bd9000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f21c39d1000)
/lib64/ld-linux-x86-64.so.2 (0x00007f21c62da000)
$ ldd lib/ucx/libuct_cuda.so
linux-vdso.so.1 (0x00007fffec1db000)
libucs.so.0 => $UCX_HOME/lib/libucs.so.0 (0x00007fd62cbd5000)
libuct.so.0 => $UCX_HOME/lib/libuct.so.0 (0x00007fd62c979000)
libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007fd62b257000)
libnvidia-ml.so.1 => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007fd62abd3000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd62a9b4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd62a5c3000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd62a3bf000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fd62a1b4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd629e16000)
libucm.so.0 => $UCX_HOME/lib/libucm.so.0 (0x00007fd629bf1000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd6299e9000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd62d075000)
$ ldd test/gtest/gtest
linux-vdso.so.1 (0x00007ffd8b308000)
...
libcuda.so.1 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcuda.so.1 (0x00007f56b3283000)
libcudart.so.11.0 => /gpfs/fs1/SHARE/Utils/CUDA/11.3.0.0_465.19.01/lib64/libcudart.so.11.0 (0x00007f56b2fea000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f56b2dcb000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f56b2a42000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f56b26a4000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f56b2475000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f56b225d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f56b1e6c000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f56b1c61000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f56b1a59000)
/lib64/ld-linux-x86-64.so.2 (0x00007f56b6fd1000)
libnl-route-3.so.200 => /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200 (0x00007f56b17e4000)
libnl-3.so.200 => /lib/x86_64-linux-gnu/libnl-3.so.200 (0x00007f56b15c4000)
libmlx5.so.1 => /usr/lib/x86_64-linux-gnu/libmlx5.so.1 (0x00007f56b136c000)
Are there further changes needed? I most likely misunderstood your question.
@Akshay-Venkatesh maybe this dependency is removed by the linker because we don't call it. But I think it would be good to remove it from the Makefile as well: split CUDA_LIBS into CUDA_LIBS and CUDART_LIBS.
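The `ldd` listings describe the final binary only. They do not say whether `-lcudart` is still passed on the link line and then dropped. One way to check the build side is to look for variables in a generated Makefile whose value still mentions `-lcudart`. A minimal sketch: `vars_with_cudart` is a hypothetical helper, it only understands simple `NAME = value` and `NAME += value` assignments, and the Makefile path in the usage line is an assumption about the build tree:

```shell
# vars_with_cudart: read Makefile text on stdin and print the names of
# variables whose assigned value mentions -lcudart (this also matches
# -lcudart_static). Hypothetical helper; handles only "NAME = ..." and
# "NAME += ..." lines.
vars_with_cudart() {
    grep -E '^[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\+?=.*-lcudart' |
        sed 's/[[:space:]]*+*=.*//'
}

# Usage (path is an assumption):
#   vars_with_cudart < src/uct/cuda/Makefile
```

If CUDA_LIBS is split as suggested, the expectation would be that only the new CUDART_LIBS variable (and whatever gtest uses) shows up here.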
@petro-rudenko How do I know which compilation flags were used to build goperftest? I'm not sure whether it was built with -DHAVE_CUDA or -DHAVE_CUDART here: https://dev.azure.com/ucfconsort/ucx/_build/results?buildId=36177&view=logs&j=3326af28-725b-5a76-d9b2-a6afcb2c442d&t=325b69bb-50d4-51fd-759e-eb1ff0fb9743&l=370 I'm also trying to figure out which compilation flags were used to build the ucx version that is queried in functions like the one below, because it looks like memtypesMask doesn't have CUDA for the failing test:
// This routine fetches information about the context.
func (c *UcpContext) Query(attrs ...UcpContextAttr) (*C.ucp_context_attr_t, error) {
	var ucp_attrs C.ucp_context_attr_t

	for _, attr := range attrs {
		ucp_attrs.field_mask |= C.ulong(attr)
	}

	if status := C.ucp_context_query(c.context, &ucp_attrs); status != C.UCS_OK {
		return nil, newUcxError(status)
	}

	return &ucp_attrs, nil
}
Hi @Akshay-Venkatesh, Go dynamically links only to ucp and ucs, since it doesn't use the CUDA API directly, only through ucp_mem_map, etc.:
https://github.com/openucx/ucx/blob/master/bindings/go/Makefile.am#L10-L11
So probably you would need to add to that file something like this:
if HAVE_CUDART
CGOCFLAGS += $(CUDART_CPPFLAGS)
CGOLDFLAGS += $(CUDART_LDFLAGS)
UCX_SOPATH += $(CUDART_LIBS) -l $(top_builddir)/src/uct/cuda/libuct_cuda.la
endif
Hi @petro-rudenko. Thanks for the info. I'm probably missing something, but if the Go test is checking perf with CUDA memory, and Go dynamically links to ucx libraries already compiled with CUDA enabled (so HAVE_CUDA and HAVE_CUDART are set for the appropriate compilation units), I'm not sure why memtypesMask doesn't have CUDA memory in it. These tests were passing before, so the failures come from changes in this PR. I'm probably missing some places where HAVE_CUDA needs to be changed to HAVE_CUDART. I'll look into it.
BTW, I tried the change you suggested just to see if the build passes, but it seems that if we went with this approach, we'd also have to add this, right? (in case some other part depends on libcuda symbols)
if HAVE_CUDA
...
endif
$ UCX_TLS=cuda UCX_LOG_LEVEL=trace LD_LIBRARY_PATH=/hpc/local/oss/gdrcopy2.3_cuda11.4/lib:/hpc/local/oss/cuda11.4/lib64:/hpc/local/oss/cuda11.4/lib64/stubs:/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/:/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/ /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/bindings/go/.libs/tmp/goperftest -m=cuda
[1643915511.186673] [vulcan02:15421:0] stats.c:861 UCX TRACE statistics disabled
[1643915511.186698] [vulcan02:15421:0] memtrack.c:409 UCX TRACE memtrack disabled
[1643915511.186716] [vulcan02:15421:0] debug.c:1211 UCX DEBUG using signal stack 0x7f19a963a000 size 141824
[1643915511.187337] [vulcan02:15421:0] init.c:116 UCX DEBUG /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/libucs.so.0 loaded at 0x7f19a8ca9000
[1643915511.187364] [vulcan02:15421:0] init.c:117 UCX DEBUG cmd line: /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/bindings/go/.libs/tmp/goperftest -m=cuda
[1643915511.187377] [vulcan02:15421:0] module.c:69 UCX DEBUG ucs library path: /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/libucs.so.0
[1643915511.187385] [vulcan02:15421:0] module.c:273 UCX DEBUG loading modules for ucs
[1643915511.190368] [vulcan02:15421:0] ucp_context.c:1776 UCX INFO UCP version is 1.13 (release 0)
[1643915511.191009] [vulcan02:15421:0] time.c:22 UCX DEBUG measured arch clock speed: 2200000000.00 Hz
[1643915511.191044] [vulcan02:15421:0] ucp_context.c:1564 UCX DEBUG estimated number of endpoints is 1
[1643915511.191048] [vulcan02:15421:0] ucp_context.c:1571 UCX DEBUG estimated number of endpoints per node is 1
[1643915511.191055] [vulcan02:15421:0] ucp_context.c:1578 UCX DEBUG estimated bcopy bandwidth is 6081740800.000000
[1643915511.191065] [vulcan02:15421:0] ucp_context.c:1644 UCX DEBUG allocation method[0] is md 'sysv'
[1643915511.191069] [vulcan02:15421:0] ucp_context.c:1644 UCX DEBUG allocation method[1] is md 'posix'
[1643915511.191075] [vulcan02:15421:0] ucp_context.c:1656 UCX DEBUG allocation method[2] is 'huge'
[1643915511.191078] [vulcan02:15421:0] ucp_context.c:1656 UCX DEBUG allocation method[3] is 'thp'
[1643915511.191081] [vulcan02:15421:0] ucp_context.c:1644 UCX DEBUG allocation method[4] is md '*'
[1643915511.191085] [vulcan02:15421:0] ucp_context.c:1656 UCX DEBUG allocation method[5] is 'mmap'
[1643915511.191088] [vulcan02:15421:0] ucp_context.c:1656 UCX DEBUG allocation method[6] is 'heap'
[1643915511.191106] [vulcan02:15421:0] module.c:273 UCX DEBUG loading modules for uct
[1643915511.191110] [vulcan02:15421:0] module.c:239 UCX TRACE loading module 'cuda' with mode 0x1
[1643915511.193231] [vulcan02:15421:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda.so.0.0.0 [0x2698720]
[1643915511.193247] [vulcan02:15421:0] module.c:189 UCX TRACE calling 'ucs_module_global_init' in '/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda.so.0.0.0': [0x7f197df916f1]
[1643915511.193253] [vulcan02:15421:0] module.c:273 UCX DEBUG loading modules for uct_cuda
[1643915511.193257] [vulcan02:15421:0] module.c:239 UCX TRACE loading module 'gdrcopy' with mode 0x1
[1643915511.194687] [vulcan02:15421:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda_gdrcopy.so.0.0.0 [0x2699e50]
[1643915511.194700] [vulcan02:15421:0] module.c:162 UCX DEBUG ignoring 'ucs_module_global_init' (0x7f197df916f1) from libuct_cuda.so.0 (0x7f197df8b000), expected in libuct_cuda_gdrcopy.so.0 (7f197c540000)
[1643915511.194705] [vulcan02:15421:0] module.c:239 UCX TRACE loading module 'ib' with mode 0x1
[1643915511.196042] [vulcan02:15421:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_ib.so.0.0.0 [0x269ac00]
[1643915511.196058] [vulcan02:15421:0] module.c:239 UCX TRACE loading module 'rdmacm' with mode 0x1
[1643915511.196890] [vulcan02:15421:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_rdmacm.so.0.0.0 [0x269d9a0]
[1643915511.196904] [vulcan02:15421:0] module.c:239 UCX TRACE loading module 'cma' with mode 0x1
[1643915511.197524] [vulcan02:15421:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cma.so.0.0.0 [0x269e7d0]
[1643915511.197537] [vulcan02:15421:0] module.c:239 UCX TRACE loading module 'knem' with mode 0x1
[1643915511.198332] [vulcan02:15421:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_knem.so.0.0.0 [0x269ee70]
[1643915511.198353] [vulcan02:15421:0] module.c:239 UCX TRACE loading module 'xpmem' with mode 0x1
[1643915511.199171] [vulcan02:15421:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_xpmem.so.0.0.0 [0x269f570]
[1643915511.199268] [vulcan02:15421:0] module.c:273 UCX DEBUG loading modules for uct_ib
[1643915511.199577] [vulcan02:15421:0] cma_md.c:115 UCX TRACE ptrace_scope is 0, CMA is supported
[1643915511.199690] [vulcan02:15421:0] mm_xpmem.c:116 UCX DEBUG xpmem version: 155653
[1643915511.199822] [vulcan02:15421:0] ucp_context.c:908 UCX TRACE allowed transport 0 : 'cu
[1643915511.213509] [vulcan02:15421:0] ucp_context.c:683 UCX TRACE enabling tl 'ud_verbs' for alias 'ud_v'
[1643915511.213519] [vulcan02:15421:0] ucp_context.c:683 UCX TRACE enabling tl 'ud_verbs' for alias 'ud'
[1643915511.213540] [vulcan02:15421:0] ucp_context.c:690 UCX TRACE enabling auxiliary tl 'ud_verbs' for alias 'rc_v'
[1643915511.213544] [vulcan02:15421:0] ucp_context.c:690 UCX TRACE enabling auxiliary tl 'ud_verbs' for alias 'rc'
[1643915511.213567] [vulcan02:15421:0] ucp_context.c:820 UCX TRACE ud_verbs/mlx5_0:1 is disabled
[1643915511.213573] [vulcan02:15421:0] ucp_context.c:683 UCX TRACE enabling tl 'ud_mlx5' for alias 'ib'
[1643915511.213578] [vulcan02:15421:0] ucp_context.c:683 UCX TRACE enabling tl 'ud_mlx5' for alias 'ud_x'
[1643915511.213582] [vulcan02:15421:0] ucp_context.c:683 UCX TRACE enabling tl 'ud_mlx5' for alias 'ud'
[1643915511.213586] [vulcan02:15421:0] ucp_context.c:690 UCX TRACE enabling auxiliary tl 'ud_mlx5' for alias 'rc_x'
[1643915511.213591] [vulcan02:15421:0] ucp_context.c:690 UCX TRACE enabling auxiliary tl 'ud_mlx5' for alias 'rc'
[1643915511.213596] [vulcan02:15421:0] ucp_context.c:820 UCX TRACE ud_mlx5/mlx5_0:1 is disabled
[1643915511.213600] [vulcan02:15421:0] ucp_context.c:1306 UCX DEBUG closing md mlx5_0 because it has no selected transport resources
[1643915511.213714] [vulcan02:15421:0] mpool.c:154 UCX DEBUG mpool devx dbrec destroyed
[1643915511.213736] [vulcan02:15421:0] async.c:156 UCX DEBUG removed async handler 0x2697a50 [id=9 ref 1] ucs_rcache_invalidate_handler() from hash
[1643915511.213741] [vulcan02:15421:0] async.c:562 UCX DEBUG removing async handler 0x2697a50 [id=9 ref 1] ucs_rcache_invalidate_handler()
[1643915511.213751] [vulcan02:15421:0] async.c:582 UCX TRACE waiting for 0x2697a50 [id=9 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643915511.213756] [vulcan02:15421:0] async.c:171 UCX DEBUG release async handler 0x2697a50 [id=9 ref 0] ucs_rcache_invalidate_handler()
[1643915511.213773] [vulcan02:15421:0] mpool.c:154 UCX DEBUG mpool rcache_mp destroyed
[1643915511.213848] [vulcan02:15421:0] ib_device.c:686 UCX DEBUG destroying ib device mlx5_0
[1643915511.213868] [vulcan02:15421:0] async.c:156 UCX DEBUG removed async handler 0x26a2ec0 [id=5 ref 1] uct_ib_async_event_handler() from hash
[1643915511.213872] [vulcan02:15421:0] async.c:562 UCX DEBUG removing async handler 0x26a2ec0 [id=5 ref 1] uct_ib_async_event_handler()
[1643915511.214001] [vulcan02:15421:0] async.c:582 UCX TRACE waiting for 0x26a2ec0 [id=5 ref 1] uct_ib_async_event_handler() completion (called=0)
[1643915511.214008] [vulcan02:15421:0] async.c:171 UCX DEBUG release async handler 0x26a2ec0 [id=5 ref 0] uct_ib_async_event_handler()
[1643915511.214439] [vulcan02:15421:0] ib_md.c:1570 UCX TRACE opening IB device mlx5_1
[1643915511.217886] [vulcan02:15421:0] ib_device.c:554 UCX DEBUG PF: mlx5_1 vendor_id: 0x15b3 device_id: 4123
[1643915511.218102] [vulcan02:15421:0] ib_mlx5dv_md.c:491 UCX DEBUG mlx5_1: disable ODP because it's not supported for DevX QP
[1643915511.218268] [vulcan02:15421:0] async.c:231 UCX DEBUG added async handler 0x2697a00 [id=5 ref 1] uct_ib_async_event_handler() to hash
[1643915511.218330] [vulcan02:15421:0] async.c:509 UCX DEBUG listening to async event fd 5 events 0x1 mode thread_spinlock
[1643915511.218338] [vulcan02:15421:0] ib_device.c:668 UCX DEBUG initialized device 'mlx5_1' (InfiniBand channel adapter) with 1 ports
[1643915511.218431] [vulcan02:15421:0] ib_md.c:1675 UCX DEBUG mlx5_1: cuda GPUDirect RDMA is enabled
[1643915511.218442] [vulcan02:15421:0] ib_md.c:1675 UCX DEBUG mlx5_1: rocm GPUDirect RDMA is disabled
[1643915511.218450] [vulcan02:15421:0] mpool.c:100 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643915511.218462] [vulcan02:15421:0] async.c:231 UCX DEBUG added async handler 0x26a0a70 [id=9 ref 1] ucs_rcache_invalidate_handler() to hash
[1643915511.218473] [vulcan02:15421:0] async.c:509 UCX DEBUG listening to async event fd 9 events 0x1 mode thread_spinlock
[1643915511.218564] [vulcan02:15421:0] ib_md.c:1332 UCX DEBUG mlx5_1: using registration cache
[1643915511.218586] [vulcan02:15421:0] ib_md.c:1494 UCX DEBUG failed to read file: /sys/class/infiniband/mlx5_1/device/current_link_width
[1643915511.218592] [vulcan02:15421:0] mpool.c:100 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40
[1643915511.218785] [vulcan02:15421:0] ib_md.c:1623 UCX DEBUG mlx5_1: md open by 'uct_ib_mlx5_devx_md_ops' is successful
[1643915511.219727] [vulcan02:15421:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643915511.219739] [vulcan02:15421:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0x0: Destination is unreachable
[1643915511.219743] [vulcan02:15421:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0x0
[1643915511.219749] [vulcan02:15421:0] uct_md.c:113 UCX DEBUG failed to query rc_verbs resources: No such device
[1643915511.219753] [vulcan02:15421:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643915511.219757] [vulcan02:15421:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0x4: Destination is unreachable
[1643915511.219761] [vulcan02:15421:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0x4
[1643915511.219764] [vulcan02:15421:0] uct_md.c:113 UCX DEBUG failed to query rc_mlx5 resources: No such device
[1643915511.219768] [vulcan02:15421:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643915511.219772] [vulcan02:15421:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0xc4: Destination is unreachable
[1643915511.219776] [vulcan02:15421:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0xc4
[1643915511.219779] [vulcan02:15421:0] uct_md.c:113 UCX DEBUG failed to query dc_mlx5 resources: No such device
[1643915511.219783] [vulcan02:15421:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643915511.219787] [vulcan02:15421:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0x0: Destination is unreachable
[1643915436.022653] [vulcan02:15318:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0x4
[1643915436.022657] [vulcan02:15318:0] uct_md.c:113 UCX DEBUG failed to query ud_mlx5 resources: No such device
[1643915436.022661] [vulcan02:15318:0] ucp_context.c:892 UCX DEBUG No tl resources found for md mlx5_1
[1643915436.022664] [vulcan02:15318:0] ucp_context.c:1306 UCX DEBUG closing md mlx5_1 because it has no selected transport resources
[1643915436.022759] [vulcan02:15318:0] mpool.c:154 UCX DEBUG mpool devx dbrec destroyed
[1643915436.022773] [vulcan02:15318:0] async.c:156 UCX DEBUG removed async handler 0x2769550 [id=9 ref 1] ucs_rcache_invalidate_handler() from hash
[1643915436.022778] [vulcan02:15318:0] async.c:562 UCX DEBUG removing async handler 0x2769550 [id=9 ref 1] ucs_rcache_invalidate_handler()
[1643915436.022784] [vulcan02:15318:0] async.c:582 UCX TRACE waiting for 0x2769550 [id=9 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643915436.022789] [vulcan02:15318:0] async.c:171 UCX DEBUG release async handler 0x2769550 [id=9 ref 0] ucs_rcache_invalidate_handler()
[1643915436.022800] [vulcan02:15318:0] mpool.c:154 UCX DEBUG mpool rcache_mp destroyed
[1643915436.022866] [vulcan02:15318:0] ib_device.c:686 UCX DEBUG destroying ib device mlx5_1
[1643915436.022874] [vulcan02:15318:0] async.c:156 UCX DEBUG removed async handler 0x276c4b0 [id=5 ref 1] uct_ib_async_event_handler() from hash
[1643915436.022878] [vulcan02:15318:0] async.c:562 UCX DEBUG removing async handler 0x276c4b0 [id=5 ref 1] uct_ib_async_event_handler()
[1643915436.022965] [vulcan02:15318:0] async.c:582 UCX TRACE waiting for 0x276c4b0 [id=5 ref 1] uct_ib_async_event_handler() completion (called=0)
[1643915436.022972] [vulcan02:15318:0] async.c:171 UCX DEBUG release async handler 0x276c4b0 [id=5 ref 0] uct_ib_async_event_handler()
[1643915436.023331] [vulcan02:15318:0] cma_md.c:115 UCX TRACE ptrace_scope is 0, CMA is supported
[1643915436.023374] [vulcan02:15318:0] ucp_context.c:908 UCX TRACE allowed transport 0 : 'cuda'
[1643915436.023382] [vulcan02:15318:0] ucp_context.c:683 UCX TRACE enabling tl 'cma' for alias 'sm'
[1643915436.023386] [vulcan02:15318:0] ucp_context.c:683 UCX TRACE enabling tl 'cma' for alias 'shm'
[1643915436.023393] [vulcan02:15318:0] ucp_context.c:820 UCX TRACE cma/memory is disabled
[1643915436.023397] [vulcan02:15318:0] ucp_context.c:1306 UCX DEBUG closing md cma because it has no selected transport resources
[1643915436.023474] [vulcan02:15318:0] mpool.c:100 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643915436.023485] [vulcan02:15318:0] async.c:231 UCX DEBUG added async handler 0x2769f30 [id=5 ref 1] ucs_rcache_invalidate_handler() to hash
[1643915436.023562] [vulcan02:15318:0] async.c:509 UCX DEBUG listening to async event fd 5 events 0x1 mode thread_spinlock
[1643915436.023677] [vulcan02:15318:0] ucp_context.c:908 UCX TRACE allowed transport 0 : 'cuda'
[1643915436.023689] [vulcan02:15318:0] ucp_context.c:683 UCX TRACE enabling tl 'knem' for alias 'sm'
[1643915436.023693] [vulcan02:15318:0] ucp_context.c:683 UCX TRACE enabling tl 'knem' for alias 'shm'
[1643915436.023700] [vulcan02:15318:0] ucp_context.c:820 UCX TRACE knem/memory is disabled
[1643915436.023704] [vulcan02:15318:0] ucp_context.c:1306 UCX DEBUG closing md knem because it has no selected transport resources
[1643915436.023719] [vulcan02:15318:0] async.c:156 UCX DEBUG removed async handler 0x2769f30 [id=5 ref 1] ucs_rcache_invalidate_handler() from hash
[1643915436.023724] [vulcan02:15318:0] async.c:562 UCX DEBUG removing async handler 0x2769f30 [id=5 ref 1] ucs_rcache_invalidate_handler()
[1643915436.023811] [vulcan02:15318:0] async.c:582 UCX TRACE waiting for 0x2769f30 [id=5 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643915436.023819] [vulcan02:15318:0] async.c:171 UCX DEBUG release async handler 0x2769f30 [id=5 ref 0] ucs_rcache_invalidate_handler()
[1643915436.023827] [vulcan02:15318:0] mpool.c:154 UCX DEBUG mpool rcache_mp destroyed
[1643915436.023853] [vulcan02:15318:0] mm_xpmem.c:116 UCX DEBUG xpmem version: 155653
[1643915436.023893] [vulcan02:15318:0] ucp_context.c:908 UCX TRACE allowed transport 0 : 'cuda'
[1643915436.023900] [vulcan02:15318:0] ucp_context.c:683 UCX TRACE enabling tl 'xpmem' for alias 'mm'
[1643915436.023903] [vulcan02:15318:0] ucp_context.c:683 UCX TRACE enabling tl 'xpmem' for alias 'sm'
[1643915436.023907] [vulcan02:15318:0] ucp_context.c:683 UCX TRACE enabling tl 'xpmem' for alias 'shm'
[1643915436.023914] [vulcan02:15318:0] ucp_context.c:820 UCX TRACE xpmem/memory is disabled
[1643915436.023917] [vulcan02:15318:0] ucp_context.c:1306 UCX DEBUG closing md xpmem because it has no selected transport resources
[1643915436.023946] [vulcan02:15318:0] ucp_context.c:975 UCX WARN transport 'cuda' is not available, please use one or more of: cma, dc, dc_mlx5, dc_x, ib, knem, mm, posix, rc, rc_mlx5, rc_v, rc_verbs, rc_x, self, shm, sm, sysv, tcp, ud, ud_mlx5, ud_v, ud_verbs, ud_x, xpmem
[1643915436.023956] [vulcan02:15318:0] ucp_context.c:1230 UCX ERROR no usable transports/devices (asked cuda on all devices)
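At trace level the decisive lines are easy to miss among the module-loading and IB device messages. A small filter can pull them out. This is a sketch that assumes the log format shown above, where the level name follows the `UCX` tag; `ucx_problems` is a hypothetical helper name:

```shell
# ucx_problems: read a UCX log on stdin and keep only WARN and ERROR lines.
# Hypothetical helper; relies on the "UCX <LEVEL> " layout seen in the log.
ucx_problems() {
    grep -E 'UCX +(WARN|ERROR) '
}

# Usage (binary path shortened, adjust to the build tree):
#   UCX_TLS=cuda UCX_LOG_LEVEL=trace ./goperftest -m=cuda 2>&1 | ucx_problems
```

For the run above this would leave just the "transport 'cuda' is not available" warning and the "no usable transports/devices" error.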
Strange: ucx_info -d | grep cuda is also empty.
https://github.com/openucx/ucx/blob/master/buildlib/pr/go/go-test.yml#L35-L45 - build like this.
$ module show dev/cuda11.4
-------------------------------------------------------------------
/hpc/local/etc/modulefiles/dev/cuda11.4:
module-whatis add CUDA to your environment
setenv CUDA_HOME /hpc/local/oss/cuda11.4
prepend-path PATH /hpc/local/oss/cuda11.4/bin
prepend-path CPATH /hpc/local/oss/cuda11.4/include
prepend-path FPATH /hpc/local/oss/cuda11.4/include
prepend-path INCLUDE /hpc/local/oss/cuda11.4/include
prepend-path LIBRARY_PATH /hpc/local/oss/cuda11.4/lib64:/hpc/local/oss/cuda11.4/lib64/stubs
prepend-path LD_LIBRARY_PATH /hpc/local/oss/cuda11.4/lib64:/hpc/local/oss/cuda11.4/lib64/stubs
-------------------------------------------------------------------
@Akshay-Venkatesh maybe the issue is with the out-of-source build. Try:
mkdir build
cd build
../contrib/configure-devel --enable-debug --enable-debug-data --with-java=no --with-go --prefix=$PWD --with-cuda
make install
bin/ucx_info -d | grep cuda
seems cuda module is loaded correctly, maybe cuDeviceGetCount() returns 0?
UCX_LOG_LEVEL=trace LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib:/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx bin/ucx_info -d
:
[1643916642.815533] [vulcan02:17018:0] stats.c:861 UCX TRACE statistics disabled
[1643916642.815554] [vulcan02:17018:0] memtrack.c:409 UCX TRACE memtrack disabled
[1643916642.815570] [vulcan02:17018:0] debug.c:1211 UCX DEBUG using signal stack 0x7fa189830000 size 141824
[1643916642.816208] [vulcan02:17018:0] init.c:116 UCX DEBUG /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/libucs.so.0 loaded at 0x7fa188c43000
[1643916642.816229] [vulcan02:17018:0] init.c:117 UCX DEBUG cmd line: bin/ucx_info -d
[1643916642.816239] [vulcan02:17018:0] module.c:69 UCX DEBUG ucs library path: /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/libucs.so.0
[1643916642.816246] [vulcan02:17018:0] module.c:273 UCX DEBUG loading modules for ucs
[1643916642.816353] [vulcan02:17018:0] module.c:273 UCX DEBUG loading modules for uct
[1643916642.816356] [vulcan02:17018:0] module.c:239 UCX TRACE loading module 'cuda' with mode 0x1
[1643916642.818109] [vulcan02:17018:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda.so.0.0.0 [0x2076be0]
[1643916642.818116] [vulcan02:17018:0] module.c:189 UCX TRACE calling 'ucs_module_global_init' in '/hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda.so.0.0.0': [0x7fa1874db6f1]
[1643916642.818119] [vulcan02:17018:0] module.c:273 UCX DEBUG loading modules for uct_cuda
[1643916642.818121] [vulcan02:17018:0] module.c:239 UCX TRACE loading module 'gdrcopy' with mode 0x1
[1643916642.819847] [vulcan02:17018:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cuda_gdrcopy.so.0.0.0 [0x2078280]
[1643916642.819853] [vulcan02:17018:0] module.c:162 UCX DEBUG ignoring 'ucs_module_global_init' (0x7fa1874db6f1) from libuct_cuda.so.0 (0x7fa1874d5000), expected in libuct_cuda_gdrcopy.so.0 (7fa184aec000)
[1643916642.819856] [vulcan02:17018:0] module.c:239 UCX TRACE loading module 'ib' with mode 0x1
[1643916642.821606] [vulcan02:17018:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_ib.so.0.0.0 [0x2079030]
[1643916642.821613] [vulcan02:17018:0] module.c:239 UCX TRACE loading module 'rdmacm' with mode 0x1
[1643916642.822534] [vulcan02:17018:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_rdmacm.so.0.0.0 [0x207bdd0]
[1643916642.822541] [vulcan02:17018:0] module.c:239 UCX TRACE loading module 'cma' with mode 0x1
[1643916642.823883] [vulcan02:17018:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_cma.so.0.0.0 [0x207cc00]
[1643916642.823893] [vulcan02:17018:0] module.c:239 UCX TRACE loading module 'knem' with mode 0x1
[1643916642.824701] [vulcan02:17018:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_knem.so.0.0.0 [0x207d2a0]
[1643916642.824706] [vulcan02:17018:0] module.c:239 UCX TRACE loading module 'xpmem' with mode 0x1
[1643916642.825667] [vulcan02:17018:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libuct_xpmem.so.0.0.0 [0x207d9a0]
#
# Memory domain: posix
# Component: posix
# allocate: <= 132039668K
# remote key: 24 bytes
# rkey_ptr is supported
#
# Transport: posix
# Device: memory
# Type: intra-node
# System device: <unknown>
[1643916642.825882] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating mm_recv_fifo: host memory length 8447 flags 0x3e0
[1643916642.825885] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method md
[1643916642.826020] [vulcan02:17018:0] sys.c:653 UCX TRACE detected huge page size: 2097152
[1643916642.826028] [vulcan02:17018:0] mm_posix.c:531 UCX DEBUG allocated posix shared memory at 0x7fa189868000 length 12288
[1643916642.826032] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 12288 bytes at 0x7fa189868000 using posix
[1643916642.826056] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8288
[1643916642.826063] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating mm_recv_desc: host memory length 4259952 flags 0x3e0
[1643916642.826065] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method md
[1643916642.827961] [vulcan02:17018:0] mm_posix.c:326 UCX DEBUG shared memory mmap(addr=(nil), length=6291456, flags= HUGETLB, fd=5) failed: Invalid argument
[1643916642.827968] [vulcan02:17018:0] mm_posix.c:531 UCX DEBUG allocated posix shared memory at 0x7fa182bb8000 length 4263936
[1643916642.827970] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 4263936 bytes at 0x7fa182bb8000 using posix
[1643916642.827978] [vulcan02:17018:0] mpool.c:237 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x7fa182bb8018 of 4263912 bytes with 512 elements
[1643916642.828441] [vulcan02:17018:0] mm_iface.c:674 UCX DEBUG created mm iface 0x2082a00 FIFO id 0xc0000000c000427a va 0x7fa189868000 size 12288 (128 x 64 elems)
#
# capabilities:
# bandwidth: 0.00/ppn + 12179.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
[1643916642.829082] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool mm_recv_desc destroyed
#
#
# Memory domain: sysv
# Component: sysv
# allocate: unlimited
# remote key: 12 bytes
# rkey_ptr is supported
#
# Transport: sysv
# Device: memory
# Type: intra-node
# System device: <unknown>
[1643916642.829191] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating mm_recv_fifo: host memory length 8447 flags 0x3e0
[1643916642.829194] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method md
[1643916642.829199] [vulcan02:17018:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 8447 bytes with hugetlb
[1643916642.829218] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 12288 bytes at 0x7fa189868000 using sysv
[1643916642.829234] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8288
[1643916642.829236] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating mm_recv_desc: host memory length 4259952 flags 0x3e0
[1643916642.829238] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method md
[1643916642.829255] [vulcan02:17018:0] mm_sysv.c:94 UCX DEBUG mm failed to allocate 4259952 bytes with hugetlb
[1643916642.829264] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 4263936 bytes at 0x7fa182bb8000 using sysv
[1643916642.829274] [vulcan02:17018:0] mpool.c:237 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x7fa182bb8018 of 4263912 bytes with 512 elements
[1643916642.830101] [vulcan02:17018:0] mm_iface.c:674 UCX DEBUG created mm iface 0x2083060 FIFO id 0x3c768000 va 0x7fa189868000 size 12288 (128 x 64 elems)
#
# capabilities:
# bandwidth: 0.00/ppn + 12179.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
[1643916642.830434] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool mm_recv_desc destroyed
#
#
# Memory domain: self
# Component: self
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
#
# Transport: self
# Device: memory0
# Type: loopback
# System device: <unknown>
[1643916642.830525] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool self_msg_desc: align 64, maxelems 4294967295, elemsize 8200
[1643916642.830529] [vulcan02:17018:0] self.c:222 UCX DEBUG created self iface id 0xb191fcf4ddda9707 send_size 8192
#
# capabilities:
# bandwidth: 0.00/ppn + 6911.00 MB/sec
# latency: 0 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 8K
# am_bcopy: <= 8K
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 0 bytes
# iface address: 8 bytes
# error handling: ep_check
[1643916642.830548] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool self_msg_desc destroyed
#
#
# Memory domain: tcp
# Component: tcp
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
#
[1643916642.832428] [vulcan02:17018:0] time.c:22 UCX DEBUG measured arch clock speed: 2200000000.00 Hz
# Transport: tcp
# Device: enp4s0f0
# Type: network
# System device: <unknown>
[1643916642.832446] [vulcan02:17018:0] tcp_iface.c:587 UCX DEBUG using TCP port range: 0-0
[1643916642.832450] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool uct_tcp_iface_tx_buf_mp: align 64, maxelems 4294967295, elemsize 8205
[1643916642.832452] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool uct_tcp_iface_rx_buf_mp: align 64, maxelems 4294967295, elemsize 131090
[1643916642.834165] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() to hash
[1643916642.834245] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 4 events 0x5 mode thread_spinlock
[1643916642.834256] [vulcan02:17018:0] tcp_iface.c:537 UCX DEBUG tcp_iface 0x20830e0: listening for connections (fd=4) on 10.210.0.167:33267
#
# capabilities:
# bandwidth: 113.16/ppn + 0.00 MB/sec
# latency: 5776 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 0
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
[1643916642.834381] [vulcan02:17018:0] tcp_iface.c:823 UCX DEBUG tcp_iface 0x20830e0: destroying
[1643916642.834390] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() from hash
[1643916642.834392] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler()
[1643916642.834475] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() completion (called=0)
[1643916642.834478] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x207ef00 [id=4 ref 0] uct_tcp_iface_connect_handler()
[1643916642.834481] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool uct_tcp_iface_rx_buf_mp destroyed
[1643916642.834483] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool uct_tcp_iface_tx_buf_mp destroyed
#
# Transport: tcp
# Device: lo
# Type: network
# System device: <unknown>
[1643916642.834532] [vulcan02:17018:0] tcp_iface.c:587 UCX DEBUG using TCP port range: 0-0
[1643916642.834535] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool uct_tcp_iface_tx_buf_mp: align 64, maxelems 4294967295, elemsize 8205
[1643916642.834537] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool uct_tcp_iface_rx_buf_mp: align 64, maxelems 4294967295, elemsize 131090
[1643916642.834747] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() to hash
[1643916642.834798] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 4 events 0x5 mode thread_spinlock
[1643916642.834803] [vulcan02:17018:0] tcp_iface.c:537 UCX DEBUG tcp_iface 0x20830e0: listening for connections (fd=4) on 127.0.0.1:60704
#
# capabilities:
[1643916642.834822] [vulcan02:17018:0] sock.c:90 UCX DEBUG ioctl(req=35142, ifr_name=lo) failed: Operation not supported
[1643916642.834829] [vulcan02:17018:0] tcp_net.c:61 UCX DEBUG speed of lo is UNKNOWN, assuming 100 Mbps
# bandwidth: 11.91/ppn + 0.00 MB/sec
# latency: 10960 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 18 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
[1643916642.834903] [vulcan02:17018:0] tcp_iface.c:823 UCX DEBUG tcp_iface 0x20830e0: destroying
[1643916642.834906] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() from hash
[1643916642.834909] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler()
[1643916642.834960] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() completion (called=0)
[1643916642.834963] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x207ef00 [id=4 ref 0] uct_tcp_iface_connect_handler()
[1643916642.834966] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool uct_tcp_iface_rx_buf_mp destroyed
[1643916642.834967] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool uct_tcp_iface_tx_buf_mp destroyed
#
# Transport: tcp
# Device: ib0
# Type: network
# System device: <unknown>
[1643916642.835011] [vulcan02:17018:0] tcp_iface.c:587 UCX DEBUG using TCP port range: 0-0
[1643916642.835014] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool uct_tcp_iface_tx_buf_mp: align 64, maxelems 4294967295, elemsize 8205
[1643916642.835016] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool uct_tcp_iface_rx_buf_mp: align 64, maxelems 4294967295, elemsize 131090
[1643916642.835223] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() to hash
[1643916642.835270] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 4 events 0x5 mode thread_spinlock
[1643916642.835274] [vulcan02:17018:0] tcp_iface.c:537 UCX DEBUG tcp_iface 0x20830e0: listening for connections (fd=4) on 1.1.10.2:56988
#
# capabilities:
# bandwidth: 11142.51/ppn + 0.00 MB/sec
# latency: 5206 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
[1643916642.835718] [vulcan02:17018:0] tcp_iface.c:823 UCX DEBUG tcp_iface 0x20830e0: destroying
[1643916642.835734] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() from hash
[1643916642.835741] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler()
[1643916642.835806] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x207ef00 [id=4 ref 1] uct_tcp_iface_connect_handler() completion (called=0)
[1643916642.835809] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x207ef00 [id=4 ref 0] uct_tcp_iface_connect_handler()
[1643916642.835814] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool uct_tcp_iface_rx_buf_mp destroyed
[1643916642.835816] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool uct_tcp_iface_tx_buf_mp destroyed
#
[1643916642.835918] [vulcan02:17018:0] tcp_sockcm.c:221 UCX DEBUG created tcp_sockcm 0x2082160
#
# Connection manager: tcp
# max_conn_priv: 2064 bytes
[1643916642.835999] [vulcan02:17018:0] module.c:273 UCX DEBUG loading modules for uct_ib
[1643916642.836370] [vulcan02:17018:0] ib_md.c:1570 UCX TRACE opening IB device mlx5_0
[1643916642.839854] [vulcan02:17018:0] ib_device.c:554 UCX DEBUG PF: mlx5_0 vendor_id: 0x15b3 device_id: 4123
[1643916642.840074] [vulcan02:17018:0] ib_mlx5dv_md.c:491 UCX DEBUG mlx5_0: disable ODP because it's not supported for DevX QP
[1643916642.842874] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x2075b10 [id=4 ref 1] uct_ib_async_event_handler() to hash
[1643916642.842935] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 4 events 0x1 mode thread_spinlock
[1643916642.842949] [vulcan02:17018:0] ib_device.c:668 UCX DEBUG initialized device 'mlx5_0' (InfiniBand channel adapter) with 1 ports
[1643916642.843128] [vulcan02:17018:0] ib_md.c:1675 UCX DEBUG mlx5_0: cuda GPUDirect RDMA is enabled
[1643916642.843136] [vulcan02:17018:0] ib_md.c:1675 UCX DEBUG mlx5_0: rocm GPUDirect RDMA is disabled
[1643916642.843162] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643916642.844683] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x2082090 [id=8 ref 1] ucs_rcache_invalidate_handler() to hash
[1643916642.844697] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 8 events 0x1 mode thread_spinlock
[1643916642.844811] [vulcan02:17018:0] module.c:273 UCX DEBUG loading modules for ucm
[1643916642.844831] [vulcan02:17018:0] module.c:239 UCX TRACE loading module 'cuda' with mode 0x1001
[1643916642.845699] [vulcan02:17018:0] module.c:180 UCX TRACE loaded /hpc/mtr_scrap/users/peterr/devel/ucx-cuda-static/build/lib/ucx/libucm_cuda.so.0.0.0 [0x20c46f0]
[1643916642.845855] [vulcan02:17018:0] ib_md.c:1332 UCX DEBUG mlx5_0: using registration cache
[1643916642.845891] [vulcan02:17018:0] ib_md.c:1494 UCX DEBUG failed to read file: /sys/class/infiniband/mlx5_0/device/current_link_width
[1643916642.845900] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40
[1643916642.846149] [vulcan02:17018:0] ib_md.c:1623 UCX DEBUG mlx5_0: md open by 'uct_ib_mlx5_devx_md_ops' is successful
[1643916642.847289] [vulcan02:17018:0] topo.c:141 UCX DEBUG added sys_dev 0 for bus id 02:00.0
[1643916642.847296] [vulcan02:17018:0] ib_device.c:1140 UCX DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
[1643916642.847335] [vulcan02:17018:0] ib_device.c:1140 UCX DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
[1643916642.847366] [vulcan02:17018:0] ib_device.c:1140 UCX DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
[1643916642.847410] [vulcan02:17018:0] ib_device.c:1140 UCX DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
[1643916642.847437] [vulcan02:17018:0] ib_device.c:1140 UCX DEBUG mlx5_0 bus id 0:2:0.0 sys_dev 0
#
# Memory domain: mlx5_0
# Component: ib
# register: unlimited, cost: 180 nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
#
# Transport: rc_verbs
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (0)
[1643916642.847684] [vulcan02:17018:0] ib_iface.c:866 UCX DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.848388] [vulcan02:17018:0] ib_iface.c:1473 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 12 hdr_ofs 11 data_sz 8256
[1643916642.848422] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8276
[1643916642.848425] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8336
[1643916642.848471] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 64
[1643916642.848848] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64
[1643916642.848854] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_verbs_short_desc: align 64, maxelems 4294967295, elemsize 208
[1643916642.849209] [vulcan02:17018:0] ib_iface.c:1008 UCX DEBUG iface=0x20c8f10: created RC QP 0x1a611 on mlx5_0:1 TX wr:409 sge:5 inl:124 resp:64 RX wr:0 sge:0 resp:64
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 600 + 1.000 * N nsec
# overhead: 75 nsec
# put_short: <= 124
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 5 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 5 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 123
# am_bcopy: <= 8255
# am_zcopy: <= 8255, up to 4 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 127
# domain: device
# atomic_add: 64 bit
# atomic_fadd: 64 bit
# atomic_cswap: 64 bit
# connection: to ep
# device priority: 50
# device num paths: 1
# max eps: 256
# device address: 3 bytes
# ep address: 5 bytes
# error handling: peer failure, ep_check
[1643916642.849762] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_verbs_short_desc destroyed
[1643916642.850174] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool send-ops-mpool destroyed
[1643916642.850177] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_send_desc destroyed
[1643916642.850179] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_recv_desc destroyed
[1643916642.850180] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool pending-ops destroyed
#
#
# Transport: rc_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (0)
[1643916642.850767] [vulcan02:17018:0] ib_iface.c:866 UCX DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.850800] [vulcan02:17018:0] ib_device.c:1409 UCX DEBUG max IB CQE size is 128
[1643916642.851939] [vulcan02:17018:0] ib_iface.c:1473 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 12 hdr_ofs 10 data_sz 8256
[1643916642.851948] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8276
[1643916642.851951] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8336
[1643916642.852001] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 64
[1643916642.852363] [vulcan02:17018:0] mpool.c:237 UCX DEBUG mpool devx dbrec: allocated chunk 0x21ae010 of 8176 bytes with 127 elements
[1643916642.852570] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 64
[1643916642.852698] [vulcan02:17018:0] ib_mlx5.c:889 UCX DEBUG SL=0 (AR support - no) was selected on mlx5_0:1, SLs with AR support = { <none> }, SLs without AR support = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
[1643916642.853296] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool mlx5_dm_desc: align 64, maxelems 1, elemsize 80
[1643916642.853304] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 88
[1643916642.856148] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x2081c00 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash
[1643916642.856162] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 11 events 0x1 mode thread_spinlock
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 600 + 1.000 * N nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 14 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 14 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 186
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to ep
# device priority: 50
# device num paths: 1
# max eps: 256
# device address: 3 bytes
# ep address: 7 bytes
# error handling: buffer (zcopy), remote access, peer failure, ep_check
[1643916642.856216] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x2081c00 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() from hash
[1643916642.856219] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x2081c00 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler()
[1643916642.856224] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x2081c00 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() completion (called=0)
[1643916642.856226] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x2081c00 [id=11 ref 0] uct_rc_mlx5_devx_iface_event_handler()
[1643916642.856232] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_mlx5_atomic_desc destroyed
[1643916642.856235] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool mlx5_dm_desc destroyed
[1643916642.856885] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool send-ops-mpool destroyed
[1643916642.856890] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_send_desc destroyed
[1643916642.856892] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_recv_desc destroyed
[1643916642.856894] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool pending-ops destroyed
#
#
# Transport: dc_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (0)
[1643916642.857682] [vulcan02:17018:0] ib_iface.c:866 UCX DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.858804] [vulcan02:17018:0] ib_iface.c:1473 UCX DEBUG created uct_ib_iface_t headroom_ofs 12 payload_ofs 12 hdr_ofs 10 data_sz 8256
[1643916642.858829] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_recv_desc: align 64, maxelems 4294967295, elemsize 8276
[1643916642.858832] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_send_desc: align 64, maxelems 4294967295, elemsize 8336
[1643916642.858887] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool send-ops-mpool: align 64, maxelems 4294967295, elemsize 64
[1643916642.859376] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool pending-ops: align 1, maxelems 4294967295, elemsize 112
[1643916642.859477] [vulcan02:17018:0] ib_mlx5.c:889 UCX DEBUG SL=0 (AR support - no) was selected on mlx5_0:1, SLs with AR support = { <none> }, SLs without AR support = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
[1643916642.860009] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool mlx5_dm_desc: align 64, maxelems 1, elemsize 80
[1643916642.860016] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rc_mlx5_atomic_desc: align 64, maxelems 4294967295, elemsize 88
[1643916642.860036] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x20d1f10 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() to hash
[1643916642.860054] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 11 events 0x1 mode thread_spinlock
[1643916642.860367] [vulcan02:17018:0] dc_mlx5.c:836 UCX DEBUG creating dci pool 0 with 8 QPs
[1643916642.864991] [vulcan02:17018:0] dc_mlx5.c:1386 UCX DEBUG dc iface 0x218c640: using 'dcs_quota' policy with 8 dcis and 4608 cqes, dct 0x1b74a
[1643916642.865015] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating rc_recv_desc: host memory length 37481712 flags 0x3e0
[1643916642.865018] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method huge
[1643916642.865039] [vulcan02:17018:0] uct_mem.c:283 UCX TRACE failed to allocate 37481712 bytes from hugetlb: Out of memory
[1643916642.865041] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method thp
[1643916642.865083] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 37748736 bytes at 0x7fa180000000 using thp
[1643916642.865108] [vulcan02:17018:0] mpool.c:237 UCX DEBUG mpool rcache_mp: allocated chunk 0x7fa18980b008 of 151544 bytes with 1052 elements
[1643916642.877550] [vulcan02:17018:0] ib_md.c:545 UCX TRACE ibv_reg_mr(pd=0x20822b0 addr=0x7fa180000000 length=37748736): mr=0x20c9cc0 took 12.351 msec
[1643916642.877571] [vulcan02:17018:0] ib_md.c:788 UCX TRACE registered memory 0x7fa180000000..0x7fa182400000 on mlx5_0 lkey 0xf31f4 rkey 0xf31f4 access 0xf flags 0x3e4
[1643916642.877630] [vulcan02:17018:0] rcache.c:955 UCX TRACE mlx5_0: created region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] gt rw ref 2 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.877636] [vulcan02:17018:0] mpool.c:237 UCX DEBUG mpool rc_recv_desc: allocated chunk 0x7fa180000018 of 37748712 bytes with 4537 elements
[1643916642.878184] [vulcan02:17018:0] dc_mlx5.c:1402 UCX DEBUG created dc iface 0x218c640
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 660 nsec
# overhead: 40 nsec
# put_short: <= 2K
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 11 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 4K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 11 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 4K
# am_short: <= 2046
# am_bcopy: <= 8254
# am_zcopy: <= 8254, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 138
# domain: device
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 5 bytes
# error handling: buffer (zcopy), remote access, peer failure, ep_check
[1643916642.882469] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x20d1f10 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() from hash
[1643916642.882477] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x20d1f10 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler()
[1643916642.882489] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x20d1f10 [id=11 ref 1] uct_rc_mlx5_devx_iface_event_handler() completion (called=0)
[1643916642.882494] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x20d1f10 [id=11 ref 0] uct_rc_mlx5_devx_iface_event_handler()
[1643916642.882509] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_mlx5_atomic_desc destroyed
[1643916642.882522] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool mlx5_dm_desc destroyed
[1643916642.883304] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool send-ops-mpool destroyed
[1643916642.883310] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_send_desc destroyed
[1643916642.883344] [vulcan02:17018:0] rcache.c:337 UCX TRACE mlx5_0: lru add region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] gt rw ref 2 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883353] [vulcan02:17018:0] rcache.c:423 UCX TRACE mlx5_0: put region, flags 0x1 region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] gt rw ref 2 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883365] [vulcan02:17018:0] rcache.c:462 UCX TRACE mlx5_0: invalidate region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] gt rw ref 1 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883379] [vulcan02:17018:0] rcache.c:423 UCX TRACE mlx5_0: put region, flags 0xa region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] g- rw ref 1 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883386] [vulcan02:17018:0] rcache.c:436 UCX TRACE mlx5_0: put on GC list, flags 0xa region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] g- rw ref 0 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.883432] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rc_recv_desc destroyed
[1643916642.883437] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool pending-ops destroyed
#
#
# Transport: ud_verbs
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (0)
[1643916642.884151] [vulcan02:17018:0] ib_iface.c:866 UCX DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.884754] [vulcan02:17018:0] ib_iface.c:1473 UCX DEBUG created uct_ib_iface_t headroom_ofs 88 payload_ofs 88 hdr_ofs 40 data_sz 4096
[1643916642.885182] [vulcan02:17018:0] ib_iface.c:1008 UCX DEBUG iface=0x2080fc0: created UD QP 0x1b753 on mlx5_0:1 TX wr:341 sge:6 inl:124 resp:0 RX wr:4096 sge:1 resp:0
[1643916642.885564] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4192
[1643916642.885570] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating ud_recv_skb: host memory length 540784 flags 0x3e0
[1643916642.885572] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method huge
[1643916642.885576] [vulcan02:17018:0] uct_mem.c:283 UCX TRACE failed to allocate 540784 bytes from hugetlb: User-defined limit was reached
[1643916642.885578] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method thp
[1643916642.885598] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method md
[1643916642.885605] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method mmap
[1643916642.885618] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 544768 bytes at 0x7fa189786000 using mmap
[1643916642.885627] [vulcan02:17018:0] rcache.c:379 UCX TRACE mlx5_0: destroy region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] g- rw ref 0 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.885631] [vulcan02:17018:0] ib_md.c:558 UCX TRACE ibv_dereg_mr(mr=0x20c9cc0 addr=0x7fa180000000 length=37748736)
[1643916642.890270] [vulcan02:17018:0] rcache.c:350 UCX TRACE mlx5_0: lru remove region 0x20c9bc0 [0x7fa180000000..0x7fa182400000] g- rw ref 0 lkey 0xf31f4 rkey 0xf31f4 atomic_rkey 0xffffffff
[1643916642.890595] [vulcan02:17018:0] ib_md.c:545 UCX TRACE ibv_reg_mr(pd=0x20822b0 addr=0x7fa189786000 length=544768): mr=0x20c9cc0 took 0.312 msec
[1643916642.890598] [vulcan02:17018:0] ib_md.c:788 UCX TRACE registered memory 0x7fa189786000..0x7fa18980b000 on mlx5_0 lkey 0x265907 rkey 0x265907 access 0xf flags 0x3e4
[1643916642.890602] [vulcan02:17018:0] rcache.c:955 UCX TRACE mlx5_0: created region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890604] [vulcan02:17018:0] mpool.c:237 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x7fa189786018 of 544744 bytes with 128 elements
[1643916642.890613] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168
[1643916642.890656] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80::9803:9b03:67:a59c to hash on device mlx5_0 port 1 index 0)
[1643916642.890677] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 1)
[1643916642.890690] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 2)
[1643916642.890701] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 3)
[1643916642.890713] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 4)
[1643916642.890725] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 5)
[1643916642.890737] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 6)
[1643916642.890749] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 7)
[1643916642.890920] [vulcan02:17018:0] timer_wheel.c:41 UCX DEBUG high res timer created log=23 resolution=3813.003636 usec wanted: 2500.000000 usec
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 630 nsec
# overhead: 105 nsec
# am_short: <= 116
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 5 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 3952
# connection: to ep, to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
[1643916642.890959] [vulcan02:17018:0] ud_iface.c:638 UCX DEBUG iface(0x2080fc0): cep cleanup
[1643916642.890964] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool ud_tx_skb destroyed
[1643916642.890968] [vulcan02:17018:0] rcache.c:337 UCX TRACE mlx5_0: lru add region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890971] [vulcan02:17018:0] rcache.c:423 UCX TRACE mlx5_0: put region, flags 0x1 region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890979] [vulcan02:17018:0] rcache.c:462 UCX TRACE mlx5_0: invalidate region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 1 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890985] [vulcan02:17018:0] rcache.c:423 UCX TRACE mlx5_0: put region, flags 0xa region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 1 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.890988] [vulcan02:17018:0] rcache.c:436 UCX TRACE mlx5_0: put on GC list, flags 0xa region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.891015] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool ud_recv_skb destroyed
[1643916642.891480] [vulcan02:17018:0] ud_iface.c:645 UCX DEBUG iface(0x2080fc0): ptr_array cleanup
#
#
# Transport: ud_mlx5
# Device: mlx5_0:1
# Type: network
# System device: mlx5_0 (0)
[1643916642.891836] [vulcan02:17018:0] ib_iface.c:866 UCX DEBUG using pkey[0] 0xffff on mlx5_0:1/IB
[1643916642.892390] [vulcan02:17018:0] ib_iface.c:1473 UCX DEBUG created uct_ib_iface_t headroom_ofs 88 payload_ofs 88 hdr_ofs 40 data_sz 4096
[1643916642.892747] [vulcan02:17018:0] ib_iface.c:1008 UCX DEBUG iface=0x2080fc0: created UD QP 0x1b754 on mlx5_0:1 TX wr:341 sge:6 inl:124 resp:0 RX wr:4096 sge:1 resp:0
[1643916642.892758] [vulcan02:17018:0] ib_mlx5.c:568 UCX DEBUG tx wq 65536 bytes [bb=64, nwqe=1024] mmio_mode bf_post
[1643916642.893100] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool ud_recv_skb: align 64, maxelems 4294967295, elemsize 4192
[1643916642.893103] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating ud_recv_skb: host memory length 540784 flags 0x3e0
[1643916642.893105] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method huge
[1643916642.893108] [vulcan02:17018:0] uct_mem.c:283 UCX TRACE failed to allocate 540784 bytes from hugetlb: User-defined limit was reached
[1643916642.893109] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method thp
[1643916642.893123] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method md
[1643916642.893129] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method mmap
[1643916642.893136] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 544768 bytes at 0x7fa189786000 using mmap
[1643916642.893144] [vulcan02:17018:0] rcache.c:379 UCX TRACE mlx5_0: destroy region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.893146] [vulcan02:17018:0] ib_md.c:558 UCX TRACE ibv_dereg_mr(mr=0x20c9cc0 addr=0x7fa189786000 length=544768)
[1643916642.893216] [vulcan02:17018:0] rcache.c:350 UCX TRACE mlx5_0: lru remove region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x265907 rkey 0x265907 atomic_rkey 0xffffffff
[1643916642.893372] [vulcan02:17018:0] ib_md.c:545 UCX TRACE ibv_reg_mr(pd=0x20822b0 addr=0x7fa189786000 length=544768): mr=0x20c9cc0 took 0.145 msec
[1643916642.893375] [vulcan02:17018:0] ib_md.c:788 UCX TRACE registered memory 0x7fa189786000..0x7fa18980b000 on mlx5_0 lkey 0x1ecb98 rkey 0x1ecb98 access 0xf flags 0x3e4
[1643916642.893378] [vulcan02:17018:0] rcache.c:955 UCX TRACE mlx5_0: created region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893380] [vulcan02:17018:0] mpool.c:237 UCX DEBUG mpool ud_recv_skb: allocated chunk 0x7fa189786018 of 544744 bytes with 128 elements
[1643916642.893389] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool ud_tx_skb: align 64, maxelems 4294967295, elemsize 4168
[1643916642.893413] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80::9803:9b03:67:a59c to hash on device mlx5_0 port 1 index 0)
[1643916642.893426] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 1)
[1643916642.893437] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 2)
[1643916642.893448] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 3)
[1643916642.893459] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 4)
[1643916642.893470] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 5)
[1643916642.893480] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 6)
[1643916642.893491] [vulcan02:17018:0] ud_iface.c:421 UCX DEBUG iface 0x2080fc0: adding gid fe80:: to hash on device mlx5_0 port 1 index 7)
[1643916642.893632] [vulcan02:17018:0] ib_mlx5.c:889 UCX DEBUG SL=0 (AR support - no) was selected on mlx5_0:1, SLs with AR support = { <none> }, SLs without AR support = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
[1643916642.893734] [vulcan02:17018:0] timer_wheel.c:41 UCX DEBUG high res timer created log=23 resolution=3813.003636 usec wanted: 2500.000000 usec
#
# capabilities:
# bandwidth: 11794.23/ppn + 0.00 MB/sec
# latency: 630 nsec
# overhead: 80 nsec
# am_short: <= 180
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 3 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 132
# connection: to ep, to iface
# device priority: 50
# device num paths: 1
# max eps: inf
# device address: 3 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
[1643916642.893764] [vulcan02:17018:0] ud_iface.c:638 UCX DEBUG iface(0x2080fc0): cep cleanup
[1643916642.893766] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool ud_tx_skb destroyed
[1643916642.893770] [vulcan02:17018:0] rcache.c:337 UCX TRACE mlx5_0: lru add region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893772] [vulcan02:17018:0] rcache.c:423 UCX TRACE mlx5_0: put region, flags 0x1 region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 2 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893791] [vulcan02:17018:0] rcache.c:462 UCX TRACE mlx5_0: invalidate region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] gt rw ref 1 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893813] [vulcan02:17018:0] rcache.c:423 UCX TRACE mlx5_0: put region, flags 0xa region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 1 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893815] [vulcan02:17018:0] rcache.c:436 UCX TRACE mlx5_0: put on GC list, flags 0xa region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.893840] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool ud_recv_skb destroyed
[1643916642.894331] [vulcan02:17018:0] ud_iface.c:645 UCX DEBUG iface(0x2080fc0): ptr_array cleanup
#
[1643916642.894717] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool devx dbrec destroyed
[1643916642.894734] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x2082090 [id=8 ref 1] ucs_rcache_invalidate_handler() from hash
[1643916642.894737] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x2082090 [id=8 ref 1] ucs_rcache_invalidate_handler()
[1643916642.894742] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x2082090 [id=8 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643916642.894744] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x2082090 [id=8 ref 0] ucs_rcache_invalidate_handler()
[1643916642.894753] [vulcan02:17018:0] rcache.c:379 UCX TRACE mlx5_0: destroy region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.894756] [vulcan02:17018:0] ib_md.c:558 UCX TRACE ibv_dereg_mr(mr=0x20c9cc0 addr=0x7fa189786000 length=544768)
[1643916642.894873] [vulcan02:17018:0] rcache.c:350 UCX TRACE mlx5_0: lru remove region 0x20c9bc0 [0x7fa189786000..0x7fa18980b000] g- rw ref 0 lkey 0x1ecb98 rkey 0x1ecb98 atomic_rkey 0xffffffff
[1643916642.894979] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rcache_mp destroyed
[1643916642.895086] [vulcan02:17018:0] ib_device.c:686 UCX DEBUG destroying ib device mlx5_0
[1643916642.895090] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x2075b10 [id=4 ref 1] uct_ib_async_event_handler() from hash
[1643916642.895092] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x2075b10 [id=4 ref 1] uct_ib_async_event_handler()
[1643916642.895200] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x2075b10 [id=4 ref 1] uct_ib_async_event_handler() completion (called=0)
[1643916642.895202] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x2075b10 [id=4 ref 0] uct_ib_async_event_handler()
[1643916642.895631] [vulcan02:17018:0] ib_md.c:1570 UCX TRACE opening IB device mlx5_1
[1643916642.899069] [vulcan02:17018:0] ib_device.c:554 UCX DEBUG PF: mlx5_1 vendor_id: 0x15b3 device_id: 4123
[1643916642.899274] [vulcan02:17018:0] ib_mlx5dv_md.c:491 UCX DEBUG mlx5_1: disable ODP because it's not supported for DevX QP
[1643916642.899431] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x2076680 [id=4 ref 1] uct_ib_async_event_handler() to hash
[1643916642.899497] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 4 events 0x1 mode thread_spinlock
[1643916642.899501] [vulcan02:17018:0] ib_device.c:668 UCX DEBUG initialized device 'mlx5_1' (InfiniBand channel adapter) with 1 ports
[1643916642.899599] [vulcan02:17018:0] ib_md.c:1675 UCX DEBUG mlx5_1: cuda GPUDirect RDMA is enabled
[1643916642.899604] [vulcan02:17018:0] ib_md.c:1675 UCX DEBUG mlx5_1: rocm GPUDirect RDMA is disabled
[1643916642.899610] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643916642.899619] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x20c45e0 [id=8 ref 1] ucs_rcache_invalidate_handler() to hash
[1643916642.899628] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 8 events 0x1 mode thread_spinlock
[1643916642.899706] [vulcan02:17018:0] ib_md.c:1332 UCX DEBUG mlx5_1: using registration cache
[1643916642.899722] [vulcan02:17018:0] ib_md.c:1494 UCX DEBUG failed to read file: /sys/class/infiniband/mlx5_1/device/current_link_width
[1643916642.899725] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool devx dbrec: align 64, maxelems 4294967295, elemsize 40
[1643916642.899877] [vulcan02:17018:0] ib_md.c:1623 UCX DEBUG mlx5_1: md open by 'uct_ib_mlx5_devx_md_ops' is successful
[1643916642.900695] [vulcan02:17018:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900700] [vulcan02:17018:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0x0: Destination is unreachable
[1643916642.900702] [vulcan02:17018:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0x0
[1643916642.900705] [vulcan02:17018:0] uct_md.c:113 UCX DEBUG failed to query rc_verbs resources: No such device
[1643916642.900707] [vulcan02:17018:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900709] [vulcan02:17018:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0x4: Destination is unreachable
[1643916642.900711] [vulcan02:17018:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0x4
[1643916642.900713] [vulcan02:17018:0] uct_md.c:113 UCX DEBUG failed to query rc_mlx5 resources: No such device
[1643916642.900714] [vulcan02:17018:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900716] [vulcan02:17018:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0xc4: Destination is unreachable
[1643916642.900718] [vulcan02:17018:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0xc4
[1643916642.900719] [vulcan02:17018:0] uct_md.c:113 UCX DEBUG failed to query dc_mlx5 resources: No such device
[1643916642.900721] [vulcan02:17018:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900723] [vulcan02:17018:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0x0: Destination is unreachable
[1643916642.900724] [vulcan02:17018:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0x0
[1643916642.900726] [vulcan02:17018:0] uct_md.c:113 UCX DEBUG failed to query ud_verbs resources: No such device
[1643916642.900728] [vulcan02:17018:0] ib_device.c:768 UCX TRACE mlx5_1:1 is not active (state: 1)
[1643916642.900729] [vulcan02:17018:0] ib_device.c:1171 UCX TRACE mlx5_1:1 does not support flags 0x4: Destination is unreachable
[1643916642.900731] [vulcan02:17018:0] ib_device.c:1185 UCX DEBUG no compatible IB ports found for flags 0x4
[1643916642.900733] [vulcan02:17018:0] uct_md.c:113 UCX DEBUG failed to query ud_mlx5 resources: No such device
#
# Memory domain: mlx5_1
# Component: ib
# register: unlimited, cost: 180 nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
# memory invalidation is supported
# < no supported devices found >
[1643916642.900827] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool devx dbrec destroyed
[1643916642.900836] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x20c45e0 [id=8 ref 1] ucs_rcache_invalidate_handler() from hash
[1643916642.900838] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x20c45e0 [id=8 ref 1] ucs_rcache_invalidate_handler()
[1643916642.900842] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x20c45e0 [id=8 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643916642.900844] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x20c45e0 [id=8 ref 0] ucs_rcache_invalidate_handler()
[1643916642.900863] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rcache_mp destroyed
[1643916642.900922] [vulcan02:17018:0] ib_device.c:686 UCX DEBUG destroying ib device mlx5_1
[1643916642.900925] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x2076680 [id=4 ref 1] uct_ib_async_event_handler() from hash
[1643916642.900927] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x2076680 [id=4 ref 1] uct_ib_async_event_handler()
[1643916642.901032] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x2076680 [id=4 ref 1] uct_ib_async_event_handler() completion (called=0)
[1643916642.901035] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x2076680 [id=4 ref 0] uct_ib_async_event_handler()
[1643916642.902601] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x2076680 [id=3 ref 1] uct_rdmacm_cm_event_handler() to hash
[1643916642.902681] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 3 events 0x1 mode thread_spinlock
[1643916642.902691] [vulcan02:17018:0] rdmacm_cm.c:959 UCX DEBUG created rdmacm_cm 0x20822b0 with event_channel 0x207f930 (fd=3)
#
# Connection manager: rdmacm
# max_conn_priv: 54 bytes
[1643916642.902703] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x2076680 [id=3 ref 1] uct_rdmacm_cm_event_handler() from hash
[1643916642.902705] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x2076680 [id=3 ref 1] uct_rdmacm_cm_event_handler()
[1643916642.902749] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x2076680 [id=3 ref 1] uct_rdmacm_cm_event_handler() completion (called=0)
[1643916642.902752] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x2076680 [id=3 ref 0] uct_rdmacm_cm_event_handler()
[1643916642.902754] [vulcan02:17018:0] rdmacm_cm.c:983 UCX TRACE destroying event_channel 0x207f930 on cm 0x20822b0
[1643916642.902814] [vulcan02:17018:0] cma_md.c:115 UCX TRACE ptrace_scope is 0, CMA is supported
[1643916642.902823] [vulcan02:17018:0] cma_md.c:115 UCX TRACE ptrace_scope is 0, CMA is supported
#
# Memory domain: cma
# Component: cma
# register: unlimited, cost: 9 nsec
#
# Transport: cma
# Device: memory
# Type: intra-node
# System device: <unknown>
[1643916642.902895] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool uct_scopy_iface_tx_mp: align 64, maxelems 4294967295, elemsize 736
#
# capabilities:
# bandwidth: 0.00/ppn + 11145.00 MB/sec
# latency: 80 nsec
# overhead: 2000 nsec
# put_zcopy: unlimited, up to 16 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_zcopy: unlimited, up to 16 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 4 bytes
# error handling: peer failure, ep_check
[1643916642.902917] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool uct_scopy_iface_tx_mp destroyed
#
[1643916642.903009] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool rcache_mp: align 8, maxelems 4294967295, elemsize 144
[1643916642.903018] [vulcan02:17018:0] async.c:231 UCX DEBUG added async handler 0x207eb50 [id=4 ref 1] ucs_rcache_invalidate_handler() to hash
[1643916642.903060] [vulcan02:17018:0] async.c:509 UCX DEBUG listening to async event fd 4 events 0x1 mode thread_spinlock
#
# Memory domain: knem
# Component: knem
# register: unlimited, cost: 180 nsec
# remote key: 16 bytes
#
# Transport: knem
# Device: memory
# Type: intra-node
# System device: <unknown>
[1643916642.903181] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool uct_scopy_iface_tx_mp: align 64, maxelems 4294967295, elemsize 736
#
# capabilities:
# bandwidth: 13862.00/ppn + 0.00 MB/sec
# latency: 80 nsec
# overhead: 2000 nsec
# put_zcopy: unlimited, up to 16 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_zcopy: unlimited, up to 16 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 0 bytes
# error handling: none
[1643916642.903212] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool uct_scopy_iface_tx_mp destroyed
#
[1643916642.903226] [vulcan02:17018:0] async.c:156 UCX DEBUG removed async handler 0x207eb50 [id=4 ref 1] ucs_rcache_invalidate_handler() from hash
[1643916642.903228] [vulcan02:17018:0] async.c:562 UCX DEBUG removing async handler 0x207eb50 [id=4 ref 1] ucs_rcache_invalidate_handler()
[1643916642.903284] [vulcan02:17018:0] async.c:582 UCX TRACE waiting for 0x207eb50 [id=4 ref 1] ucs_rcache_invalidate_handler() completion (called=0)
[1643916642.903286] [vulcan02:17018:0] async.c:171 UCX DEBUG release async handler 0x207eb50 [id=4 ref 0] ucs_rcache_invalidate_handler()
[1643916642.903291] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool rcache_mp destroyed
[1643916642.903349] [vulcan02:17018:0] mm_xpmem.c:116 UCX DEBUG xpmem version: 155653
[1643916642.903352] [vulcan02:17018:0] mm_xpmem.c:116 UCX DEBUG xpmem version: 155653
#
# Memory domain: xpmem
# Component: xpmem
# register: unlimited, cost: 60 nsec
# remote key: 24 bytes
# rkey_ptr is supported
#
# Transport: xpmem
# Device: memory
# Type: intra-node
# System device: <unknown>
[1643916642.903418] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating mm_recv_fifo: host memory length 8447 flags 0x3e0
[1643916642.903420] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method md
[1643916642.903422] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method mmap
[1643916642.903428] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 12288 bytes at 0x7fa189868000 using mmap
[1643916642.903448] [vulcan02:17018:0] mpool.c:100 UCX DEBUG mpool mm_recv_desc: align 64, maxelems 4294967295, elemsize 8288
[1643916642.903450] [vulcan02:17018:0] uct_mem.c:106 UCX TRACE allocating mm_recv_desc: host memory length 4259952 flags 0x3e0
[1643916642.903452] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method md
[1643916642.903453] [vulcan02:17018:0] uct_mem.c:110 UCX TRACE trying allocation method mmap
[1643916642.903458] [vulcan02:17018:0] uct_mem.c:304 UCX TRACE allocated 4263936 bytes at 0x7fa1821af000 using mmap
[1643916642.903464] [vulcan02:17018:0] mpool.c:237 UCX DEBUG mpool mm_recv_desc: allocated chunk 0x7fa1821af018 of 4263912 bytes with 512 elements
[1643916642.904243] [vulcan02:17018:0] mm_iface.c:674 UCX DEBUG created mm iface 0x20c8d90 FIFO id 0x7fa189868000 va 0x7fa189868000 size 12288 (128 x 64 elems)
#
# capabilities:
[1643916642.904254] [vulcan02:17018:0] mm_xpmem.c:116 UCX DEBUG xpmem version: 155653
# bandwidth: 0.00/ppn + 12179.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 16 bytes
# error handling: none
[1643916642.904382] [vulcan02:17018:0] mpool.c:154 UCX DEBUG mpool mm_recv_desc destroyed
#
The issue is that ucx_info -d doesn't show cuda transport.
@petro-rudenko Sorry for the delay. I seem to run into a build error when configuring with --with-go.
$ ./autogen.sh && cd build-own && echo $CUDA_HOME && ../contrib/configure-devel --enable-debug --enable-debug-data --with-java=no --with-go --prefix=$PWD --with-cuda=$CUDA_HOME && make clean && make -j install
...
make[3]: Entering directory '$UCX_HOME/build-own/bindings/go'
/usr/bin/install -c $UCX_HOME/build-own/bindings/go/.libs/tmp/goperftest $UCX_HOME/build-own/bin
/usr/bin/install: cannot stat '$UCX_HOME/build-own/bindings/go/.libs/tmp/goperftest': No such file or directory
Makefile:648: recipe for target 'install-exec-hook' failed
make[3]: *** [install-exec-hook] Error 1
make[3]: Leaving directory '$UCX_HOME/build-own/bindings/go'
Makefile:555: recipe for target 'install-exec-am' failed
make[2]: *** [install-exec-am] Error 2
make[2]: Leaving directory '$UCX_HOME/build-own/bindings/go'
Makefile:502: recipe for target 'install-am' failed
make[1]: *** [install-am] Error 2
make[1]: Leaving directory '$UCX_HOME/build-own/bindings/go'
Makefile:761: recipe for target 'install-recursive' failed
make: *** [install-recursive] Error 1
@Akshay-Venkatesh can you rerun make distclean && make -j `nproc` && make install
@petro-rudenko @yosefe the goperftest issue should be resolved now. Is it possible to restart the tests?
Hi @petro-rudenko
Looks like the Java and Go tests are failing with the same type of error:
=== RUN TestUcpMmap
[1644105177.739786] [swx-rdmz-ucx-gpu-02:1963 :1] cuda_copy_md.c:164 UCX ERROR attempt to allocate cuda memory without active context
[1644105177.739799] [swx-rdmz-ucx-gpu-02:1963 :1] uct_mem.c:157 UCX ERROR failed to allocate 1024 bytes using md cuda_cpy for user memory: No such device
memory_test.go:120: Failed to allocate GPU memory <nil>
2022-02-05T23:53:18.6617474Z Running testActiveMessages with memType: 1
2022-02-05T23:53:18.9173634Z [1644105198.906710] [swx-rdmz-ucx-gpu-02:2015 :0] cuda_copy_md.c:164 UCX ERROR attempt to allocate cuda memory without active context
2022-02-05T23:53:18.9181877Z [1644105198.906723] [swx-rdmz-ucx-gpu-02:2015 :0] uct_mem.c:157 UCX ERROR failed to allocate 4096 bytes using md cuda_cpy for user memory: No such device
We had a similar issue in the gtest path, where an active context had to be set up in test/gtest/common/mem_buffer.cc and src/tools/perf/cuda/cuda_alloc.c.
#if HAVE_CUDA
    if (is_cuda_supported()) {
        cudaSetDevice(0);
        /* need to call free as context maybe lazily initialized when calling
         * cudaSetDevice(0) but calling cudaFree(0) should guarantee context
         * creation upon return */
        cudaFree(0);
    }
#endif
Is it possible to do something similar in the binding tests as well?
Also, I tried to find where GPU selection occurs in the binding tests (as we do in perf/gtest by calling cudaSetDevice(0)), but I couldn't find it. How is this done?
Here's a workaround we did in goperftest: https://github.com/openucx/ucx/blob/master/bindings/go/src/examples/perftest/perftest.go#L102
We would probably need to do something similar in Java.
Maybe we could do some checks in ucp/uct, since calling cudaSetDevice would require CUDA dependencies in the bindings. Or at least we would need to mention in the ucp_mem_map API documentation that, when using memType cuda, the CUDA device must be initialized first.
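One way to set up an active context without adding a link-time CUDA dependency to the bindings is to load the driver library at run time. The following is only a minimal sketch under stated assumptions, not what UCX or the bindings do: the helper name ensure_cuda_context, the CTX_* status codes and the cu_*_t typedefs are hypothetical, and the typedefs stand in for the real types from cuda.h. The driver entry points it looks up (cuInit, cuDeviceGet, cuDevicePrimaryCtxRetain, cuCtxSetCurrent) are the driver API calls that roughly correspond to the cudaSetDevice(0)/cudaFree(0) sequence quoted above.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Stand-ins for the driver API types so that cuda.h is not required at build
 * time. In cuda.h, CUresult is an enum with CUDA_SUCCESS == 0, CUdevice is an
 * int, and CUcontext is an opaque pointer. */
typedef int   cu_result_t;
typedef int   cu_device_t;
typedef void *cu_context_t;

typedef cu_result_t (*cu_init_fn)(unsigned int flags);
typedef cu_result_t (*cu_device_get_fn)(cu_device_t *dev, int ordinal);
typedef cu_result_t (*cu_primary_ctx_retain_fn)(cu_context_t *ctx,
                                                cu_device_t dev);
typedef cu_result_t (*cu_ctx_set_current_fn)(cu_context_t ctx);

/* Hypothetical status codes for this sketch (not part of UCX or CUDA). */
enum {
    CTX_OK           = 0,
    CTX_NO_DRIVER    = -1, /* driver library could not be loaded */
    CTX_NO_SYMBOL    = -2, /* a required entry point is missing */
    CTX_DRIVER_ERROR = -3  /* a driver call returned an error */
};

/* Hypothetical helper: make the primary context of device `ordinal` current
 * on the calling thread, loading the driver library `libname` at run time. */
int ensure_cuda_context(const char *libname, int ordinal)
{
    cu_init_fn               init;
    cu_device_get_fn         device_get;
    cu_primary_ctx_retain_fn retain;
    cu_ctx_set_current_fn    set_current;
    cu_device_t              dev = 0;
    cu_context_t             ctx = NULL;
    void                    *lib;

    lib = dlopen(libname, RTLD_NOW);
    if (lib == NULL) {
        return CTX_NO_DRIVER;
    }

    init        = (cu_init_fn)dlsym(lib, "cuInit");
    device_get  = (cu_device_get_fn)dlsym(lib, "cuDeviceGet");
    retain      = (cu_primary_ctx_retain_fn)dlsym(lib,
                                                  "cuDevicePrimaryCtxRetain");
    set_current = (cu_ctx_set_current_fn)dlsym(lib, "cuCtxSetCurrent");
    if (!init || !device_get || !retain || !set_current) {
        dlclose(lib);
        return CTX_NO_SYMBOL;
    }

    /* Any non-zero result is treated as failure (CUDA_SUCCESS is 0). */
    if (init(0) != 0 || device_get(&dev, ordinal) != 0 ||
        retain(&ctx, dev) != 0 || set_current(ctx) != 0) {
        dlclose(lib);
        return CTX_DRIVER_ERROR;
    }

    /* The library stays loaded and the primary context stays retained for
     * the lifetime of the process; a real implementation would pair this
     * with cuDevicePrimaryCtxRelease. */
    return CTX_OK;
}
```

A binding test could call ensure_cuda_context("libcuda.so.1", 0) on the thread that later calls ucp_mem_map with CUDA memory, and skip the CUDA cases when the result is not CTX_OK. On older glibc versions the program may need to be linked with -ldl.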
I just stumbled upon this. The added flexibility of removing the CUDA toolkit as a dependency is indeed quite interesting; it would be a pity to see this PR stall and not make it into a future release!