ucx
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
## What Connect lanes in reverse order ## Why ? - to optimize repeated re-scheduling of requests with multi-fragment protocols - to avoid re-initialization of an already started protocol, according to #8386
## What Add dmabuf fd field in md_mem attributes ## Why ? Needed by UCT/IB to register device memory exposed as a dmabuf
## What Implementation of `ucp_proto_t::clean` for eager protocols. Depends on https://github.com/openucx/ucx/pull/8386 ## Why ? The method was introduced in https://github.com/openucx/ucx/pull/8386 and should be implemented for all protocols
Using clang 13, ucx 1.13.0-rc1 and CUDA 11.7 ``` $ grep 'implicit conversion' build.txt cuda_copy/cuda_copy_ep.c:120:31: error: implicit conversion from enumeration type 'CUresult' (aka 'enum cudaError_enum') to different enumeration type 'cudaError_t'...
### Describe the bug Segfaults/core dumps when running `ucx_info -d` ### Steps to Reproduce - ucx 1.13.0 release installation with `./configure --prefix=$HOME/bin/ucx-1.13.0` ``` ucx_info -v # Version 1.13.0 # Git...
### Describe the bug I was using 1.12.1 on an AMD x86 cluster, then switched to GitHub head; programs working with IB transports (rc, dc) now hang. Single node runs with...
### Describe the bug I am running an OpenFOAM-v2012 simulation (turbulent channel with particles) on a cluster with 128 cores. The simulation works fine for about 1 hour and then it shows...
### Describe the bug UCC mpitest fails to run on the helios machine on hpcadvisorycouncil unless the flag `-x UCX_IB_PREFER_NEAREST_DEVICE=n` is added. Note that for example on Rome machine on hpcadvisorycouncil, mpitest...
## What Ignore path indexes when adding a new lane if there is an existing non-bw lane with the same tl resources ## Why ? The problem is shown in the picture below...
### Describe the bug A ROCm-related unit test failed; see [rocm.log](https://github.com/openucx/ucx/files/9076538/rocm.log) ### Setup and versions - CentOS Stream 8 - rdma-core-55mlnx-37-1.55103.x86_64 ``` ibv_devinfo hca_id: mlx5_0 transport: InfiniBand (0) fw_ver: 20.32.1010...