Akshay-Venkatesh

Results 25 issues of Akshay-Venkatesh

The Reduce_local implementation is missing, which causes failures in IMB. This implementation piggybacks on the existing CUDA reduce implementation to stage/unstage send/receive buffers. bot:notacherrypick

Target: v4.1.x

## What

Follow-up to https://github.com/openucx/ucx/pull/9982. This PR caches the operation that imports a remotely exported handle for a custom CUDA memory pool, because the mapping operation via `cuMemPoolImportFromShareableHandle` is expensive.
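The caching idea described above can be illustrated with a small, language-agnostic sketch. This is not the UCX code: `ImportCache`, `fake_import`, and the `(peer, handle)` key are hypothetical names chosen for illustration, and `fake_import` merely stands in for an expensive call such as `cuMemPoolImportFromShareableHandle`. The sketch only shows that each exported handle is imported once and then served from the cache.

```python
from typing import Callable, Dict, Hashable


class ImportCache:
    """Memoize an expensive import, keyed by the exported handle (hypothetical helper)."""

    def __init__(self, import_fn: Callable[[Hashable], object]) -> None:
        self._import_fn = import_fn              # the expensive operation to avoid repeating
        self._entries: Dict[Hashable, object] = {}
        self.misses = 0                          # number of times the expensive import ran

    def get(self, handle: Hashable) -> object:
        # Run the import only on the first lookup of a given handle.
        if handle not in self._entries:
            self.misses += 1
            self._entries[handle] = self._import_fn(handle)
        return self._entries[handle]

    def invalidate(self, handle: Hashable) -> None:
        # Drop a stale entry, e.g. after the exporter releases the pool.
        self._entries.pop(handle, None)


def fake_import(handle: Hashable) -> dict:
    # Assumption: a stand-in for the real, expensive import call.
    return {"pool_for": handle}
```

A caller would construct one `ImportCache(fake_import)` per process and call `get((peer, handle))` wherever the import was previously performed directly. When an entry must be invalidated depends on the exporter's lifetime rules, which the summary above does not specify.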

## What

When one of the devices passed to `ucs_topo_get_distance` is a GPU device, let NVML provide the estimate of latency and bandwidth between the GPU device and 1. another...

## Why?

Allow `cuMemCreate` memory allocations to be registered with IB.

Port of https://github.com/open-mpi/ompi/pull/12835

Target: v5.0.x