Akshay-Venkatesh issues

Results: 25 issues of Akshay-Venkatesh

v4.1.x: ompi/coll/cuda: implement reduce_local

The `reduce_local` implementation is missing, which causes failures in IMB. This implementation piggybacks on the existing CUDA reduce implementation to stage/unstage the send/receive buffers.

bot:notacherrypick

Target: v4.1.x
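The stage/unstage idea described in this item can be sketched as follows. This is a minimal host-only illustration, not the code from the PR: `copy_to_host`, `copy_to_device`, `host_sum_int`, and `staged_reduce_local_sum` are hypothetical names, and the device copies are replaced with `memcpy` so the sketch compiles without CUDA.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for device<->host copies. A real CUDA build would
 * issue an actual device copy here; memcpy keeps the sketch self-contained. */
static void copy_to_host(void *host, const void *dev, size_t n)
{
    memcpy(host, dev, n);
}

static void copy_to_device(void *dev, const void *host, size_t n)
{
    memcpy(dev, host, n);
}

/* Host-side reduction standing in for a sum operation: inout[i] += in[i]. */
static void host_sum_int(const int *in, int *inout, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        inout[i] += in[i];
    }
}

/* Staged reduce_local (assumed shape): copy both buffers into temporary host
 * memory, reduce on the host, then copy the result back to the inout buffer.
 * Returns 0 on success and -1 on allocation failure. */
int staged_reduce_local_sum(const int *dev_in, int *dev_inout, size_t count)
{
    size_t bytes = count * sizeof(int);
    int *h_in, *h_inout;

    if (count == 0) {
        return 0;
    }

    h_in    = malloc(bytes);
    h_inout = malloc(bytes);
    if (h_in == NULL || h_inout == NULL) {
        free(h_in);
        free(h_inout);
        return -1;
    }

    copy_to_host(h_in, dev_in, bytes);          /* stage the input buffer   */
    copy_to_host(h_inout, dev_inout, bytes);    /* stage the in/out buffer  */
    host_sum_int(h_in, h_inout, count);         /* reduce on the host       */
    copy_to_device(dev_inout, h_inout, bytes);  /* unstage the result       */

    free(h_in);
    free(h_inout);
    return 0;
}
```

Only the in/out buffer is copied back, since the input buffer is read-only for a local reduction.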

UCT/CUDA_IPC: Cache for mempool import operation

## What

Follow-up to https://github.com/openucx/ucx/pull/9982. This PR caches the import of a remotely exported handle for a custom CUDA memory pool, because the mapping operation via `cuMemPoolImportFromShareableHandle` is expensive.
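As a rough illustration of why caching helps, the sketch below memoizes an expensive import keyed by an identifier of the exported handle. Everything in it is an assumption for illustration: the key type, the fixed-size table, the round-robin replacement, and the names `cached_import`, `expensive_import`, and `import_call_count` are hypothetical and are not taken from UCX. The expensive call is simulated rather than calling the CUDA driver.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

/* One cache entry: a hypothetical handle identifier and the import result. */
typedef struct {
    int      valid;
    uint64_t key;
    void    *imported;
} import_entry_t;

static import_entry_t cache[CACHE_SLOTS];
static int            next_slot;     /* round-robin replacement cursor    */
static int            import_calls;  /* counts simulated expensive calls  */
static char           fake_pools[64];

/* Simulated expensive operation. In the real setting this is where the
 * memory pool import would happen; here it only returns a stable address. */
static void *expensive_import(uint64_t key)
{
    import_calls++;
    return &fake_pools[key % sizeof(fake_pools)];
}

/* Return the cached import for key, performing the import only on a miss. */
void *cached_import(uint64_t key)
{
    int i, slot;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].key == key) {
            return cache[i].imported;  /* hit: skip the expensive call */
        }
    }

    /* Miss: import and overwrite the next slot. A real cache would also
     * have to release whatever resource the overwritten entry held. */
    slot                 = next_slot;
    next_slot            = (next_slot + 1) % CACHE_SLOTS;
    cache[slot].valid    = 1;
    cache[slot].key      = key;
    cache[slot].imported = expensive_import(key);
    return cache[slot].imported;
}

int import_call_count(void)
{
    return import_calls;
}
```

Repeated lookups for the same key pay the import cost once; the remaining design questions (key choice, invalidation, eviction cleanup) depend on details the summary above does not give.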

UCS/TOPO: NVML topology module

## What

When one of the devices passed to `ucs_topo_get_distance` is a GPU device, let NVML provide the estimate of latency and bandwidth between the GPU device and 1. another...

UCT/CUDA_COPY: enable rdma flag for fabric allocations

## Why?

Allow `cuMemCreate` memory allocations to be registered with IB.

5.0.x/opal/cuda: Handle stream-ordered allocations and assign primary device

Port of https://github.com/open-mpi/ompi/pull/12835

Target: v5.0.x