Akshay-Venkatesh issues

UCT/API/V2: Introduce md_query_v2

What: In preparation for https://github.com/openucx/ucx/pull/7847 being broken into separate PRs, introduce md_query_v2 in this PR.

UCT/API: add dmabuf to md_mem_query attributes

What: Add a dmabuf fd field to the md_mem attributes.
Why: UCT/IB needs it to register device memory that is exposed as a dmabuf.

API

UCS/TOPO: generate sys-dev index based on device entry position in sysfs

What: Use the entry position of the given device in `/sys/bus/pci/devices` instead of the device iteration count as seen by topo sys in the given process.
Why: Hopefully this ensures...

UCT/CUDA_COPY: detect device transfers and report peak arch bandwidth

What: Detect whether the local/remote memory types used for the perf estimate are cuda or cuda-managed. If so, report the peak device memory bandwidth.
Why: Preparation for device staging pipeline protocols....

UCT/CUDA: remove cuda_runtime dependency

What: Remove the uct/cuda dependency on the CUDA runtime.
Why:
- Generally, a minimum CUDA driver version covers all the functionality that cuda_runtime provides, so the additional dependency is not needed.
...

Fortran configure warnings and build errors with nvidia fortran compiler

Seeing the following compiler warnings with the NVIDIA HPC SDK 22.2, available at https://developer.nvidia.com/nvidia-hpc-sdk-releases
Background information: What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch...

Target: v4.1.x

openmpi pml ucx cannot be selected when linking to cuda object file

Describe the bug
The full error message is:
```
$ # UCX at master
$ mpirun -np 2 --npernode 2 --mca btl ^openib,smcuda --mca pml ucx --mca pml_ucx_devices any...
```

Bug

UCT/CUDA: set default max_reg_ratio to 1.0

What: Set the default ratio to 1.0, which means that IB will fully register CUDA pinned allocations of any size.
Why: Pinned device memory is not...

UCT/CUDA_IPC: use rcache instead of pgtable to cache mappings

Why: Use the native capabilities of rcache to limit the size of the mappings that the cuda_ipc transport can cache. For example, `UCX_CUDA_IPC_RCACHE_MAX_REGIONS=10 UCX_CUDA_IPC_RCACHE_MAX_SIZE=1mb` limits the maximum number of...

UCT/CUDA_COPY: add multi-device support in cuda_copy

What/Why: Allow a single UCP context to handle multiple CUDA devices for the cuda_copy transport. This enables use cases under Legion/Realm, OpenACC, and MPI workloads that prefer a 1:N process-to-GPU mapping...