Yanfei Guo issues

Results: 21 issues of Yanfei Guo

hydra: GPU visibility control revamp

## Pull Request Description 1. Add an interface for querying the GPU device list and subdevice list in MPL. MPL returns an array of integers that represent individual GPU devices or subdevices....

wrapper: delay recursion checking

## Pull Request Description The script will run recursively if no command-line option is provided. Delaying the check removes a confusing warning message. ## Author Checklist * [ ]...

hydra: Add assignment option for GPU

We could do something similar to what SLURM does with CUDA: https://slurm.schedmd.com/gres.html#GPU_Management. We also need to investigate the assignment approach for AMD and Intel GPUs.
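As a rough illustration of what a per-process GPU assignment could look like, here is a minimal sketch in C. It assumes a simple round-robin policy (local rank modulo GPU count), which is an assumption and not something the issue specifies. The helper names `gpu_for_local_rank` and `format_visible_devices` are hypothetical and are not part of hydra; `CUDA_VISIBLE_DEVICES` is the environment variable the CUDA runtime reads to restrict visible devices.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical policy (assumption): local rank r is assigned GPU (r mod num_gpus). */
static int gpu_for_local_rank(int local_rank, int num_gpus)
{
    assert(local_rank >= 0 && num_gpus > 0);
    return local_rank % num_gpus;
}

/* Hypothetical helper: writes "CUDA_VISIBLE_DEVICES=<id>" into buf.
 * Returns the value returned by snprintf (the length that was, or
 * would have been, written, excluding the terminating NUL). */
static int format_visible_devices(char *buf, size_t len,
                                  int local_rank, int num_gpus)
{
    return snprintf(buf, len, "CUDA_VISIBLE_DEVICES=%d",
                    gpu_for_local_rank(local_rank, num_gpus));
}
```

A launcher could place the resulting string in the environment of each spawned process. AMD and Intel GPUs use different visibility variables, which is part of what the issue says still needs investigation.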

ch4/ucx: refactoring AM to use GenQ for pack buffer allocation

## Pull Request Description The goal of this PR is to use GenQ for pack buffer allocation. This avoids the costly on-the-fly allocation of GPU-registered host buffers...

bug: ucx hang at finalize with 72 processes on a single node

The following failure occurs consistently with the CH4-UCX build. The test is currently marked as xfail. ```not ok 558 - ./datatype/darray_pack 72 --- Directory: ./datatype File: darray_pack Num-procs: 72 Timeout:...

mpl: fix configure for HIP detection

## Pull Request Description The pop on CPPFLAGS will clean up the HIP-related flags. This PR depends on the fix at https://github.com/pmodels/yaksa/pull/231. ## Author Checklist * [ ] **Provide Description** Particularly...

ch4: Fix recv side free cell allocation

## Pull Request Description Receiver-side free cell allocation does not work due to: 1. MPMC dequeue not exiting on success 2. trying to use the global rank of the receiver at...

Adding 1.2 change list

## Pull Request Description ## Checklist * [ ] Reference appropriate issues (with "Fixes" or "See" as appropriate) * [ ] Commits are self-contained and do not do two things...

CUDA backend performance tuning

We need to investigate the best strategy for performance tuning in the CUDA backend. One knob is the thread block size versus the number of blocks.
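To make the trade-off concrete: for a fixed amount of work, the thread block size and the number of blocks are tied together by a ceiling division. The sketch below shows that arithmetic in plain C. It assumes one thread per element, which is an assumption and may not match the backend's kernels, and `blocks_for` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest number of blocks such that
 * num_blocks * block_size >= n_elems (ceiling division).
 * Assumes one thread per element. */
static size_t blocks_for(size_t n_elems, size_t block_size)
{
    assert(block_size > 0);
    return (n_elems + block_size - 1) / block_size;
}
```

For one million elements, a block size of 256 gives 3907 blocks, while a block size of 1024 gives 977. Which combination performs best depends on the kernel and the device, which is what the tuning study would need to measure.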

ch4/shm: Support topology-aware SHM communication

## Pull Request Description This PR adds support for detecting node topology and selecting regular or stream memcpy at runtime. The PR has four parts: 1. Fixing existing MPMC queue and...