Joseph Schuchart
First batch of changes from https://github.com/open-mpi/ompi/pull/12318 (offloading of reductions to devices). This PR adds stream operations to the accelerator components: - ~Default stream~ - Stream-based alloc and free - Stream-based...
This is the third chunk of #12318, which makes datatype copying stream-aware. This will be needed in the coll components to order data movement with kernel invocations. Adds `ompi_datatype_copy_content_same_ddt_stream` (and...
This is the second part of #12318, which provides the device-side reduction operators and adds stream semantics to `ompi_op_reduce`. As usual, the operators are generated from macros. Function pointers to kernel...
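A host-side sketch of the macro-generation pattern: one macro produces a typed reduction for each (operator, type) pair, and a table collects the function pointers for dispatch. The names `DEFINE_REDUCE`, `reduce_fn_t`, and `reduce_table` and the `(in, inout, count)` signature are hypothetical, not the actual `ompi_op` interface. A device build would generate kernels and launch them on a stream instead of running plain loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction signature: inout[i] = inout[i] OP in[i]. */
typedef void (*reduce_fn_t)(const void *in, void *inout, size_t count);

/* Generate one typed reduction function per (name, type, expression).
 * Inside EXPR, 'a' is the accumulator (inout) and 'b' the incoming value. */
#define DEFINE_REDUCE(name, type, EXPR)                                  \
    static void reduce_##name##_##type(const void *in_, void *inout_,    \
                                       size_t count)                     \
    {                                                                    \
        const type *in = (const type *)in_;                              \
        type *inout = (type *)inout_;                                    \
        for (size_t i = 0; i < count; ++i) {                             \
            type a = inout[i], b = in[i];                                \
            inout[i] = (EXPR);                                           \
        }                                                                \
    }

DEFINE_REDUCE(sum, int32_t, a + b)
DEFINE_REDUCE(sum, double, a + b)
DEFINE_REDUCE(max, int32_t, a > b ? a : b)
DEFINE_REDUCE(max, double, a > b ? a : b)

enum { OP_SUM, OP_MAX, OP_COUNT };
enum { TYPE_INT32, TYPE_DOUBLE, TYPE_COUNT };

/* Dispatch table indexed by [operator][datatype]. */
static const reduce_fn_t reduce_table[OP_COUNT][TYPE_COUNT] = {
    [OP_SUM] = { [TYPE_INT32] = reduce_sum_int32_t, [TYPE_DOUBLE] = reduce_sum_double },
    [OP_MAX] = { [TYPE_INT32] = reduce_max_int32_t, [TYPE_DOUBLE] = reduce_max_double },
};
```

Keeping the loop body in a single macro means adding a new datatype is one line per operator, and the table gives the caller a single lookup instead of a switch over every combination.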
This fixes a warning issued by Clang 15 about the `reqs` pointer potentially being used uninitialized in the error path, i.e., if the `cached_kmtree` is `NULL`.
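A minimal sketch of this class of fix, assuming a shared cleanup label that frees `reqs` on every exit path. The function `setup_requests` and the `struct tree` type are hypothetical stand-ins for the real code around `cached_kmtree`; the point is only that initializing the pointer to `NULL` makes the early error exit well-defined.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a cached k-nomial tree; only the child count matters here. */
struct tree {
    int nchildren;
};

/* Returns 0 on success and -1 on error. */
static int setup_requests(const struct tree *cached_tree, int *nreqs_out)
{
    /* Initializing to NULL is the fix: the cleanup path below reads reqs
     * even when we jump there before malloc() ran, and free(NULL) is a no-op. */
    int *reqs = NULL;
    int rc = -1;

    if (NULL == cached_tree || cached_tree->nchildren <= 0) {
        goto cleanup;   /* reqs was never allocated on this path */
    }
    reqs = malloc(sizeof(*reqs) * (size_t)cached_tree->nchildren);
    if (NULL == reqs) {
        goto cleanup;
    }
    /* ... post one request per child here ... */
    *nreqs_out = cached_tree->nchildren;
    rc = 0;

cleanup:
    free(reqs);
    return rc;
}
```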
I found that the persistent communication framework registers the progress callback of the selected component independent of whether persistent communication is actually used (https://github.com/open-mpi/ompi/blob/main/ompi/mca/part/base/part_base_select.c#L226). It's debatable whether that is necessary....
Current git main shows a bunch of warnings in oshmem, coming from the `RUNTIME_CHECK_IMPL_RC` macro: ``` ../../../../oshmem/runtime/runtime.h:168:11: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned...
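A minimal sketch of one common way to make such a comparison warning-free, assuming the macro compares a signed `int` against a `size_t` (the warning above is truncated, so the other operand's type is an assumption). The helper `int_equals_size` is hypothetical and is not the actual change to `RUNTIME_CHECK_IMPL_RC`.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the signed value x equals the unsigned value n, else 0.
 * Writing "x == n" directly converts x to size_t, which triggers
 * -Wsign-compare and makes x = -1 compare equal to SIZE_MAX. */
static int int_equals_size(int x, size_t n)
{
    /* A negative int can never equal a size_t, so only cast once x >= 0. */
    return x >= 0 && (size_t)x == n;
}
```

Checking the sign before the cast keeps the comparison both warning-free and semantically correct, whereas a bare cast would only silence the warning.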
This is the fifth patch based on #12318 and introduces accelerator-awareness for most of the allreduce implementations in `coll/base`. There were changes introduced to `ompi_coll_base_allreduce_intra_allgather_reduce`, which led to conflicts....
This is the fourth chunk of #12318. We need the allocator to cache allocations on the device because allocation is too expensive to do on the fly in every...
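A minimal caching allocator illustrating the idea of reusing released buffers instead of allocating on every call. It is a sketch under stated assumptions: `malloc`/`free` stand in for the expensive device allocation, blocks are matched by exact size, and thread safety and stream ordering are ignored. All names (`cached_alloc`, `cached_free`, `backend_alloc`, `backend_free`) are hypothetical, not Open MPI's allocator API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the expensive device allocator; counts real allocations. */
static size_t backend_allocs = 0;
static void *backend_alloc(size_t size) { backend_allocs++; return malloc(size); }
static void backend_free(void *ptr) { free(ptr); }

#define CACHE_SLOTS 8

/* A tiny fixed-capacity cache of released blocks, matched by exact size. */
struct cached_block { void *ptr; size_t size; };
static struct cached_block cache[CACHE_SLOTS];
static size_t cache_len = 0;

void *cached_alloc(size_t size)
{
    /* Reuse a previously released block of the same size if one exists. */
    for (size_t i = 0; i < cache_len; ++i) {
        if (cache[i].size == size) {
            void *ptr = cache[i].ptr;
            cache[i] = cache[--cache_len];   /* swap-remove the slot */
            return ptr;
        }
    }
    return backend_alloc(size);   /* cache miss: pay for a real allocation */
}

void cached_free(void *ptr, size_t size)
{
    /* Keep the block for later reuse; fall back to a real free when full. */
    if (cache_len < CACHE_SLOTS) {
        cache[cache_len].ptr = ptr;
        cache[cache_len].size = size;
        cache_len++;
    } else {
        backend_free(ptr);
    }
}
```

In this sketch, a repeated request of the same size after a release is served from the cache without calling the backend again, which is the property that matters when the backend call is a costly device allocation.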
## Description PaRSEC numbers its devices differently from the rest of the world (starting with 0 for host, possibly 1 for recursive, 2... for devices). CUDA and HIP start numbering devices...
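A minimal sketch of translating between the two numbering schemes, assuming the accelerators occupy a contiguous range of PaRSEC indices starting at some `first_gpu` (e.g. 2 when slot 0 is the host and slot 1 the recursive device). The helpers `parsec_to_gpu_ordinal` and `gpu_ordinal_to_parsec` are hypothetical and not part of the PaRSEC API.

```c
/* Translate a PaRSEC-style device index into a CUDA/HIP-style ordinal
 * (starting at 0). Returns -1 for indices that do not denote an accelerator,
 * such as the host or the recursive device. */
static int parsec_to_gpu_ordinal(int parsec_index, int first_gpu, int num_gpus)
{
    int ordinal = parsec_index - first_gpu;
    if (ordinal < 0 || ordinal >= num_gpus) {
        return -1;
    }
    return ordinal;
}

/* Inverse mapping: CUDA/HIP ordinal back to the PaRSEC device index. */
static int gpu_ordinal_to_parsec(int ordinal, int first_gpu)
{
    return ordinal + first_gpu;
}
```

Passing `first_gpu` explicitly, rather than hard-coding an offset of 2, reflects that the recursive device is only "possibly" present.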
**Description** Trying to build SLATE on Frontier with the Cray compiler leads to this linking error: ``` [100%] Linking CXX executable testsweeper_tester cd /ccs/home/jschuchart/src/slate/slate/build/testsweeper/test && /autofs/nccs-svm1_sw/frontier/spack-envs/base/opt/linux-sles15-x86_64/gcc-7.5.0/cmake-3.23.2-4r4mpiba7cwdw2hlakh5i7tchi64s3qd/bin/cmake -E cmake_link_script CMakeFiles/testsweeper_tester.dir/link.txt --verbose=1...