dolfinx
Make Scatterer "X"-aware
- Add an option to use `MPI_Isend`/`MPI_Irecv`:
  - With CUDA-aware MPI (and the right allocator), host buffers and intermediate host-device memory transfers are not necessary.
  - Neighborhood collectives are not supported (yet?) by CUDA-aware MPI.
- Add an allocator for the indices so that the pack/unpack kernels can be implemented on the target device.
Example (SYCL):

```cpp
using DeviceScatterer
    = common::Scatterer<sycl::usm_allocator<std::int32_t, sycl::usm::alloc::shared>>;
auto queue = select_gpu_queue(comm);
sycl::usm_allocator<std::int32_t, sycl::usm::alloc::shared> allocator(*queue);
DeviceScatterer sct(map, bs, allocator);
```