Mikael Simberg
The code at https://github.com/eth-cscs/DLA-Future/blob/6ac429e39214190ea9661bbc0877990023092160/src/eigensolver/tridiag_solver/kernels.cu#L638-L643 could potentially be replaced by a copy followed by an in-place partition. Investigate whether that is faster than the current method. This is low priority unless we find evidence that...
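For reference, the copy-then-partition idea can be sketched on the CPU with the standard library. This is only an illustration under assumptions, not the code from `kernels.cu`: the helper name `copy_then_partition`, the use of `std::vector`, and the predicate are all hypothetical. On the GPU the same shape would presumably use `thrust::copy` followed by `thrust::partition`, and that is what would need to be benchmarked.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DLA-Future): copy `in` into `out`, then
// partition `out` in place so that elements satisfying `pred` come first.
// Returns the number of elements that satisfy `pred`, i.e. the index of the
// first element of the second group.
template <class T, class Pred>
std::size_t copy_then_partition(const std::vector<T>& in, std::vector<T>& out, Pred pred) {
  // Step 1: plain copy of the input.
  out.assign(in.begin(), in.end());
  // Step 2: in-place partition of the copy; the input is left untouched.
  auto mid = std::partition(out.begin(), out.end(), pred);
  return static_cast<std::size_t>(mid - out.begin());
}
```

Note that `std::partition` does not preserve the relative order within each group. If the current method relies on a stable ordering, `std::stable_partition` would be the matching in-place variant, and that may change the performance comparison.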
Context in https://github.com/eth-cscs/DLA-Future/pull/423. Summary: that PR introduced a `threadmanager::wait` in certain locations to make sure that no more work will be scheduled before doing a blocking MPI call. We should...
#497 removed the MPI executor and replaced it with a custom sender adaptor. Ideally we should be using pika's `transform_mpi` which uses MPI polling.
Initially with clang, long-term this should also work with `nvcc`/`nvc++`.
Long-term goal, but I'm writing down my notes for it while I remember. mdspan (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p0009r17.html) and a standard BLAS interface (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1673r9.html) look very much like `Tile`s and the tile BLAS/LAPACK...
Not including the full `iostream` header in header files can reduce compile times. This is quite low priority.
This can help improve link times for users of DLA-Future. Low priority, especially since we have not measured whether link times are actually a problem.