Mikael Simberg
Nowadays `stdexec` should be feature-complete for DLA-Future's use cases. We should test that it works equivalently and add CI configurations for:
- [x] GCC (#930)
- [x] clang (#1024)...
The `with_temporary_tile` test is currently by far the largest test, at 175 MB (with CUDA): https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/jobs/5289553001#L3156. While not critical, it may be worth looking into whether it's possible to decrease...
For example:
```
[0] [0] 0.00350707s -204.11GFlop/s d (1024, 1024) (1024, 1024) 1024 (1, 1) 8 GPU
[1] [1] 0.000240678s -2974.21GFlop/s d (1024, 1024) (1024, 1024) 1024 (1, 1) 8...
```
Use some type of semaphore to limit the number of algorithms that can be scheduled concurrently instead of "unrolling" the full pipeline in one go. This may improve memory locality...
See https://github.com/eth-cscs/DLA-Future/pull/898#discussion_r1238751326. In the worst case this may need support in `async_rw_mutex` in pika. Needs further investigation. Some investigation on where and if this actually could lead to a performance...
https://github.com/eth-cscs/DLA-Future/pull/908#discussion_r1234124259. Depends on #905.
E.g. like in https://github.com/eth-cscs/DLA-Future/pull/714#issuecomment-1310485151. Related to #712 and #714.
Cf. https://github.com/eth-cscs/DLA-Future/pull/834.
E.g. 15 or 16.
Umpire expects to find a GPU when it has been compiled with GPU support. If a GPU-enabled build of DLA-Future is used on a node without GPUs, DLA-Future will fail...