Mikael Simberg
Apparently cmake-format also comes with a cmake-lint command (https://cmake-format.readthedocs.io/en/latest/cmake-lint.html). We should try it and see if it's useful for improving our CMake scripts.
We should test if running the MPI/CUDA polling in standalone tasks works as well as integrating the polling in the scheduler. This would make the polling independent of the scheduler...
Currently only zero or one value may be passed from the predecessor sender (https://github.com/pika-org/pika/blob/d48a0d5bd73eb75c44562c8f6ef9f201798d5780/libs/pika/async_cuda/include/pika/async_cuda/cuda_scheduler_bulk.hpp#L126-L129). I think this could easily be generalized by collecting arguments into a tuple,...
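A minimal sketch of the tuple idea, in plain C++ without CUDA or senders. `bulk_invoke` and `bulk_invoke_example` are hypothetical names for illustration only and are not part of pika; the point is just that storing the predecessor values in a `std::tuple` and expanding them with `std::apply` handles zero, one, or many values through the same code path.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a bulk operation (not pika's implementation):
// calls f(i, values...) for every i in [0, n).
template <typename F, typename... Ts>
void bulk_invoke(std::size_t n, F&& f, Ts&&... ts)
{
    // Collect all predecessor values into one tuple so that zero, one, or
    // several values are stored and forwarded in the same way.
    std::tuple<std::decay_t<Ts>...> values(std::forward<Ts>(ts)...);

    for (std::size_t i = 0; i < n; ++i)
    {
        // Expand the tuple back into individual arguments for each index.
        std::apply([&](auto&... vs) { f(i, vs...); }, values);
    }
}

// Usage with two predecessor values.
inline void bulk_invoke_example()
{
    int sum = 0;
    bulk_invoke(
        3, [&](std::size_t i, int a, int b) { sum += static_cast<int>(i) + a + b; }, 10, 20);
    assert(sum == 93);    // (0 + 30) + (1 + 30) + (2 + 30)
}
```

In an actual sender implementation the tuple would have to live in the operation state and be usable from device code, which this sketch does not attempt to show.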
We currently use a spinlock pool for all `thread_data` locks. The default size of the pool is 128 locks. With large CPUs like the Grace 72-core CPUs, the default may...
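For context, a rough sketch of what a fixed-size spinlock pool looks like in general. This is an assumption-laden illustration, not pika's implementation: the names `spinlock`, `spinlock_pool`, `lock_for`, and `guarded_increment` are hypothetical, and mapping an object to a lock by hashing its address is only one possible scheme.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical minimal spinlock (illustration only).
class spinlock
{
    std::atomic<bool> locked{false};

public:
    void lock() noexcept
    {
        // Spin until we are the ones who flipped the flag from false to true.
        while (locked.exchange(true, std::memory_order_acquire)) {}
    }
    void unlock() noexcept { locked.store(false, std::memory_order_release); }
};

// Hypothetical pool of N locks shared by an arbitrary number of objects.
template <std::size_t N>
class spinlock_pool
{
    std::array<spinlock, N> locks;

public:
    // Assumed mapping: hash the object's address onto one of the N locks.
    // Distinct objects can map to the same lock, so contention depends on
    // how N compares to the number of objects locked concurrently.
    spinlock& lock_for(void const* p) noexcept
    {
        return locks[std::hash<void const*>{}(p) % N];
    }
};

// Usage: protect an update with the pool lock belonging to the object.
inline int guarded_increment(spinlock_pool<128>& pool, int& value)
{
    std::lock_guard<spinlock> guard(pool.lock_for(&value));
    assert(value >= 0);    // example invariant checked under the lock
    return ++value;
}
```

The pool size is a template parameter here purely so that different sizes are easy to compare in a benchmark.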
Assuming there are no downsides, consider enabling it by default. May need some investigation of potential impact on performance.
We currently have wrappers for streams, handles, etc. but not one for events. We should add one.
Currently the stackful `thread_data` struct is very large and likely has room for optimization. Related: https://github.com/pika-org/pika/issues/304.
While the current bitmask passed to `PIKA_MPI_COMPLETION_MODE` is convenient for testing, the values are not very transparent to a user setting them. It'd be nice to split up the...
We should set it to something that performs reasonably well while providing some "safety" in terms of running continuations in new tasks etc.
Testing what effect this has on performance. The hope is to have less variance, but given that we don't pass/fail PRs based on these results, it may not be necessary...