Daniel Arndt
Let's see if people like us commit to this style.
After implementing `TeamPolicy` for `vector_size>1` (see https://github.com/kokkos/kokkos/pull/4183#discussion_r680212952), incremental tests 12a and 12b deadlock in the innermost loop, i.e., `ThreadVectorRange`, in the subgroup barrier, when running with the `SYCL` `CUDA`...
As noticed in #5106, we don't use Kokkos::Tools::Experimental::device_id for all profiling events which means that, e.g., the number of the device isn't reported for `fence` events.
GitHub:

| os | compiler |
| ----- | ---- |
| fedora:latest | gcc 12.1.1 |
| fedora:latest | clang 14.0.0 |
| fedora:rawhide | gcc 12.1.1 |
| fedora:rawhide | clang 14.0.5 |
| ubuntu:latest | gcc...
Fixes https://github.com/kokkos/kokkos-tools/issues/40. By default, `SpaceTimeStack` sets `USE_MPI=1` and assumes that the application uses MPI and initializes it. As also described in the linked issue, this assumption is often not true....
Fixes #641. Only files in `src` are changed.
Mostly for CI for now.
The strategy here is to do a regular query that gets us the MPI ranks and primitive indices for each query, send them back to the MPI rank owning the...