Daniel Arndt
Copied from https://kokkosteam.slack.com/archives/G5CBLMFLP/p1707745107838919: > in the std-like parallel algorithms, when passing 2 views, we don't seem to actually check that they have the same size, we just take begin and...
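A minimal sketch of the kind of size check the quoted message says is missing. It uses plain `std::vector` rather than Kokkos views, and `copy_checked` is a hypothetical helper, not a Kokkos function; how Kokkos would report a mismatch is an assumption here.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Kokkos): element-wise copy that rejects
// inputs of different length instead of silently using begin()/end() of one.
template <class T>
void copy_checked(const std::vector<T>& from, std::vector<T>& to) {
  if (from.size() != to.size())
    throw std::invalid_argument("copy_checked: size mismatch");
  std::copy(from.begin(), from.end(), to.begin());
}

// Returns true if a mismatched pair is rejected with an exception.
inline bool copy_checked_rejects_mismatch() {
  std::vector<int> from{1, 2, 3};
  std::vector<int> to(2);  // deliberately too short
  try {
    copy_checked(from, to);
  } catch (const std::invalid_argument&) {
    return true;
  }
  return false;
}
```

Throwing is only one option; inside device code an abort or a debug-only assertion would be the more likely choice.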
Related to https://github.com/kokkos/kokkos/pull/5334#issuecomment-1787497628. #5334 exposed that `team_reduce` in `SYCL` doesn't use arbitrarily large types in general. The current implementation relies on using some fixed-size local memory (that is also used...
In reference to https://github.com/kokkos/kokkos/pull/6293#issue-1810833445. It seems that we don't test array reductions for TeamPolicy in the CI. This pull request copies a very similar test to use array reductions instead...
This pull request is based on #6562 and avoids using `m_team_reduce`. The advantage is that we don't need to explicitly allocate memory for `TeamPolicy` but only when needed. The problem...
The oneAPI compiler emits warnings whenever scratch memory is used since we don't expose what the correct address space is. For level 0, we essentially want `sycl::local_ptr` and for level...
https://github.com/dealii/dealii/issues/16065 shows that in a `deal.II` build with `Kokkos+Cuda` some `Kokkos` templates are the most expensive templates as reported by https://github.com/aras-p/ClangBuildAnalyzer. Specifically, these are `idForInstance` and the `Kokkos::View` used in...
Similar to #6562, unrolling the loops for shuffles improves performance significantly. For a simple dot product benchmark we are seeing on Intel PVC | elements | old shuffle (GB/s) |...
Using gcc-13 I am seeing warnings such as ``` [...] [ 27%] Building CXX object core/unit_test/CMakeFiles/Kokkos_CoreUnitTest_Serial2.dir/serial/TestSerial_SubView_c07.cpp.o In file included from /tmp/kokkos/core/src/Kokkos_View.hpp:490, from /tmp/kokkos/core/src/Kokkos_Parallel.hpp:31, from /tmp/kokkos/core/src/Kokkos_MemoryPool.hpp:26, from /tmp/kokkos/core/src/Kokkos_TaskScheduler.hpp:34, from /tmp/kokkos/core/src/Serial/Kokkos_Serial.hpp:36, from...
Ensure kernels submitted by multiple threads to synchronous execution spaces are enqueued correctly
Related to thread-safety questions in #6051 and #4385. This pull request ensures that kernels submitted to the same execution space instance from multiple threads don't run concurrently. Also, calling `fence`...
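The idea of serializing submissions to one instance can be sketched with a per-instance mutex. This is an illustration in standard C++ only, not the Kokkos implementation; `InstanceSketch`, `submit`, and `run_concurrent_submissions` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an execution space instance (not a Kokkos type).
class InstanceSketch {
 public:
  // Run the kernel while holding the per-instance lock, so kernels submitted
  // from different threads to this instance never run concurrently.
  template <class Kernel>
  void submit(Kernel&& kernel) {
    std::lock_guard<std::mutex> lock(m_mutex);
    kernel();
  }

  // In this synchronous sketch, acquiring the lock means no kernel is running.
  void fence() { std::lock_guard<std::mutex> lock(m_mutex); }

 private:
  std::mutex m_mutex;
};

// Each thread submits kernels that increment a deliberately non-atomic
// counter; the per-instance lock is what keeps the increments race-free.
inline long run_concurrent_submissions(int num_threads, int kernels_per_thread) {
  assert(num_threads > 0 && kernels_per_thread >= 0);
  InstanceSketch instance;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t)
    threads.emplace_back([&] {
      for (int k = 0; k < kernels_per_thread; ++k)
        instance.submit([&] { ++counter; });
    });
  for (auto& th : threads) th.join();
  instance.fence();
  return counter;
}
```

With the lock in place, `run_concurrent_submissions(4, 1000)` yields exactly 4000 increments; without it, the unsynchronized counter would be a data race.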
As noted in https://kokkosteam.slack.com/archives/C5BGU5NDQ/p1690480258950949?thread_ts=1690468200.131139&cid=C5BGU5NDQ, we currently don't support functors as reducers for nested reductions, only for Range/MDRange/TeamPolicy reductions. We should consider unifying the interfaces so that an example like ```C++...