Daniel Arndt comments

Results: 1005 comments by Daniel Arndt

Adds support for loop unrolling within range loops

In addition to the points raised above, `Unroll` seems like a misnomer to me here. What we are really doing is giving every thread more work items. For some...
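To make the distinction between unrolling and batching concrete, here is a minimal sketch in plain C++ rather than Kokkos code. It assumes a fixed number of workers, each of which processes up to `BatchSize` items spaced `num_workers` apart. The names `batched_for`, `workers_needed`, and `visit_counts` are hypothetical and exist only for this illustration; the actual implementation in the pull request may distribute work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not the Kokkos implementation:
// `num_workers` workers cover the range [0, n). Each worker handles up to
// `BatchSize` items, spaced `num_workers` apart, instead of a single item.
template <std::size_t BatchSize, class Functor>
void batched_for(std::size_t n, std::size_t num_workers, const Functor& f) {
  static_assert(BatchSize > 0, "batch size must be positive");
  assert(num_workers * BatchSize >= n);  // enough capacity to cover the range
  for (std::size_t worker = 0; worker < num_workers; ++worker) {
    // The trip count is a compile-time constant, so the compiler is free to
    // unroll this inner loop; the batching itself is what changes the
    // amount of work per worker.
    for (std::size_t b = 0; b < BatchSize; ++b) {
      const std::size_t i = worker + b * num_workers;
      if (i < n) f(i);  // guard the tail of the range
    }
  }
}

// Number of workers needed so that workers * batch_size covers n items.
inline std::size_t workers_needed(std::size_t n, std::size_t batch_size) {
  return (n + batch_size - 1) / batch_size;
}

// Helper for checking the sketch: counts how often each index is visited.
template <std::size_t BatchSize>
std::vector<int> visit_counts(std::size_t n) {
  std::vector<int> hits(n, 0);
  batched_for<BatchSize>(n, workers_needed(n, BatchSize),
                         [&](std::size_t i) { ++hits[i]; });
  return hits;
}
```

In this sketch, a larger `BatchSize` reduces the number of workers while every index is still visited exactly once, which is why a name along the lines of a batch size may describe the behavior more directly than `Unroll`.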

Adds support for loop unrolling within range loops

> I am not sure if unroll is a misnomer. From my understanding, the compiler is unrolling the loop since it has access to the loop length at compile time even without...

Adds support for loop unrolling within range loops

> These names are fine with me. I prefer `StaticBatchSize` but the others work too. What do you think @masterleinad?

Maybe. I'm just curious what you want to do...

Adds support for loop unrolling within range loops

> My thinking is we would still change the number of workers that are involved in the `parallel_for`, similar to the way we did in CUDA (Essentially changing the hardware...

Adds support for loop unrolling within range loops

> Regarding your second comment, what would be the workaround for the overflow error? Something like
> ```C++
> for (Member i = 0; ((i < static_cast(work_stride * batch_size)) && (i <...
> ```

Specifying a specific type of host space in a DualView

https://github.com/kokkos/kokkos/issues/3044 is related.

Fix race conditions in HIP ParallelScan

Can you explain, for every synchronization barrier added, why it is necessary?

Fix race conditions in HIP ParallelScan

Since the HIP implementation for the most part is a copy of the Cuda implementation, I would expect that we need the same barriers for both backends.

Fix race conditions in HIP ParallelScan

```diff
diff --git a/core/src/HIP/Kokkos_HIP_ParallelScan_Range.hpp b/core/src/HIP/Kokkos_HIP_ParallelScan_Range.hpp
index ce9b35b0d..93f23dd4a 100644
--- a/core/src/HIP/Kokkos_HIP_ParallelScan_Range.hpp
+++ b/core/src/HIP/Kokkos_HIP_ParallelScan_Range.hpp
@@ -156,9 +156,6 @@ class ParallelScanHIPBase {
          iwork_base < range.end(); iwork_base += blockDim.y) {
       const typename Policy::member_type iwork...
```

Fix race conditions in HIP ParallelScan

Also note that the unit test doesn't use the result anyway.