Daniel Arndt

1005 comments by Daniel Arndt

I'd like to make some progress here. The changes in the SYCL implementation fixed some test failures with SYCL on AMD GPUs.

HPSF CI in https://gitlab.spack.io/kokkos/kokkos/-/pipelines/1252975.

```
serial.view_64bit (60744 ms)
cuda.view_64bit (733 ms)
sycl.view_64bit (1872 ms)
hip.view_64bit (5081 ms)
```

Since most of the changes are for SYCL, I'm fine with just changing that one build if we think it's too expensive otherwise.

> Don't we have the same issue with `parallel_reduce` and `_scan`?

Yes, probably, but I'd rather tackle that elsewhere.

> This change makes RandomAccessIterator very different from before: it will not work for non-strided layouts anymore and it will not reference count. We definitely can't just do this without...

> I assume this is impossible for modules with nvcc: [docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#module-support](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#module-support)

So far, I'm not aware of any GPU backend supporting C++20 modules.

```C++
#include
int main(int argc, char *argv[]) {
  Kokkos::initialize(argc, argv);
  {
    int N = argc > 1 ? atoi(argv[1]) : 10000000;
    int V = argc > 2 ? atoi(argv[2]) :...
```

> You need to guard the pragma with `#ifdef KOKKOS_COMPILER_NVCC`. We also have `KOKKOS_ENABLE_PRAGMA_UNROLL` that tells you whether `pragma unroll` can be used.
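A minimal sketch of the `KOKKOS_ENABLE_PRAGMA_UNROLL` option, assuming that in a real build the macro comes from the Kokkos headers and is defined only for compilers that accept `#pragma unroll`. The function `sum8` is hypothetical. The sketch deliberately does not include Kokkos, so the macro is undefined here and the loop compiles as a plain loop, which is exactly what a compiler without pragma support (e.g. GCC with `-Wunknown-pragmas`) should see.

```cpp
// Hypothetical example: only emit `#pragma unroll` when the compiler supports it.
// Assumption: in a real Kokkos build, KOKKOS_ENABLE_PRAGMA_UNROLL is provided by
// the Kokkos headers (e.g. via <Kokkos_Core.hpp>) when the pragma can be used.
// This standalone sketch does not include Kokkos, so the macro is undefined and
// the loop below is compiled without any pragma.
#include <array>
#include <cassert>

// Sums a fixed-size array of 8 ints; the unroll hint is emitted conditionally.
int sum8(const std::array<int, 8>& a) {
  int s = 0;
#ifdef KOKKOS_ENABLE_PRAGMA_UNROLL
#pragma unroll
#endif
  for (int i = 0; i < 8; ++i) {
    s += a[i];
  }
  return s;
}
```

Guarding with `#ifdef KOKKOS_COMPILER_NVCC` instead works the same way but restricts the pragma to nvcc; `KOKKOS_ENABLE_PRAGMA_UNROLL` is the more general check because it answers "can `pragma unroll` be used" rather than "which compiler is this".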