Christian Trott
I think this is no longer an issue, so I am going to close it.
I think this is fixed. Added a test in #217.
We should get rid of the carveout except when an occupancy limitation is requested. I will take another look at this.
Superseded by #5706
I see about a 2-3% performance regression for LAMMPS with in.lj at size 4,2,2. This is on SKX with GCC 11.1 using 12 threads: 2.27s vs 2.32s. Ran test where I...
I did some more experiments and implemented a different locking scheme here. It looks like a mutex lock implies stronger memory fencing than we need: https://github.com/Rombur/kokkos/pull/4, and if we relax that a...
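To illustrate what "relaxing" the fencing of a lock can mean, here is a minimal sketch of a spin lock that asks only for acquire ordering on lock and release ordering on unlock. It is not the code from the linked pull request; `SpinLock` and `run_counter_demo` are hypothetical names made up for this example, and whether such a lock is actually faster than `std::mutex` depends on the platform and the workload.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spin lock (illustration only, not the Kokkos implementation).
// It requests the minimum ordering a lock needs: acquire when taking it,
// release when giving it up.
class SpinLock {
  std::atomic<bool> locked_{false};

 public:
  void lock() {
    // Acquire: accesses inside the critical section cannot be reordered
    // before the point where the lock is obtained.
    while (locked_.exchange(true, std::memory_order_acquire)) {
      // While the lock is held by someone else, wait with relaxed loads
      // and only retry the exchange once it looks free.
      while (locked_.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
      }
    }
  }

  void unlock() {
    // Release: writes made inside the critical section become visible to
    // the next thread that acquires the lock.
    locked_.store(false, std::memory_order_release);
  }
};

// Hypothetical helper: several threads increment one counter under the lock.
long run_counter_demo(int num_threads, int iters) {
  SpinLock lock;
  long counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < iters; ++i) {
        lock.lock();
        ++counter;  // protected by the lock
        lock.unlock();
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

Calling `run_counter_demo(4, 10000)` should return 40000 if the lock provides mutual exclusion. Any performance comparison against a mutex would need to be measured on the target machine.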
Here is the Serial fix corresponding to this for the Intel ICE: https://github.com/kokkos/kokkos/pull/5671
Do you have this issue with nvcc? Or with nvcc and nvc++ as the host compiler? What you are doing in the CMake will try to use nvc++ as a...
We could try `#pragma optimize("", off)`
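As a rough sketch of how that pragma could be applied, the snippet below turns optimization off around a single function and back on afterwards. The guard macros are an assumption: the pragma is documented for MSVC and the classic Intel compiler, and other compilers may ignore it or warn. `sum_to` is a hypothetical stand-in for whatever function triggers the problem.

```cpp
// Assumption: only compilers that define _MSC_VER or __INTEL_COMPILER are
// treated as accepting this pragma; elsewhere the code is built unchanged.
#if defined(_MSC_VER) || defined(__INTEL_COMPILER)
#pragma optimize("", off)  // disable optimizations for the functions below
#endif

// Hypothetical function standing in for the code that hits the compiler issue.
int sum_to(int n) {
  int total = 0;
  for (int i = 1; i <= n; ++i) {
    total += i;
  }
  return total;
}

#if defined(_MSC_VER) || defined(__INTEL_COMPILER)
#pragma optimize("", on)  // restore the optimization settings from the command line
#endif
```

Scoping the pragma to the smallest possible region keeps the rest of the translation unit optimized as usual.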
SYCL didn't pass the test; it failed with a compile error:
```c++
In file included from /var/jenkins/workspace/Kokkos/core/unit_test/headers_self_contained/tstHeader.cpp:26:
In file included from /var/jenkins/workspace/Kokkos/algorithms/src/Kokkos_Sort.hpp:70:
In file included from /opt/onedpl/include/oneapi/dpl/execution:61:
In file included from /opt/onedpl/include/oneapi/dpl/pstl/algorithm_impl.h:24:
In...
```