Adam C Reyes
> This is cool! One concern I have about the simd case---are we sure it still effectively vectorizes? My experience in the past has been that it can be difficult...
What is the expected behavior for an `InnerLoopPatternSimdFor` nested in a device `par_for_outer` like done [here](https://github.com/parthenon-hpc-lab/parthenon/blob/9e2c7eafd85b8efb1ba40089a5b10da2c4f15e12/src/solvers/solver_utils.hpp#L250)? There is a comment ```cpp // Translate to a non-Kokkos plain C++ innermost loop...
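The linked code has a comment saying the innermost loop is translated to a non-Kokkos plain C++ loop. Below is a minimal standalone sketch of what that could look like, built on several assumptions. `SimdForTag`, `inner_for`, and `run_demo` are hypothetical names and are not Parthenon's API. The serial outer loop only stands in for a `par_for_outer`. The `#pragma omp simd` is a hint, not a guarantee: whether the loop actually vectorizes still depends on the compiler, its flags, and the loop body, which is the concern raised earlier in the thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tag standing in for an InnerLoopPatternSimdFor-style pattern.
struct SimdForTag {};

// Hypothetical helper: run the innermost loop as a plain C++ for loop
// (no Kokkos team construct). The pragma is only a vectorization hint;
// without OpenMP SIMD support enabled, the compiler ignores it.
template <typename Function>
void inner_for(SimdForTag, const int il, const int iu, const Function &function) {
#pragma omp simd
  for (int i = il; i <= iu; ++i) {
    function(i);
  }
}

// Demo: a serial outer loop stands in for par_for_outer. Each outer
// iteration runs the inclusive inner range [0, n - 1].
double run_demo() {
  const int n = 8;
  std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
  for (int k = 0; k < 2; ++k) {
    inner_for(SimdForTag{}, 0, n - 1,
              [&](const int i) { c[i] += a[i] * b[i]; });
  }
  return c[n - 1];
}
```

The inner range is inclusive in this sketch, following the assumed `il`/`iu` convention. As a result, `run_demo()` accumulates `1.0 * 2.0` twice and returns `4.0`.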
> One argument for letting `DevExecSpace != HostExecSpace` is that users could use only `par_for` loops in a hierarchical setup like > > ```c++ > parthenon::par_for_outer( > outer_loop_pattern, "unit test...
I think this error:
```
13: cudaFuncGetAttributes(&attr, func) error( cudaErrorInvalidDeviceFunction):
13: invalid device function /vast/home/lfroberts/parthenon/external/Kokkos/core/
13: src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:140
```
comes from compiling for the wrong architecture, at least that's what the...
It looks like the CI is using 11.6(?) and seems to work like you said. This seems consistent with this [nvidia thread](https://forums.developer.nvidia.com/t/cudaerrorinvaliddevicefunction/228883) that reports a similar issue that goes away...
I was able to recreate the issues with the `nvidia/cuda:11.4.3-devel-ubuntu20.04` docker container. There weren't any problems with `par_dispatch`; instead, it was a `KOKKOS_CLASS_LAMBDA` problem in the tests. All...
> Thanks @acreyes and @lroberts36 ! This is much easier to read than previously and a big improvement over previous functionality. > > I still have some readability concerns though......
> > I'm rerunning the Cuda test as it failed with some (unexpected) host to device mem copies. > > The test repeatedly failed. Any idea where those extra copies...
> I'd like to do some downstream performance testing early next week and understand/track down the additional host/device copies before I finally approve.

👍 I believe I've tracked down the...
> The merge from develop broke this PR. Currently trying to fix it

Should be fixed now

> @acreyes @fglines-nv let us know when this is ready for discussion again

I...