Evan Harvey
After more debugging, I have determined that the misalignment stems from `Functor_BatchedSerialGemm` in `Test_Batched_SerialInverseLU.hpp` and involves an address outside the control of the `parallel_for` caller.
Given that the functor in question does not use any addresses that violate 16-byte alignment, and that the locals (`&_alpha` and `&_beta`) do not violate 16-byte alignment either, I believe this is either...
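For reference, the kind of 16-byte alignment check described above can be sketched as follows. This is a minimal host-side sketch, not the code used in the actual debugging: `is_aligned` is a hypothetical helper that does not exist in Kokkos or KokkosKernels, and using it inside a device functor would presumably require marking it with `KOKKOS_INLINE_FUNCTION`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Kokkos or KokkosKernels): returns true when
// the address p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment = 16) {
  // The mask test below is only valid for a nonzero power-of-two alignment.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // Convert the pointer to an integer and test the low bits.
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Calling `is_aligned(&_alpha)` and `is_aligned(&_beta)` from inside the functor's `operator()` would report whether those locals sit on 16-byte boundaries.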
Here are more triaging results. Note that local memory can only be allocated by the compiler.

1. Christian and I tried moving `Scalar _alpha, _beta` above the declaration of the...
The suspected register allocation bug still persists in cuda/12.2.
KokkosKernels HEAD SHA: 6c06bd024bbcb48b1ca6bef165bd13e73a3c3b44
Kokkos HEAD SHA: 7e299b4e25c42528e105379c3aa9a318056545ba
Local changes in KokkosKernels: [kk_local_changes.txt](https://github.com/kokkos/kokkos-kernels/files/12458552/kk_local_changes.txt)
Local change in Kokkos: none. ``` module load sems-archive-env...
Unless there is a way to do this via an ini file, I suggest continuing to use `AT: WIP`. @lucbv: What's the concern with continuing to use `AT: WIP` for...
> The only concern is I forgot that WIP provides this and that's probably because it's not very explicit, we could create an alias in this case that does the...
@lucbv: this was discussed during the meeting last week. Would you please weigh in here?
@crtrott, @brian-kelley, @jgfouca: Can you suggest any proposed approaches or feedback?
@lucbv: this was discussed during the meeting last week. Would you please weigh in here?
Adding `AT: RETEST` since the failures are related to twostage GS on streams, not cluster GS. These twostage GS tests passed in https://github.com/kokkos/kokkos-kernels/pull/1980#issuecomment-1779711592.