Carl Pearson

Results 183 comments of Carl Pearson

I can reproduce the described behavior on Blake @ SNL:

* H100
* g++ 11.5.0
* Debug build

CUDA 12.2.2: both kernels are the same

CUDA 12.6.2: Tag1 yields `189`...

I hate to say it but I'm leaning towards miscompilation. If I merely put `asm volatile ("");` before OR after (or both) the call to `neighbor_shift` in the `Tag2` guy...

> Have you tried un-nesting it?

Please clarify; I am happy to try obviously-correct modifications to the test code.