carbotaniuman

Results: 54 comments by carbotaniuman

I have a commit for this alongside other half changes in #1710.

I think it would be slightly better for memory access patterns if we did a shuffle instead of changing the index, but that is probably worse if we need to do...

I've finished testing this on CUDA + host on both SSCP and SMCP, using the in-place changes. I've also chosen a new implementation option that is hopefully more maintainable and...