Daniel Yu

Results: 3 comments by Daniel Yu

@ttyio Thanks! And which should we put as the outermost dimension, `B` or `S`?

Turns out it only requires the CUDA 12.4 toolkit; driver version 535 is fine for that kernel. Closing.

I strongly recommend merging this PR, since it resolves an important issue that might otherwise prevent shit code from being written: https://github.com/trekhleb/state-of-the-art-shitcode/issues/81