Daniel Yu
3 comments of Daniel Yu
@ttyio Thanks! And which one should we put as the outermost dimension, `B` or `S`?
Turns out it only requires the CUDA 12.4 toolkit; driver version 535 is fine for that kernel. Closing.
I strongly recommend merging this PR, since it resolves an important issue that might prevent shit code from being written: https://github.com/trekhleb/state-of-the-art-shitcode/issues/81