ghostplant

272 comments by ghostplant

@cpuhrsch One more question: if I want to use different tile-size parameters for it (e.g. 64x64), do I still have no choice but to turn to third-party extensions? Whether...

Thanks. Does Blackwell support FlashMLA and DeepGemm? I found that plenty of dependencies are incompatible with B200, so I have to use H100 if there are no similar solutions on Blackwell.

With rocBLAS, this case takes 0.009 ms.

@adityas-amd For large GEMMs, ckProfiler is also slower than rocBLAS. We need ckProfiler mainly for custom GEMM fusion/quantization that rocBLAS cannot provide, regardless of whether the GEMM is large or small. Can...

Thanks. However, I just recompiled the latest CK and the build failed to complete, so I cannot try your suggested fix. I have no idea if some recent commits break...

Hi @bartekxk, may I know which argument value below corresponds to splitK=16? `./bin/ckProfiler gemm_universal 2 1 1 2 0 1 32 512 7168 -1 -1 -1 -1 3 100 0`...

@LyricZhao Thank you. For drop-less dispatch, will utilization stay that high when the gating selection is imbalanced (e.g. all tokens routed to the same GPU)?

Just follow the instructions from https://github.com/microsoft/tutel/issues/248