Does cuBLASLt support rowwise/blockwise scaled matmul?
I only see tensor-wide (pointwise) scaled matmul examples in the tutorial.
Hi @ghostplant. cuBLAS does not support rowwise/blockwise scaling yet. For the record, on Blackwell it supports block scaling https://docs.nvidia.com/cuda/cublas/#d-block-scaling-for-fp8-and-fp4-data-types, which provides a similar level of accuracy.
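To make the distinction concrete, here is a hedged NumPy sketch (emulation only, not a cuBLASLt API call) of what the different scale shapes mean numerically: a tensor-wide scale is one scalar per matrix, while rowwise scaling uses one scale per row of A and per column of B.

```python
import numpy as np

def scaled_matmul(a, b, scale_a, scale_b):
    """Emulate D = (scale_a * A) @ (scale_b * B).

    scale_a / scale_b may be scalars (tensor-wide scaling) or
    per-row / per-column vectors (rowwise scaling); NumPy
    broadcasting handles both shapes.
    """
    return (a * scale_a) @ (b * scale_b)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 4)).astype(np.float32)

# Tensor-wide scaling: a single scalar per matrix.
d_tensorwise = scaled_matmul(a, b, 0.5, 2.0)

# Rowwise scaling: one scale per row of A, one per column of B.
sa = rng.uniform(0.5, 2.0, size=(4, 1)).astype(np.float32)
sb = rng.uniform(0.5, 2.0, size=(1, 4)).astype(np.float32)
d_rowwise = scaled_matmul(a, b, sa, sb)

# Rowwise scales factor out of the product, since each scale
# multiplies an entire row / column of the result:
assert np.allclose(d_rowwise, sa * (a @ b) * sb, atol=1e-5)
```

Blockwise scaling (one scale per tile of the matrix, as in the Blackwell block-scaling formats) sits between these two extremes.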
Thanks. Does Blackwell support FlashMLA and DeepGEMM? I found plenty of dependencies incompatible with B200, so I will have to use H100 if there are no similar solutions on Blackwell.
My understanding is that DeepSeek has not implemented FlashMLA or DeepGEMM for Blackwell, at least not yet.
Sure, I'll use H100 instead.