Does cuBLASLt support rowwise/blockwise scaled matmul?
I only see tensor-wide (pointwise) scaled matmul examples in the tutorial.
Hi @ghostplant. cuBLAS does not support rowwise/blockwise scaling yet. For the record, on Blackwell it supports block scaling https://docs.nvidia.com/cuda/cublas/#d-block-scaling-for-fp8-and-fp4-data-types, which provides a similar level of accuracy.
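To make the distinction concrete, here is a hedged NumPy sketch (emulation only, not a cuBLASLt API call) of what the different scale shapes mean numerically: a tensor-wide scale is one scalar per matrix, while rowwise scaling uses one scale per row of A and per column of B.

```python
import numpy as np

def scaled_matmul(a, b, scale_a, scale_b):
    """Emulate D = (scale_a * A) @ (scale_b * B).

    scale_a / scale_b may be scalars (tensor-wide scaling) or
    per-row / per-column vectors (rowwise scaling); NumPy
    broadcasting handles both shapes.
    """
    return (a * scale_a) @ (b * scale_b)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 4)).astype(np.float32)

# Tensor-wide scaling: a single scalar per matrix.
d_tensorwise = scaled_matmul(a, b, 0.5, 2.0)

# Rowwise scaling: one scale per row of A, one per column of B.
sa = rng.uniform(0.5, 2.0, size=(4, 1)).astype(np.float32)
sb = rng.uniform(0.5, 2.0, size=(1, 4)).astype(np.float32)
d_rowwise = scaled_matmul(a, b, sa, sb)

# Rowwise scales factor out of the product, since each scale
# multiplies an entire row / column of the result:
assert np.allclose(d_rowwise, sa * (a @ b) * sb, atol=1e-5)
```

Blockwise scaling (one scale per tile of the matrix, as in the Blackwell block-scaling formats) sits between these two extremes.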
Thanks. Does Blackwell support FlashMLA and DeepGEMM? I found plenty of dependencies incompatible with B200, so I will have to use H100 if there are no similar solutions on Blackwell.
My understanding is that DeepSeek has not implemented FlashMLA or DeepGEMM for Blackwell, at least not yet.
Sure, I'll use H100 instead.