New DeepGemm Style Groupwise Kernel

Open jwfromm opened this issue 5 months ago • 2 comments

Summary: Initial enablement of CUTLASS' new groupwise scaling API for FP8 GEMM. This diff adds all the needed scaffolding and confirms that the kernel runs and produces correct outputs, but it does not yet include the tuning that would yield better performance. Interestingly, CUTLASS wants group/block scales in MN major format, while every other groupwise implementation I've seen uses K major. I add an option to our Triton blockwise quantization kernels to support this layout.
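
To make the layout difference concrete, here is a minimal pure-Python sketch of the two scale orderings. It is not the FBGEMM or CUTLASS code: the function names, the 1 x `group_k` grouping along K, and the reading of "MN major" as "the M index varies fastest in memory" are assumptions made for illustration only.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8 e4m3fn


def groupwise_scales(x, group_k=128):
    """Return scales[m][g]: one scale per row m and per group of group_k elements along K.

    x is a list of M rows, each holding K floats (hypothetical helper, illustration only).
    """
    scales = []
    for row in x:
        assert len(row) % group_k == 0, "K must be a multiple of group_k"
        row_scales = []
        for start in range(0, len(row), group_k):
            amax = max(abs(v) for v in row[start:start + group_k])
            # Clamp so an all-zero group does not produce a zero scale.
            row_scales.append(max(amax, 1e-12) / FP8_E4M3_MAX)
        scales.append(row_scales)
    return scales


def flatten_k_major(scales):
    """K major: scales of one row are contiguous, offset = m * num_groups + g."""
    return [s for row in scales for s in row]


def flatten_mn_major(scales):
    """MN major (assumed meaning): scales of one K group are contiguous, offset = g * M + m."""
    num_rows, num_groups = len(scales), len(scales[0])
    return [scales[m][g] for g in range(num_groups) for m in range(num_rows)]
```

Both flattenings hold the same values. Only the memory order differs, which is why a layout option on the quantization kernel is enough to feed either consumer.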

In benchmarking the performance of those quantization kernels, I see that Triton blockwise quantization in general (with or without K major output) is quite slow. We may need to iterate on that if this becomes a commonly used kernel.

Differential Revision: D76830629

jwfromm avatar Jun 17 '25 17:06 jwfromm

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
Latest commit 47c135d23052c82fdbe7c06c1533f98925a1586f
Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/686ff413a14fd1000848503d
Deploy Preview https://deploy-preview-4365--pytorch-fbgemm-docs.netlify.app
netlify[bot] avatar Jun 17 '25 17:06 netlify[bot]

This pull request was exported from Phabricator. Differential Revision: D76830629

facebook-github-bot avatar Jun 17 '25 17:06 facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D76830629

facebook-github-bot avatar Jul 09 '25 21:07 facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D76830629

facebook-github-bot avatar Jul 10 '25 17:07 facebook-github-bot

This pull request has been merged in pytorch/FBGEMM@6bdbc78f361acdcd5467cfdb78fdb1b8588552b8.

facebook-github-bot avatar Jul 10 '25 22:07 facebook-github-bot