FBGEMM Simplify grouped gemm output allocations

Summary: This very minor diff changes how output tensors in the CUTLASS grouped GEMM are allocated. Rather than being treated as a flat array, they are now allocated as standard matrices. This may help avoid an integer overflow error in the torch caching allocator.
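
For illustration only, here is a minimal pure-Python sketch of the two ways of describing the same output storage. It is not the FBGEMM code. The helper names (`flat_output_numel`, `stacked_output_shape`), the example group shapes, and the assumption that every group shares the same N are all hypothetical. The summary does not say where the overflow arises, so the sketch shows only the arithmetic: a flat element count can exceed the signed 32-bit limit while each dimension of the equivalent matrix stays small.

```python
from math import prod

INT32_MAX = 2**31 - 1  # largest signed 32-bit value


def flat_output_numel(group_shapes):
    """Element count if all group outputs share one 1-D buffer (hypothetical helper)."""
    return sum(m * n for m, n in group_shapes)


def stacked_output_shape(group_shapes):
    """2-D shape (total_M, N) if groups are stacked along rows (hypothetical helper).

    Assumes every group has the same N; this is an assumption of the sketch,
    not something stated in the diff summary.
    """
    ns = {n for _, n in group_shapes}
    if len(ns) != 1:
        raise ValueError("stacked layout assumes every group has the same N")
    total_m = sum(m for m, _ in group_shapes)
    return (total_m, ns.pop())


# Hypothetical workload: 5 groups, each producing a 65536 x 8192 output.
shapes = [(65536, 8192)] * 5

flat_numel = flat_output_numel(shapes)
matrix_shape = stacked_output_shape(shapes)

# Both layouts describe the same number of elements.
assert prod(matrix_shape) == flat_numel

# The flat count does not fit in a signed 32-bit integer,
# while each matrix dimension does.
print(flat_numel > INT32_MAX)
print(all(dim <= INT32_MAX for dim in matrix_shape))
```

Whether this is the mechanism behind the allocator error is not confirmed by the summary, so treat the example as context for the arithmetic rather than as a diagnosis.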

Reviewed By: jiawenliu64

Differential Revision: D74852208

May 16 '25 00:05 jwfromm

Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
Latest commit	aa04e38e1c68501b5dd156b0c898246cb929b1e1
Latest deploy log	https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68268853fa1bf5000876cb98
Deploy Preview	https://deploy-preview-4134--pytorch-fbgemm-docs.netlify.app


May 16 '25 00:05 netlify[bot]

This pull request was exported from Phabricator. Differential Revision: D74852208

May 16 '25 00:05 facebook-github-bot

