
Introduce new W8A8-FP-CSR quantization API

Open namgyu-youn opened this issue 1 month ago • 10 comments

Summary: Introduce a new W8A8-FP-CSR quantization API, Float8SemiSparseTensor, which specializes in the 2:4 semi-sparse pattern using cuSPARSELt acceleration (https://docs.nvidia.com/cuda/cusparselt/); the core flow is sketched below.

Related Issue/PR: #2752

Future Plan: This PR only introduces the core operations (quantization/dequantization). For fuller API support, tensor utility operations such as indexing and slicing still need to be added.

Test Plan: test/prototype/quantization/quantize_/float8/test_float8_semisparse_tensor.py
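
For context, here is a minimal sketch of the quantize-then-compress flow such a tensor could implement, assuming per-row fp8 scaling and PyTorch's `torch.sparse.to_sparse_semi_structured` entry point; the helper names are illustrative, not the PR's actual API:

```python
import torch

# Illustrative helpers; not the actual API of this PR.
FP8_DTYPE = torch.float8_e4m3fn
FP8_MAX = torch.finfo(FP8_DTYPE).max  # 448.0 for e4m3fn

def quantize_to_float8_semi_sparse(weight: torch.Tensor):
    """Quantize a 2:4-pruned CUDA weight to fp8 and pack it.

    Assumes `weight` already carries a 2:4 pattern (2 of every
    4 contiguous elements along a row are zero).
    """
    # Per-row absmax scale maps each row into the fp8 range.
    scale = weight.abs().amax(dim=1, keepdim=True) / FP8_MAX
    scale = scale.clamp(min=1e-12)  # guard against all-zero rows
    qweight = (weight / scale).to(FP8_DTYPE)

    # Pack into the hardware 2:4 layout; fp8 support here goes
    # through the cuSPARSELt backend and depends on the GPU and
    # PyTorch version.
    compressed = torch.sparse.to_sparse_semi_structured(qweight)
    return compressed, scale

def dequantize(compressed, scale):
    # Expand back to dense fp8, then rescale to high precision.
    return compressed.to_dense().to(scale.dtype) * scale
```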

namgyu-youn avatar Oct 29 '25 18:10 namgyu-youn

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3258

Note: Links to docs will display an error until the docs builds have been completed.

:heavy_exclamation_mark: 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

:x: 9 New Failures

As of commit f5f7a1717521b2a711602cab19640fec0dfe7700 with merge base 3577306c8b32517afe8eb6eb7e84335601180598:

NEW FAILURES - The following jobs have failed (see the HUD link above for the full list):

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Oct 29 '25 18:10 pytorch-bot[bot]

@jcaip could you please check this PR?

namgyu-youn avatar Oct 29 '25 18:10 namgyu-youn

cc @namgyu-youn

Can you split this into two PRs, one for int8 and one for float8?

In general I don't think we want to introduce weight-only sparsity configs for int8 and float8 because we don't have mixed-dtype kernel support currently. The only kernels we have are for int8 x int8 2:4 sparse and fp8 x fp8 2:4 sparse.

I would like an Int8SemiSparseTensor, but I think it should live in prototype until we have a user for it.

Also cc @bbeckca who has been working on fp8xfp8 2:4 sparse tensor subclass migration in #3182.

jcaip avatar Oct 31 '25 18:10 jcaip
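
For reference, the fp8 x fp8 2:4 path mentioned above roughly corresponds to PyTorch's private cuSPARSELt bindings. A hedged sketch follows; `torch._cslt_compress` and `torch._cslt_sparse_mm` are internal ops whose availability and signatures vary across PyTorch versions:

```python
import torch

# Hedged sketch of an fp8 x fp8 2:4 sparse matmul; requires a GPU
# with cuSPARSELt fp8 support. Internal-op signatures may change.
dtype = torch.float8_e4m3fn
W = torch.randn(128, 128, device="cuda")

# Impose a 2:4 pattern: zero the 2 smallest of every 4 elements.
Wv = W.view(-1, 4)
_, idx = Wv.abs().topk(2, dim=1, largest=False)
Wv.scatter_(1, idx, 0.0)

W_fp8 = W.to(dtype)
A_fp8 = torch.randn(64, 128, device="cuda").to(dtype)

W_packed = torch._cslt_compress(W_fp8)  # pack 2:4 values + metadata
# The compressed weight is the sparse operand; this computes W @ A^T.
out = torch._cslt_sparse_mm(W_packed, A_fp8.t(), out_dtype=torch.bfloat16)
```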

> I would like an Int8SemiSparseTensor, but I think it should live in prototype until we have a user for it.

@jcaip if we want to move int8 2:4 sparse to prototype, then I don't think we need to migrate the tensor.

jerryzh168 avatar Oct 31 '25 18:10 jerryzh168

Okay, then I'll address only ~~W8A8-INT~~ W8A8-FP here and keep the file structure in prototype.

namgyu-youn avatar Oct 31 '25 19:10 namgyu-youn

cc @namgyu-youn I talked to @bbeckca, and I think your PR is closer, so let's use it instead. Can you remove the int8 changes? Then I will give this a review. Thanks for picking this up!

jcaip avatar Oct 31 '25 20:10 jcaip

cc @jcaip to request a review, thanks.

namgyu-youn avatar Nov 02 '25 09:11 namgyu-youn

cc @namgyu-youn

I think there's a bit of confusion about what the tensor subclass should store and how to do the op overload.

Please take a look at https://github.com/pytorch/ao/pull/3182/files#diff-afc7dd21d2b704181a6fd55be989426c0217a2bbfb694af9eb9746239ec462ed for the appropriate logic / ops to be called.
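
For readers following along: the pattern that diff uses stores the packed payload and its scale on a wrapper subclass and overloads only the ops it supports. Below is a simplified, hypothetical sketch of that shape, not the actual #3182 code; it falls back to a dense dequant instead of the fused cuSPARSELt kernel:

```python
import torch
import torch.nn.functional as F

class Float8SemiSparseSketch(torch.Tensor):
    """Toy wrapper subclass: packed fp8 payload + per-row scale."""

    @staticmethod
    def __new__(cls, compressed, scale, shape):
        # The wrapper carries only metadata; payload lives in attributes.
        return torch.Tensor._make_wrapper_subclass(
            cls, shape, dtype=scale.dtype, device=scale.device
        )

    def __init__(self, compressed, scale, shape):
        self.compressed = compressed  # 2:4-packed fp8 weight
        self.scale = scale            # per-row dequant scale

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear:
            x, w = args[0], args[1]
            bias = args[2] if len(args) > 2 else kwargs.get("bias")
            # A real implementation would dispatch to the fused
            # cuSPARSELt fp8 kernel; we dequantize densely instead.
            w_dense = w.compressed.to_dense().to(w.scale.dtype) * w.scale
            return F.linear(x, w_dense, bias)
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)
```

With this shape, a call like `F.linear(x, weight_subclass)` is intercepted and routed to whatever kernel the subclass supports, which is the op-overload structure the linked diff demonstrates.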

@jcaip Thanks a lot for the comprehensive review. I didn't know there was already an open PR (#3182), and my implementation turns out to be quite far from it (mostly in the ops and kernels). Therefore, the right move seems to be reopening #3182 and letting me update it after the latest review. Is it okay to go with this? Let me know which move is right for progress. Also cc @bbeckca, who already did this work.

namgyu-youn avatar Nov 13 '25 05:11 namgyu-youn

@namgyu-youn I think it'll be easier for me to just migrate this over; mind if I take over the PR? #3182 is also quite far from landing.

jcaip avatar Nov 13 '25 16:11 jcaip

@pytorchbot label "sparsity"

namgyu-youn avatar Nov 16 '25 14:11 namgyu-youn