ao
Introduce new W8A8-FP-CSR quantization API
Summary:
Introduce new W8A8-FP-CSR quantization API, Float8SemiSparseTensor, which specializes in the 2:4 semi-sparse pattern and uses cuSPARSELt acceleration (https://docs.nvidia.com/cuda/cusparselt/)
Related Issue/PR: #2752
Future Plan: This PR only introduces the core operations (quantization/dequantization). For fuller API support, tensor utility operations such as indexing and slicing still need to be added.
Test Plan: test/prototype/quantization/quantize_/float8/test_float8_semisparse_tensor.py
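Since the scheme above pairs float8 quantization with 2:4 semi-structured sparsity, here is a minimal plain-Python sketch of the two building blocks for reviewers unfamiliar with the pattern. The helper names (`prune_2_to_4`, `absmax_scale`) are hypothetical and purely illustrative; the actual tensor subclass dispatches to cuSPARSELt kernels rather than doing any of this in Python:

```python
# Illustrative sketch only (hypothetical helpers, not the PR's API).
# 2:4 semi-structured sparsity keeps the 2 largest-magnitude values
# in every group of 4; absmax scaling maps the largest magnitude to
# the float8_e4m3 dynamic range.

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def prune_2_to_4(row):
    """Keep the 2 largest-magnitude values in each group of 4, zero the rest."""
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

def absmax_scale(row):
    """Per-row scale so the largest magnitude maps to the fp8 max."""
    m = max(abs(v) for v in row) or 1.0
    return m / FP8_E4M3_MAX

row = [0.5, -2.0, 0.1, 3.0, 1.5, -0.2, 0.0, 0.7]
sparse = prune_2_to_4(row)
print(sparse)  # -> [0.0, -2.0, 0.0, 3.0, 1.5, 0.0, 0.0, 0.7]
print(absmax_scale(sparse))  # -> 3.0 / 448.0
```

The 50% structured zeros are what lets cuSPARSELt store only the nonzeros plus a small metadata index and skip the zeroed multiplications in the matmul.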
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3258
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:heavy_exclamation_mark: 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:
:x: 9 New Failures
As of commit f5f7a1717521b2a711602cab19640fec0dfe7700 with merge base 3577306c8b32517afe8eb6eb7e84335601180598:
NEW FAILURES - The following jobs have failed:
- PR Label Check / Check PR Labels (gh)
  Process completed with exit code 1.
- Run Regression Tests / test (CPU 2.6, linux.4xlarge, torch==2.6.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
  RuntimeError: Command docker exec -t c223961b253dc35cb0bdfc33afeadfc867b003875b232cbcad0b6dbd3eca1083 /exec failed with exit code 2
- Run Regression Tests / test (CPU 2.7, linux.4xlarge, torch==2.7.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
  RuntimeError: Command docker exec -t f3a656d52122687f4af72f221f6592e09add42cb355a2ca782caf8cca43de94b /exec failed with exit code 2
- Run Regression Tests / test (CPU 2.8, linux.4xlarge, torch==2.8.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
  RuntimeError: Command docker exec -t 4e67dbc027c7a9da0e2f4ef6c179bf49d6762ab1f0b177eb137897267189afa1 /exec failed with exit code 2
- Run Regression Tests / test (CUDA 2.6, linux.g5.12xlarge.nvidia.gpu, torch==2.6.0, cuda, 12.6) / linux-job (gh)
  RuntimeError: Command docker exec -t a7be2ef744c754d91d45e53dca1143cc86b0c7b5cb44b056350bc8fdc2a6a56e /exec failed with exit code 2
- Run Regression Tests / test (CUDA 2.7, linux.g5.12xlarge.nvidia.gpu, torch==2.7.0, cuda, 12.6) / linux-job (gh)
  RuntimeError: Command docker exec -t de24936e5bd69b5fc4ad624996fa49efbc9676a6109b38106319fd323040dedf /exec failed with exit code 2
- Run Regression Tests / test (CUDA 2.8, linux.g5.12xlarge.nvidia.gpu, torch==2.8.0, cuda, 12.6) / linux-job (gh)
  RuntimeError: Command docker exec -t b751bb078c0c06a765f44b8a577b613f99360e0138229220f1117abb82369d83 /exec failed with exit code 2
- Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/wh... / linux-job (gh)
  RuntimeError: Command docker exec -t fd0dde55b33d1bafc1dd84e13e9694ae7d573194ec8f8fcfce02da8251e6abda /exec failed with exit code 2
- Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh)
  RuntimeError: Command docker exec -t be34cc8afd057d83f2f6c230c91ef55e859a8e2c22860269d800dd32aff92b39 /exec failed with exit code 2
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jcaip could you please check this PR?
cc @namgyu-youn
Can you split this into two PRs? one for int8 and one for float8?
In general I don't think we want to introduce weight-only sparsity configs for int8 and float8 because we don't have mixed-dtype kernel support currently. The only kernels we have are for int8 x int8 2:4 sparse and fp8 x fp8 2:4 sparse.
I would like Int8SemiSparseTensor though, but I think it should live in prototype until we have a user for it.
Also cc @bbeckca who has been working on fp8xfp8 2:4 sparse tensor subclass migration in #3182.
@jcaip if we want to move int8 2:4 sparse to prototype, then we don't need to migrate the tensor I think
Okay, then I'll address only ~W8A8-INT~ W8A8-FP here and keep the file structure under prototype.
cc @namgyu-youn I talked to @bbeckca and I think your PR is closer so lets use it instead. Can you remove the int8 changes then and I will give this a review. Thanks for picking this up!
cc @jcaip to request review, thanks.
cc @namgyu-youn
I think there's a bit of confusion on what the tensor subclass should be storing and how to do the op overload.
Please take a look at https://github.com/pytorch/ao/pull/3182/files#diff-afc7dd21d2b704181a6fd55be989426c0217a2bbfb694af9eb9746239ec462ed for the appropriate logic / ops to be called.
@jcaip Thanks a lot for the comprehensive review. I didn't know there was already an open PR (#3182), and my implementation is quite far from it (mostly in the ops and kernels). Therefore, the right move seems to be reopening #3182 and letting me update it after the last review. Is it okay to go with this? Let me know which direction is right for progress. Also cc @bbeckca, who already did this work.
@namgyu-youn I think it'll be easier for me to just migrate this over, mind if I take over the PR? #3182 is also quite far from landing.
@pytorchbot label "sparsity"