ao
Introduce new W8A8-FP-CSR quantization API
Summary:
Introduce new W8A8-FP-CSR quantization API, Float8SemiSparseTensor, which specializes in the 2:4 semi-sparse pattern and uses cuSPARSELt acceleration (https://docs.nvidia.com/cuda/cusparselt/)
Related Issue/PR: #2752
Future Plan: This PR only introduces the core operations (quantization/dequantization). For fuller API support, tensor utility operations such as indexing and slicing still need to be added.
Test Plan: test/prototype/quantization/quantize_/float8/test_float8_semisparse_tensor.py
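Since the scheme above pairs float8 quantization with 2:4 semi-structured sparsity, here is a minimal plain-Python sketch of the two building blocks for reviewers unfamiliar with the pattern. The helper names (`prune_2_to_4`, `absmax_scale`) are hypothetical and purely illustrative; the actual tensor subclass dispatches to cuSPARSELt kernels rather than doing any of this in Python:

```python
# Illustrative sketch only (hypothetical helpers, not the PR's API).
# 2:4 semi-structured sparsity keeps the 2 largest-magnitude values
# in every group of 4; absmax scaling maps the largest magnitude to
# the float8_e4m3 dynamic range.

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def prune_2_to_4(row):
    """Keep the 2 largest-magnitude values in each group of 4, zero the rest."""
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

def absmax_scale(row):
    """Per-row scale so the largest magnitude maps to the fp8 max."""
    m = max(abs(v) for v in row) or 1.0
    return m / FP8_E4M3_MAX

row = [0.5, -2.0, 0.1, 3.0, 1.5, -0.2, 0.0, 0.7]
sparse = prune_2_to_4(row)
print(sparse)  # -> [0.0, -2.0, 0.0, 3.0, 1.5, 0.0, 0.0, 0.7]
print(absmax_scale(sparse))  # -> 3.0 / 448.0
```

The 50% structured zeros are what lets cuSPARSELt store only the nonzeros plus a small metadata index and skip the zeroed multiplications in the matmul.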
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3258
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:heavy_exclamation_mark: 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:
:x: 9 New Failures
As of commit f5f7a1717521b2a711602cab19640fec0dfe7700 with merge base 3577306c8b32517afe8eb6eb7e84335601180598:
NEW FAILURES - The following jobs have failed:
- PR Label Check / Check PR Labels (gh)
  Process completed with exit code 1.
- Run Regression Tests / test (CPU 2.6, linux.4xlarge, torch==2.6.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
  RuntimeError: Command docker exec -t c223961b253dc35cb0bdfc33afeadfc867b003875b232cbcad0b6dbd3eca1083 /exec failed with exit code 2
- Run Regression Tests / test (CPU 2.7, linux.4xlarge, torch==2.7.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
  RuntimeError: Command docker exec -t f3a656d52122687f4af72f221f6592e09add42cb355a2ca782caf8cca43de94b /exec failed with exit code 2
- Run Regression Tests / test (CPU 2.8, linux.4xlarge, torch==2.8.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
  RuntimeError: Command docker exec -t 4e67dbc027c7a9da0e2f4ef6c179bf49d6762ab1f0b177eb137897267189afa1 /exec failed with exit code 2
- Run Regression Tests / test (CUDA 2.6, linux.g5.12xlarge.nvidia.gpu, torch==2.6.0, cuda, 12.6) / linux-job (gh)
  RuntimeError: Command docker exec -t a7be2ef744c754d91d45e53dca1143cc86b0c7b5cb44b056350bc8fdc2a6a56e /exec failed with exit code 2
- Run Regression Tests / test (CUDA 2.7, linux.g5.12xlarge.nvidia.gpu, torch==2.7.0, cuda, 12.6) / linux-job (gh)
  RuntimeError: Command docker exec -t de24936e5bd69b5fc4ad624996fa49efbc9676a6109b38106319fd323040dedf /exec failed with exit code 2
- Run Regression Tests / test (CUDA 2.8, linux.g5.12xlarge.nvidia.gpu, torch==2.8.0, cuda, 12.6) / linux-job (gh)
  RuntimeError: Command docker exec -t b751bb078c0c06a765f44b8a577b613f99360e0138229220f1117abb82369d83 /exec failed with exit code 2
- Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/wh... / linux-job (gh)
  RuntimeError: Command docker exec -t fd0dde55b33d1bafc1dd84e13e9694ae7d573194ec8f8fcfce02da8251e6abda /exec failed with exit code 2
- Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh)
  RuntimeError: Command docker exec -t be34cc8afd057d83f2f6c230c91ef55e859a8e2c22860269d800dd32aff92b39 /exec failed with exit code 2
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jcaip could you please check this PR?
cc @namgyu-youn
Can you split this into two PRs? one for int8 and one for float8?
In general I don't think we want to introduce weight-only sparsity configs for int8 and float8 because we don't have mixed-dtype kernel support currently. The only kernels we have are for int8 x int8 2:4 sparse and fp8 x fp8 2:4 sparse.
I would like Int8SemiSparseTensor though, but I think it should live in prototype until we have a user for it.
Also cc @bbeckca who has been working on fp8xfp8 2:4 sparse tensor subclass migration in #3182.
@jcaip if we want to move int8 2:4 sparse to prototype, then we don't need to migrate the tensor I think
Okay, then I'll address only ~W8A8-INT~ W8A8-FP here and keep the file structure under prototype.
cc @namgyu-youn I talked to @bbeckca and I think your PR is closer so lets use it instead. Can you remove the int8 changes then and I will give this a review. Thanks for picking this up!
cc @jcaip to request review, thanks.
cc @namgyu-youn
I think there's a bit of confusion on what the tensor subclass should be storing and how to do the op overload.
Please take a look at https://github.com/pytorch/ao/pull/3182/files#diff-afc7dd21d2b704181a6fd55be989426c0217a2bbfb694af9eb9746239ec462ed for the appropriate logic / ops to be called.
@jcaip Thanks a lot for the comprehensive review. I didn't know there was already an open PR (#3182), and my implementation is quite far from it (mostly in the ops and kernels). Therefore, the right move seems to be reopening #3182 and letting me update it after the last review. Is it okay to go with this? Let me know which direction is right for progress. Also cc @bbeckca, who already did this work.
@namgyu-youn I think it'll be easier for me to just migrate this over, mind if I take over the PR? #3182 is also quite far from landing.
@pytorchbot label "sparsity"