ao

PyTorch native quantization and sparsity for training and inference

Results 163 ao issues

Getting this error with `int4` quantization. May be a noob question: Is this a bug or does `int4` require the weights to be in `bfloat16`? ``` Traceback (most recent call...

As we start onboarding more dtypes we ideally want them to work in as many different situations as possible so opening this tracker and will update the table as things...

# Summary Autoquant will iterate through a user module and identify all linear dtype + shapes as well as execution time for different quantization routines. This information is baked into...

autoquant
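As a rough illustration of the bookkeeping that summary describes, the sketch below records one (dtype, shape) key per linear layer and times a set of candidate routines once for each distinct key. It is plain Python, and every name in it (`LinearSpec`, `profile_linears`, `pick_fastest`, the candidate callables) is hypothetical; it is not torchao's implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


# Hypothetical stand-in for the information read off one linear layer.
@dataclass(frozen=True)
class LinearSpec:
    dtype: str
    in_features: int
    out_features: int


Key = Tuple[str, int, int]


def profile_linears(
    specs: Iterable[LinearSpec],
    candidates: Dict[str, Callable[[LinearSpec], None]],
) -> Dict[Key, Dict[str, float]]:
    """Time every candidate routine once per distinct (dtype, shape) key."""
    timings: Dict[Key, Dict[str, float]] = {}
    for spec in specs:
        key = (spec.dtype, spec.in_features, spec.out_features)
        if key in timings:
            continue  # layers with the same dtype and shape share one measurement
        timings[key] = {}
        for name, routine in candidates.items():
            start = time.perf_counter()
            routine(spec)
            timings[key][name] = time.perf_counter() - start
    return timings


def pick_fastest(timings: Dict[Key, Dict[str, float]]) -> Dict[Key, str]:
    """Return the name of the quickest candidate for each key."""
    return {key: min(per_key, key=per_key.get) for key, per_key in timings.items()}
```

A real profiler would repeat each measurement and discard warm-up runs; a single timing is used here only to keep the sketch short.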

Stacked PRs: #709, #707, -> #706. Add ability to calculate amax in tiles. ghstack-source-id: 83ccec3ec66f30b9d75146d0fc7b1137ea7574c4 Pull Request resolved: https://github.com/pytorch/ao/pull/682

CLA Signed
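For readers unfamiliar with the term, amax is the maximum absolute value, and computing it "in tiles" means taking one amax per block of the matrix rather than one for the whole tensor. A minimal sketch on nested lists follows; the function name `tiled_amax` and its parameters are illustrative only and are not taken from the PR.

```python
from typing import List, Sequence


def tiled_amax(
    matrix: Sequence[Sequence[float]], tile_rows: int, tile_cols: int
) -> List[List[float]]:
    """Return the max absolute value of each tile_rows x tile_cols block."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    result: List[List[float]] = []
    for r0 in range(0, n_rows, tile_rows):
        result_row: List[float] = []
        for c0 in range(0, n_cols, tile_cols):
            # Edge tiles are clipped to the matrix bounds.
            tile_max = max(
                abs(matrix[r][c])
                for r in range(r0, min(r0 + tile_rows, n_rows))
                for c in range(c0, min(c0 + tile_cols, n_cols))
            )
            result_row.append(tile_max)
        result.append(result_row)
    return result
```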

Getting a big whopper of an error trying to apply the optimizations shown in the [directions for CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) ``` --------------------------------------------------------------------------- TorchRuntimeError Traceback (most recent call last) Cell In[2], line 7...

# Summary Currently our CI/CD pipeline uses ruff to format and lint files in the codebase. They are hardcoded to the list in ruff.toml. We should also add mypy support...

enhancement

# Summary Today whenever a user runs autoquant, the [AutoQuantCache](https://github.com/pytorch/ao/blob/e1039abac7f429a8d7f489d047d9b34d6ac6afe2/torchao/quantization/autoquant.py#L33) gets populated with dtype + information for Linears seen within an arbitrary torch.nn.Module. This cache is not persistent. We should...

autoquant
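One way such a cache could be made persistent is to serialize it to disk between runs. The sketch below assumes a cache that maps a (dtype name, shape) pair to a measured time, which is a guess at the layout rather than the actual `AutoQuantCache` structure; `save_cache` and `load_cache` are hypothetical helpers.

```python
import json
import os
import tempfile
from typing import Dict, Tuple

# Assumed key layout: (dtype name, shape). The real cache may differ.
Key = Tuple[str, Tuple[int, ...]]


def save_cache(cache: Dict[Key, float], path: str) -> None:
    """Write the cache as JSON, replacing any existing file atomically."""
    rows = [
        {"dtype": dtype, "shape": list(shape), "time": seconds}
        for (dtype, shape), seconds in cache.items()
    ]
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash cannot leave a partial cache.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(rows, handle)
    os.replace(tmp_path, path)


def load_cache(path: str) -> Dict[Key, float]:
    """Load a cache written by save_cache; a missing file yields an empty cache."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        rows = json.load(handle)
    return {(row["dtype"], tuple(row["shape"])): row["time"] for row in rows}
```

JSON is used because tuple keys cannot be stored directly as object keys, so each entry is flattened into a row and rebuilt on load.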

att, I think it's because we did padding before: https://github.com/pytorch/ao/blob/e1039abac7f429a8d7f489d047d9b34d6ac6afe2/torchao/dtypes/affine_quantized_tensor.py#L480 and we should restore the shape after unpacking from TensorCoreTiledLayout
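The fix suggested there amounts to remembering the pre-padding shape and slicing back to it after unpacking. A minimal sketch of that idea on nested lists is shown below; `pad_cols` and `restore_shape` are hypothetical helpers and do not reflect how `TensorCoreTiledLayout` actually pads or unpacks.

```python
from typing import List, Tuple


def pad_cols(rows: List[List[int]], multiple: int, fill: int = 0) -> List[List[int]]:
    """Pad every row on the right so its length is a multiple of `multiple`."""
    if not rows:
        return []
    width = len(rows[0])
    padded_width = -(-width // multiple) * multiple  # ceiling to the next multiple
    return [row + [fill] * (padded_width - width) for row in rows]


def restore_shape(
    padded: List[List[int]], original_shape: Tuple[int, int]
) -> List[List[int]]:
    """Slice a padded matrix back to the shape recorded before padding."""
    n_rows, n_cols = original_shape
    return [row[:n_cols] for row in padded[:n_rows]]
```

For example, a 2x3 weight padded to 8 columns comes back as 2x3 once `restore_shape` is called with the recorded shape `(2, 3)`.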

Branch cut: Sept 5th Release: Sept 10th ## Low-precision inference - [x] Make IntX dtypes work with AffineQuantizedTensor @jerryzh168 (https://github.com/pytorch/ao/pull/672) - [ ] Performance benchmarks for intX dtypes on relevant...