ao
PyTorch native quantization and sparsity for training and inference
Getting this error with `int4` quantization. May be a noob question: Is this a bug or does `int4` require the weights to be in `bfloat16`? ``` Traceback (most recent call...
As we start onboarding more dtypes, we ideally want them to work in as many different situations as possible, so I am opening this tracker and will update the table as things...
# Summary Autoquant will iterate through a user module and identify all linear dtype + shapes as well as execution time for different quantization routines. This information is baked into...
Stacked PRs: * #709 * #707 * __->__#706 --- --- --- ### add ability to calculate amax in tiles ghstack-source-id: 83ccec3ec66f30b9d75146d0fc7b1137ea7574c4 Pull Request resolved: https://github.com/pytorch/ao/pull/682
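To illustrate what "calculating amax in tiles" could mean, here is a minimal sketch. It assumes amax is the maximum absolute value and that tiles are fixed-size rectangular blocks; plain nested lists stand in for tensors, and `tiled_amax` is a hypothetical helper, not the code from the PR.

```python
def tiled_amax(matrix, tile_rows, tile_cols):
    """Return a grid holding the max absolute value of each tile.

    Hypothetical helper for illustration only; edge tiles may be
    smaller when the matrix size is not a multiple of the tile size.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    result = []
    for r0 in range(0, n_rows, tile_rows):
        row_out = []
        for c0 in range(0, n_cols, tile_cols):
            # Reduce over every element that falls inside this tile.
            tile_max = max(
                abs(matrix[r][c])
                for r in range(r0, min(r0 + tile_rows, n_rows))
                for c in range(c0, min(c0 + tile_cols, n_cols))
            )
            row_out.append(tile_max)
        result.append(row_out)
    return result


# A 2x4 matrix split into two 2x2 tiles.
weights = [
    [1.0, -2.0, 0.5, 0.25],
    [-3.0, 0.0, -0.75, 0.1],
]
per_tile = tiled_amax(weights, tile_rows=2, tile_cols=2)
```

Compared with a single amax over the whole tensor, a per-tile amax lets each block get its own scale, which is the usual motivation for this kind of granularity.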
Getting a big whopper of an error trying to apply the optimizations shown in the [directions for CogVideoX](https://huggingface.co/THUDM/CogVideoX-2b) ``` --------------------------------------------------------------------------- TorchRuntimeError Traceback (most recent call last) Cell In[2], line 7...
# Summary Currently our CI/CD pipeline uses ruff to format and lint files in the codebase. The files it checks are hardcoded in the list in ruff.toml. We should also add mypy support...
# Summary Today, whenever a user runs autoquant, the [AutoQuantCache](https://github.com/pytorch/ao/blob/e1039abac7f429a8d7f489d047d9b34d6ac6afe2/torchao/quantization/autoquant.py#L33) gets populated with dtype + shape information for Linears seen within an arbitrary torch.nn.Module. This cache is not persistent. We should...
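One way a persistent cache could look is sketched below. It assumes the cache can be flattened to a mapping from a (routine, dtype, shape) key to a measured time and stored as JSON. The helper names, the key format, the file name, and the timing value are all hypothetical placeholders and do not reflect how AutoQuantCache is actually structured.

```python
import json
import os
import tempfile


def make_key(routine, dtype, shape):
    """Build a string key, since JSON object keys must be strings."""
    return f"{routine}|{dtype}|{'x'.join(str(d) for d in shape)}"


def save_cache(cache, path):
    """Write the in-memory cache to disk."""
    with open(path, "w") as f:
        json.dump(cache, f)


def load_cache(path):
    """Load a cache from disk, or start empty if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


# Placeholder entry: the routine label and the time are illustrative only.
cache = {make_key("int8_weight_only", "bfloat16", (4096, 4096)): 0.123}

path = os.path.join(tempfile.mkdtemp(), "autoquant_cache.json")
save_cache(cache, path)
restored = load_cache(path)
```

A real implementation would likely also need to record the device and library version in the key, since timings measured on one setup may not transfer to another.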
As the title says. I think it's because we did padding before: https://github.com/pytorch/ao/blob/e1039abac7f429a8d7f489d047d9b34d6ac6afe2/torchao/dtypes/affine_quantized_tensor.py#L480 and we should restore the shape after unpacking from TensorCoreTiledLayout
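The pad-then-restore idea can be sketched as follows. This is a minimal illustration that assumes padding is applied along the last dimension up to a multiple of some block size; nested lists stand in for tensors, and `pad_to_multiple` and `restore_shape` are hypothetical helpers, not torchao functions.

```python
def pad_to_multiple(rows, multiple, fill=0):
    """Pad each row on the right so its width is a multiple of `multiple`.

    Returns the padded rows together with the original width, which
    has to be remembered in order to undo the padding later.
    """
    width = len(rows[0]) if rows else 0
    padded_width = -(-width // multiple) * multiple  # ceiling division
    padded = [row + [fill] * (padded_width - width) for row in rows]
    return padded, width


def restore_shape(padded_rows, original_width):
    """Slice the padding back off after unpacking."""
    return [row[:original_width] for row in padded_rows]


original = [[1, 2, 3], [4, 5, 6]]
padded, width = pad_to_multiple(original, multiple=8)
restored = restore_shape(padded, width)
```

The key point is that the original shape must be stored alongside the packed data; without it, the unpacked result keeps the padded shape.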
Branch cut: Sept 5th Release: Sept 10th ## Low-precision inference - [x] Make IntX dtypes work with AffineQuantizedTensor @jerryzh168 (https://github.com/pytorch/ao/pull/672) - [ ] Performance benchmarks for intX dtypes on relevant...