
PyTorch native quantization and sparsity for training and inference

Results: 163 issues in `pytorch/ao`

Corresponding issue: #579. This PR adds SpinQuant integration to `pytorch/ao`; see the paper for details: https://arxiv.org/abs/2405.16406. Initial results on Llama-2-7b (measured by Wikitext word perplexity) are shown below. | Model...

CLA Signed

Background: The [SpinQuant paper](https://arxiv.org/pdf/2405.16406) introduces a method for improving quantization by applying additional rotation matrices to the model weights, which reduces quantization error. While SpinQuant is a fairly sophisticated...

good first issue
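The core idea behind the rotation trick can be sketched outside PyTorch: multiply the weight by an orthogonal matrix before quantizing, then undo the rotation afterwards; because the rotation is exactly invertible, only the quantization error changes. A minimal NumPy illustration, with a random QR rotation standing in for the paper's learned/Hadamard rotations and a toy symmetric int8 quantizer (neither is torchao's actual implementation):

```python
import numpy as np

def random_orthogonal(n, seed=0):
    # Random orthogonal matrix via QR decomposition -- a stand-in for
    # the learned / Hadamard rotations used in the SpinQuant paper.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def quantize_int8(w):
    # Toy symmetric per-tensor int8 quantize/dequantize.
    scale = np.abs(w).max() / 127.0
    return np.round(w / scale).clip(-127, 127) * scale

# Weight with a few large outliers, which hurt per-tensor quantization.
rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64))
w[0, :4] *= 50.0

r = random_orthogonal(64)
plain_err = np.abs(quantize_int8(w) - w).mean()
# Rotate, quantize, rotate back: since r is orthogonal,
# (w @ r) @ r.T == w exactly, so only quantization error remains.
rot_err = np.abs(quantize_int8(w @ r) @ r.T - w).mean()
print(plain_err, rot_err)  # rotation spreads outliers, typically lowering error
```

The rotation spreads the outlier energy across all channels, so the per-tensor scale shrinks and most entries are quantized more finely.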

Adds AWQ per #530 To do: - [x] Verify correctness of implementation and add tests for this - [ ] Fold activation scaling into previous layer if applicable - [x]...

CLA Signed

Summary: The issue: when using float8 training with FSDP, the forward/backward graph contains these tensors: - Without fp8-all-gather: original_weight (all-gather output, sharded) - fp8_weight - fp8_weight_transpose (needed in...

CLA Signed
fb-exported

Summary: Added an example and utility for an AWQ-like flow that applies an extra equalization scale tensor to the input activation. Test Plan: `python tutorials/calibration_flow/awq_like.py` Reviewers: Subscribers: Tasks: Tags:

CLA Signed
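The equalization trick is an exact identity: dividing the input activation per channel by a scale vector while multiplying the matching weight columns by the same vector leaves the linear output unchanged, but rebalances the weight's per-channel ranges for quantization. A NumPy sketch (the scale choice here is an arbitrary placeholder; AWQ derives it from activation statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))   # activations, shape (batch, in)
w = rng.standard_normal((8, 16))   # linear weight, shape (out, in)

# Per-input-channel equalization scale; in AWQ this comes from
# calibration statistics, here it is just a made-up positive vector.
s = np.abs(x).mean(axis=0) + 0.5

# Fold the scale into the weight while dividing it out of the
# activation: (x / s) @ (w * s).T == x @ w.T term by term.
y_ref = x @ w.T
y_eq = (x / s) @ (w * s).T
print(np.abs(y_ref - y_eq).max())  # numerically ~0
```

This is why the to-do item above about folding the activation scaling into the previous layer matters: if the previous layer can emit `x / s` directly, the extra elementwise divide disappears at inference time.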

Summary: The diff modifies the `padding` option and adds tests with `compile`: * For a scaled_mm of shape M×K×N, the current `inner_padding` option only pads the `K` dimension. However, if...

CLA Signed
fb-exported
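The padding concern can be illustrated independently of `_scaled_mm`: zero-padding M, K, and N up to the next multiple of 16 leaves the valid region of the product unchanged, since the padded rows and columns contribute only zeros. A NumPy sketch (the helper name and the multiple of 16 are illustrative assumptions, not torchao's actual implementation):

```python
import numpy as np

def pad_to_multiple(a, multiple=16):
    # Zero-pad both dimensions of a 2-D array up to the next multiple.
    # Hardware scaled-mm kernels often require M, K, and N all to be
    # divisible by 16, so padding only K is not always sufficient.
    m, k = a.shape
    pad_m = (-m) % multiple
    pad_k = (-k) % multiple
    return np.pad(a, ((0, pad_m), (0, pad_k)))

a = np.ones((10, 20))          # M=10, K=20
b = np.ones((20, 30))          # K=20, N=30
ap = pad_to_multiple(a)        # shape (16, 32)
bp = pad_to_multiple(b)        # shape (32, 32)

# The zero padding does not change the valid region of the product:
c = (ap @ bp)[:10, :30]
print(np.allclose(c, a @ b))
```

After the padded matmul, slicing back to the original M×N recovers the exact unpadded result.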

Currently torchao QAT has two APIs, [tensor subclasses](https://github.com/pytorch/ao/blob/a4221df5e10ff8c33854f964fe6b4e00abfbe542/torchao/quantization/prototype/qat/api.py#L41) and [module swap](https://github.com/pytorch/ao/blob/a4221df5e10ff8c33854f964fe6b4e00abfbe542/torchao/quantization/prototype/qat/_module_swap_api.py#L39). The original plan was to deprecate and eventually remove the old module swap API in favor of the tensor...

rfc
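For readers unfamiliar with the two styles: the module-swap approach replaces each linear layer with a variant whose forward pass fake-quantizes the weight, so training sees quantization error. A dependency-light sketch of that idea using toy classes (these are illustrative stand-ins, not torchao's QAT API):

```python
import numpy as np

def fake_quant(x, bits=8):
    # Quantize-dequantize (symmetric per-tensor): the trick QAT uses so
    # the forward pass observes quantization error during training.
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(x).max() / qmax, 1e-12)
    return np.round(x / scale).clip(-qmax, qmax) * scale

class Linear:
    # Toy stand-in for nn.Linear (weight shape: out x in, no bias).
    def __init__(self, w):
        self.w = w
    def __call__(self, x):
        return x @ self.w.T

class QATLinear(Linear):
    # Module-swap style QAT: same interface, fake-quantized weight.
    def __call__(self, x):
        return x @ fake_quant(self.w).T

def swap_linears(modules):
    # The module-swap API's core move: replace every Linear in place.
    return {name: QATLinear(m.w) if type(m) is Linear else m
            for name, m in modules.items()}

rng = np.random.default_rng(0)
model = {"fc1": Linear(rng.standard_normal((4, 8)))}
qat_model = swap_linears(model)
x = rng.standard_normal((2, 8))
out = qat_model["fc1"](x)   # forward now sees quantization error
```

The tensor-subclass approach reaches the same forward behavior without swapping modules, by wrapping the weight tensor itself, which is part of what the RFC weighs.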

Does torch.export preserve the quantize_per_tensor/dequantize_per_tensor ops? I was testing with

```python
import torch
from torchao.quantization.quant_api import (
    quantize_,
    int8_dynamic_activation_int8_weight,
    int4_weight_only,
    int8_weight_only,
    unwrap_tensor_subclass,
)
# define a floating point model where...
```

I installed on Windows and `from torchao.quantization import quantize_` fails. Output of `pip freeze`:

```
Microsoft Windows [Version 10.0.19045.4894]
(c) Microsoft Corporation. All rights reserved.

R:\CogVideoX_v1\CogVideoX_SECourses\venv\Scripts>activate
(venv) R:\CogVideoX_v1\CogVideoX_SECourses\venv\Scripts>pip freeze
accelerate==0.34.2
aiofiles==23.2.1
annotated-types==0.7.0
...
```

multibackend

Summary: The following is the directory structure of the submitted code under torchao:

```
experimental/
├── kernels/
│   └── mps/
│       ├── metal/
│       │   └── (metal shaders)
│       ├── ...
```

CLA Signed
fb-exported