ao
PyTorch native quantization and sparsity for training and inference
Corresponding issue: #579. This PR adds SpinQuant integration to `pytorch/ao`; see the paper for details: https://arxiv.org/abs/2405.16406. Initial results on Llama-2-7b (measured by Wikitext word perplexity) are shown below. | Model...
Background: The [SpinQuant paper](https://arxiv.org/pdf/2405.16406) introduces a method of improving quantization by inserting rotation matrices into the model weights, which reshapes the weight distribution so that it quantizes more accurately. While SpinQuant is a fairly sophisticated...
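The core identity behind the rotation trick: for an orthogonal matrix R, (W R)(Rᵀ x) = W x, so rotating the weights while counter-rotating the activations leaves the layer's function unchanged, while the quantizer sees a more balanced weight matrix. A minimal pure-Python sketch (the 2x2 matrices and values are illustrative, not from the PR):

```python
import math

def matmul(A, B):
    # naive matrix multiply for small illustrative matrices
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# toy 2x2 weight matrix with one outlier entry that hurts quantization
W = [[1.0, 100.0],
     [2.0, 3.0]]

# orthogonal rotation R (a 45-degree Givens rotation)
t = math.pi / 4
R = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

x = [[0.5], [-1.5]]               # activation as a column vector

y = matmul(W, x)                  # original output: W @ x
WR = matmul(W, R)                 # rotated weights
y_rot = matmul(WR, matmul(transpose(R), x))  # (W R) @ (R^T x)

# outputs match up to float rounding; only the quantizer's view of the
# weights changed (WR's entries are far more balanced than W's)
assert all(abs(a[0] - b[0]) < 1e-9 for a, b in zip(y, y_rot))
```

SpinQuant additionally *learns* which rotations minimize quantization error, but the network-preserving identity above is what makes inserting them safe.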
Adds AWQ per #530.

To do:
- [x] Verify correctness of implementation and add tests for this
- [ ] Fold activation scaling into previous layer if applicable
- [x] ...
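The "fold activation scaling into previous layer" item refers to a standard equalization trick: dividing a layer's input by a per-channel scale s can be absorbed into the *preceding* layer by dividing its output-channel weights and bias by s, so no extra runtime op remains. A hedged pure-Python sketch (toy values, not the PR's code):

```python
# Two chained "linear" layers with an elementwise per-channel scale s
# applied to the activation between them. Folding divides the previous
# layer's output channels (weights and bias) by s instead.

def linear(x, W, b):
    # y[j] = sum_i x[i] * W[j][i] + b[j]
    return [sum(xi * wij for xi, wij in zip(x, row)) + bj
            for row, bj in zip(W, b)]

W1 = [[0.5, -1.0], [2.0, 0.25]]
b1 = [0.1, -0.2]
W2 = [[1.5, -0.5]]
b2 = [0.0]
s = [4.0, 0.5]          # equalization scale for layer 2's input
x = [1.0, 2.0]

# unfolded: explicit divide between the layers, layer-2 weights scaled up
h = linear(x, W1, b1)
h_scaled = [hi / si for hi, si in zip(h, s)]
W2_eq = [[w * si for w, si in zip(row, s)] for row in W2]
y_ref = linear(h_scaled, W2_eq, b2)

# folded: absorb 1/s into layer 1's output channels and drop the divide
W1_fold = [[w / si for w in row] for row, si in zip(W1, s)]
b1_fold = [bi / si for bi, si in zip(b1, s)]
y_fold = linear(linear(x, W1_fold, b1_fold), W2_eq, b2)

assert all(abs(a - b) < 1e-9 for a, b in zip(y_ref, y_fold))
```

The "if applicable" caveat matters: folding only works when the previous op is linear in its output channels (a linear/conv layer), not when a nonlinearity sits in between.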
Summary: The issue: when using float8 training with FSDP, we have these tensors in the forward/backward graph:
- Without fp8-all-gather: original_weight (all-gather output, sharded)
- fp8_weight
- fp8_weight_transpose (needed in...
Summary: Added an example and a util for an AWQ-like flow that applies an extra equalization scale tensor to the input activation.

Test Plan: `python tutorials/calibration_flow/awq_like.py`
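Why an equalization scale helps: multiplying a weight channel by s while dividing the matching input-activation channel by s leaves the product unchanged in exact arithmetic, but lets the quantizer assign its limited range more evenly across channels. An illustrative sketch with made-up numbers (not the PR's actual util):

```python
# Symmetric round-to-nearest "int8" quantization of one weight row,
# with an AWQ-style equalization scale folded into the activation.

def quantize(w):
    # one scale per row: the largest magnitude sets the step size
    step = max(abs(v) for v in w) / 127
    return [round(v / step) * step for v in w]

x = [1.0, 1.0, 1.0, 0.01]        # last input channel is tiny...
w = [0.02, 0.03, 0.01, 5.0]      # ...but carries a huge weight

y_ref = sum(xi * wi for xi, wi in zip(x, w))

# plain quantization: the 5.0 outlier sets the step size, crushing the
# small weights
y_plain = sum(xi * wi for xi, wi in zip(x, quantize(w)))

# equalized: scale the outlier channel down in the weights and fold the
# inverse scale into the input activation; exact product is unchanged,
# but the quantized weights now share a similar range
s = [1.0, 1.0, 1.0, 0.01]
w_eq = quantize([wi * si for wi, si in zip(w, s)])
y_eq = sum((xi / si) * wi for xi, si, wi in zip(x, s, w_eq))

# equalization shrinks the quantization error of the output
assert abs(y_eq - y_ref) < abs(y_plain - y_ref)
```

AWQ chooses s from calibration-time activation statistics; this toy just hand-picks a scale to balance one outlier channel.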
Summary: This diff modifies the `padding` option and adds tests with `compile`:
* For a scaled_mm of shape MxKxN, the current `inner_padding` option only pads the `K` dimension. However, if...
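Padding the shared `K` dimension is safe because zero-padding both operands adds only zero terms to each dot product, so the result is mathematically unchanged while the shapes become hardware-friendly. A pure-Python sketch (the multiple-of-16 alignment is illustrative of scaled_mm-style dimension requirements):

```python
# Zero-padding the shared K dimension of an MxK @ KxN matmul up to a
# multiple of 16 leaves the result unchanged: padded entries multiply
# to zero.

def pad_k(rows, k_pad):
    # pad each length-K row with zeros up to k_pad
    return [row + [0.0] * (k_pad - len(row)) for row in rows]

def matmul(A, B_t):
    # B is given transposed (N rows of length K) for simplicity
    return [[sum(a * b for a, b in zip(arow, brow)) for brow in B_t]
            for arow in A]

M, K, N = 2, 13, 3
A = [[float(i * K + j) for j in range(K)] for i in range(M)]
B_t = [[float(j * N + n) for j in range(K)] for n in range(N)]

k_pad = -(-K // 16) * 16          # round K up to a multiple of 16
C_ref = matmul(A, B_t)
C_pad = matmul(pad_k(A, k_pad), pad_k(B_t, k_pad))

# same MxN result, exactly
assert C_ref == C_pad
```

Padding `M` or `N` is different: it changes the output shape, so the extra rows/columns have to be sliced off afterward rather than simply ignored.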
Currently torchao QAT has two APIs, [tensor subclasses](https://github.com/pytorch/ao/blob/a4221df5e10ff8c33854f964fe6b4e00abfbe542/torchao/quantization/prototype/qat/api.py#L41) and [module swap](https://github.com/pytorch/ao/blob/a4221df5e10ff8c33854f964fe6b4e00abfbe542/torchao/quantization/prototype/qat/_module_swap_api.py#L39). The original plan was to deprecate and eventually remove the old module swap API in favor of the tensor...
Does torch.export preserve the quantize_per_tensor/dequantize_per_tensor ops? I was testing with
```python
import torch
from torchao.quantization.quant_api import (
    quantize_,
    int8_dynamic_activation_int8_weight,
    int4_weight_only,
    int8_weight_only,
    unwrap_tensor_subclass,
)

# define a floating point model where...
```
I installed on Windows and `from torchao.quantization import quantize_` is failing. `pip freeze` output:
```
Microsoft Windows [Version 10.0.19045.4894]
(c) Microsoft Corporation. All rights reserved.

R:\CogVideoX_v1\CogVideoX_SECourses\venv\Scripts>activate

(venv) R:\CogVideoX_v1\CogVideoX_SECourses\venv\Scripts>pip freeze
accelerate==0.34.2
aiofiles==23.2.1
annotated-types==0.7.0
...
```
Summary: The following is the directory structure of the submitted code under torchao:
```
experimental/
├── kernels/
│   └── mps/
│       ├── metal/
│       │   └── (metal shaders)
│       ├── ...
```