ao
PyTorch native quantization and sparsity for training and inference
Corresponding issue: #579. This PR adds SpinQuant integration to `pytorch/ao`; see the paper for details: https://arxiv.org/abs/2405.16406. Initial results on Llama-2-7b (measured by Wikitext word perplexity) are shown below. | Model...
Background: The [SpinQuant paper](https://arxiv.org/pdf/2405.16406) introduces a method of improving quantization by inserting rotation matrices into the model weights, which reshapes the weight distribution so that it quantizes more accurately. While SpinQuant is a fairly sophisticated...
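The core identity behind the rotation trick: for an orthogonal matrix R, (W R)(Rᵀ x) = W x, so rotating the weights while counter-rotating the activations leaves the layer's function unchanged, while the quantizer sees a more balanced weight matrix. A minimal pure-Python sketch (the 2x2 matrices and values are illustrative, not from the PR):

```python
import math

def matmul(A, B):
    # naive matrix multiply for small illustrative matrices
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# toy 2x2 weight matrix with one outlier entry that hurts quantization
W = [[1.0, 100.0],
     [2.0, 3.0]]

# orthogonal rotation R (a 45-degree Givens rotation)
t = math.pi / 4
R = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

x = [[0.5], [-1.5]]               # activation as a column vector

y = matmul(W, x)                  # original output: W @ x
WR = matmul(W, R)                 # rotated weights
y_rot = matmul(WR, matmul(transpose(R), x))  # (W R) @ (R^T x)

# outputs match up to float rounding; only the quantizer's view of the
# weights changed (WR's entries are far more balanced than W's)
assert all(abs(a[0] - b[0]) < 1e-9 for a, b in zip(y, y_rot))
```

SpinQuant additionally *learns* which rotations minimize quantization error, but the network-preserving identity above is what makes inserting them safe.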
Adds AWQ per #530.

To do:
- [x] Verify correctness of implementation and add tests for this
- [ ] Fold activation scaling into previous layer if applicable
- [x] ...
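The "fold activation scaling into previous layer" item refers to a standard equalization trick: dividing a layer's input by a per-channel scale s can be absorbed into the *preceding* layer by dividing its output-channel weights and bias by s, so no extra runtime op remains. A hedged pure-Python sketch (toy values, not the PR's code):

```python
# Two chained "linear" layers with an elementwise per-channel scale s
# applied to the activation between them. Folding divides the previous
# layer's output channels (weights and bias) by s instead.

def linear(x, W, b):
    # y[j] = sum_i x[i] * W[j][i] + b[j]
    return [sum(xi * wij for xi, wij in zip(x, row)) + bj
            for row, bj in zip(W, b)]

W1 = [[0.5, -1.0], [2.0, 0.25]]
b1 = [0.1, -0.2]
W2 = [[1.5, -0.5]]
b2 = [0.0]
s = [4.0, 0.5]          # equalization scale for layer 2's input
x = [1.0, 2.0]

# unfolded: explicit divide between the layers, layer-2 weights scaled up
h = linear(x, W1, b1)
h_scaled = [hi / si for hi, si in zip(h, s)]
W2_eq = [[w * si for w, si in zip(row, s)] for row in W2]
y_ref = linear(h_scaled, W2_eq, b2)

# folded: absorb 1/s into layer 1's output channels and drop the divide
W1_fold = [[w / si for w in row] for row, si in zip(W1, s)]
b1_fold = [bi / si for bi, si in zip(b1, s)]
y_fold = linear(linear(x, W1_fold, b1_fold), W2_eq, b2)

assert all(abs(a - b) < 1e-9 for a, b in zip(y_ref, y_fold))
```

The "if applicable" caveat matters: folding only works when the previous op is linear in its output channels (a linear/conv layer), not when a nonlinearity sits in between.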
Summary: The issue: when using float8 training with FSDP, we have these tensors in the forward/backward graph:
- Without fp8-all-gather: original_weight (all-gather output, sharded)
- fp8_weight
- fp8_weight_transpose (needed in...
Summary: Added an example and a util for an AWQ-like flow that applies an extra equalization scale tensor to the input activation.

Test Plan: `python tutorials/calibration_flow/awq_like.py`
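Why an equalization scale helps: multiplying a weight channel by s while dividing the matching input-activation channel by s leaves the product unchanged in exact arithmetic, but lets the quantizer assign its limited range more evenly across channels. An illustrative sketch with made-up numbers (not the PR's actual util):

```python
# Symmetric round-to-nearest "int8" quantization of one weight row,
# with an AWQ-style equalization scale folded into the activation.

def quantize(w):
    # one scale per row: the largest magnitude sets the step size
    step = max(abs(v) for v in w) / 127
    return [round(v / step) * step for v in w]

x = [1.0, 1.0, 1.0, 0.01]        # last input channel is tiny...
w = [0.02, 0.03, 0.01, 5.0]      # ...but carries a huge weight

y_ref = sum(xi * wi for xi, wi in zip(x, w))

# plain quantization: the 5.0 outlier sets the step size, crushing the
# small weights
y_plain = sum(xi * wi for xi, wi in zip(x, quantize(w)))

# equalized: scale the outlier channel down in the weights and fold the
# inverse scale into the input activation; exact product is unchanged,
# but the quantized weights now share a similar range
s = [1.0, 1.0, 1.0, 0.01]
w_eq = quantize([wi * si for wi, si in zip(w, s)])
y_eq = sum((xi / si) * wi for xi, si, wi in zip(x, s, w_eq))

# equalization shrinks the quantization error of the output
assert abs(y_eq - y_ref) < abs(y_plain - y_ref)
```

AWQ chooses s from calibration-time activation statistics; this toy just hand-picks a scale to balance one outlier channel.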
Summary: This diff modifies the `padding` option and adds tests with `compile`:
* For a scaled_mm of shape MxKxN, the current `inner_padding` option only pads the `K` dimension. However, if...
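Padding the shared `K` dimension is safe because zero-padding both operands adds only zero terms to each dot product, so the result is mathematically unchanged while the shapes become hardware-friendly. A pure-Python sketch (the multiple-of-16 alignment is illustrative of scaled_mm-style dimension requirements):

```python
# Zero-padding the shared K dimension of an MxK @ KxN matmul up to a
# multiple of 16 leaves the result unchanged: padded entries multiply
# to zero.

def pad_k(rows, k_pad):
    # pad each length-K row with zeros up to k_pad
    return [row + [0.0] * (k_pad - len(row)) for row in rows]

def matmul(A, B_t):
    # B is given transposed (N rows of length K) for simplicity
    return [[sum(a * b for a, b in zip(arow, brow)) for brow in B_t]
            for arow in A]

M, K, N = 2, 13, 3
A = [[float(i * K + j) for j in range(K)] for i in range(M)]
B_t = [[float(j * N + n) for j in range(K)] for n in range(N)]

k_pad = -(-K // 16) * 16          # round K up to a multiple of 16
C_ref = matmul(A, B_t)
C_pad = matmul(pad_k(A, k_pad), pad_k(B_t, k_pad))

# same MxN result, exactly
assert C_ref == C_pad
```

Padding `M` or `N` is different: it changes the output shape, so the extra rows/columns have to be sliced off afterward rather than simply ignored.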
Currently torchao QAT has two APIs, [tensor subclasses](https://github.com/pytorch/ao/blob/a4221df5e10ff8c33854f964fe6b4e00abfbe542/torchao/quantization/prototype/qat/api.py#L41) and [module swap](https://github.com/pytorch/ao/blob/a4221df5e10ff8c33854f964fe6b4e00abfbe542/torchao/quantization/prototype/qat/_module_swap_api.py#L39). The original plan was to deprecate and eventually remove the old module swap API in favor of the tensor...
Does torch.export preserve the quantize_per_tensor/dequantize_per_tensor ops? I was testing with
```python
import torch
from torchao.quantization.quant_api import (
    quantize_,
    int8_dynamic_activation_int8_weight,
    int4_weight_only,
    int8_weight_only,
    unwrap_tensor_subclass,
)

# define a floating point model where...
```
I installed on Windows and `from torchao.quantization import quantize_` is failing. `pip freeze` output:
```
Microsoft Windows [Version 10.0.19045.4894]
(c) Microsoft Corporation. All rights reserved.

R:\CogVideoX_v1\CogVideoX_SECourses\venv\Scripts>activate

(venv) R:\CogVideoX_v1\CogVideoX_SECourses\venv\Scripts>pip freeze
accelerate==0.34.2
aiofiles==23.2.1
annotated-types==0.7.0
...
```
Summary: The following is the directory structure of the submitted code under torchao:
```
experimental/
├── kernels/
│   └── mps/
│       ├── metal/
│       │   └── (metal shaders)
│       ├── ...
```