andrewor14

Results: 16 issues by andrewor14

Following https://github.com/pytorch/pytorch/pull/74128 and https://github.com/pytorch/pytorch/pull/74362, this is part 3 of the effort to reduce duplication in the code that lowers reference quantized patterns to native quantized ops in fbgemm/qnnpack....
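
For context, a "reference quantized pattern" is a dequantize → float op → quantize subgraph that the lowering pass replaces with a single native quantized op. A minimal illustrative sketch of the pattern being matched (the names and the per-tensor scheme here are illustrative, not the actual lowering code):

```python
import torch

# Reference pattern: the float op runs between explicit dequantize/quantize
# nodes, all expressed in ordinary PyTorch.
def reference_quantized_linear(x_q, s_x, z_x, w_q, s_w, z_w, s_out, z_out):
    x = (x_q.float() - z_x) * s_x                 # dequantize activation
    w = (w_q.float() - z_w) * s_w                 # dequantize weight
    out = torch.nn.functional.linear(x, w)        # float op
    q = torch.clamp(torch.round(out / s_out) + z_out, -128, 127)
    return q.to(torch.int8)                       # requantize output

# Lowering rewrites this whole subgraph into one fused native quantized
# linear (fbgemm/qnnpack), avoiding the float round trip at runtime.
```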

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #85068 **Summary:** This commit enables the custom module LSTM path for FX graph mode static quantization. This has the same flow as...

cla signed
release notes: quantization
fx
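
A hedged sketch of the flow this enables: FX graph mode prepare/convert with nn.LSTM routed through its quantizable observed counterpart via the custom module configs. The config details below are assumptions based on the public FX APIs, not copied from the PR:

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.fx.custom_config import (
    ConvertCustomConfig,
    PrepareCustomConfig,
)
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(16, 32)

    def forward(self, x):
        return self.lstm(x)

model = Model().eval()
example_inputs = (torch.randn(5, 1, 16),)

# Route nn.LSTM through the quantizable custom module so it is
# statically quantized instead of left in float.
prepare_config = PrepareCustomConfig().set_float_to_observed_mapping(
    torch.nn.LSTM, torch.ao.nn.quantizable.LSTM)
convert_config = ConvertCustomConfig().set_observed_to_quantized_mapping(
    torch.ao.nn.quantizable.LSTM, torch.ao.nn.quantized.LSTM)

prepared = prepare_fx(model, get_default_qconfig_mapping("fbgemm"),
                      example_inputs, prepare_custom_config=prepare_config)
prepared(*example_inputs)  # calibrate with representative data
quantized = convert_fx(prepared, convert_custom_config=convert_config)
```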

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #119496 * __->__ #116092 **Summary:** This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op:...

oncall: distributed
module: cpu
release notes: mps
ciflow/mps
module: inductor
module: dynamo
ciflow/inductor
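
For reference, the computation that all of these batch norm variants bottom out in is standard batch normalization. A minimal sketch of the op semantics (not the actual decomposition code; running stat updates are elided):

```python
import torch

def batch_norm_reference(x, weight, bias, running_mean, running_var,
                         training, eps=1e-5):
    # Normalize with batch statistics in training mode and with
    # running statistics in eval mode.
    if training:
        dims = [0] + list(range(2, x.dim()))
        mean = x.mean(dim=dims)
        var = x.var(dim=dims, unbiased=False)
    else:
        mean, var = running_mean, running_var
    shape = [1, -1] + [1] * (x.dim() - 2)  # broadcast over the channel dim
    x_hat = (x - mean.reshape(shape)) * torch.rsqrt(var.reshape(shape) + eps)
    return x_hat * weight.reshape(shape) + bias.reshape(shape)
```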

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #119496 Summary: This commit switches `aten.batch_norm` to call the new `batch_norm_with_update` and `batch_norm_no_update` ops, instead of the old `_batch_norm_impl_index` op. The new...

release notes: quantization
keep-going
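
A hypothetical sketch of the dispatch described above, with `torch.nn.functional.batch_norm` standing in for the real kernels (the actual op signatures are defined in native_functions.yaml and may differ):

```python
import torch
import torch.nn.functional as F

# Stand-ins for the two new ops named in the summary.
def batch_norm_with_update(x, w, b, running_mean, running_var, momentum, eps):
    # Training path: normalize with batch stats, update running stats in place.
    return F.batch_norm(x, running_mean, running_var, w, b, True, momentum, eps)

def batch_norm_no_update(x, w, b, running_mean, running_var, momentum, eps):
    # Inference path: normalize with running stats, mutate nothing.
    return F.batch_norm(x, running_mean, running_var, w, b, False, momentum, eps)

def batch_norm(x, w, b, running_mean, running_var, training, momentum, eps):
    # The dispatch that replaces the old _batch_norm_impl_index indirection.
    if training:
        return batch_norm_with_update(x, w, b, running_mean, running_var,
                                      momentum, eps)
    return batch_norm_no_update(x, w, b, running_mean, running_var,
                                momentum, eps)
```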

Summary: Add a new quantization scheme that lets users quantize their models using int8 per-token dynamic activation + int4 per-axis grouped weight quantization. Test Plan: ``` tune run quantize...

CLA Signed
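
A hedged sketch of applying this scheme with torchao's `quantize_` API (import paths and argument names vary across torchao versions, so treat the exact names as assumptions):

```python
import torch
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

model = torch.nn.Sequential(torch.nn.Linear(256, 256)).eval()

# int8 per-token dynamic activation + int4 grouped weight quantization;
# group_size controls how many weight elements share one int4 scale.
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))
```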

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #125208 Summary: This commit fixes the pattern matching for conv-bn during QAT fusion where both weight and bias are quantized per channel....

ciflow/trunk
release notes: quantization

**Summary:** This commit adds the option to run quantization-aware training (QAT) during finetuning. QAT refers to "fake quantizing" the weights and activations during training, which performs the following transformation on...

CLA Signed
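
The transformation referenced (and truncated) above is fake quantization: values are rounded onto the integer grid and immediately mapped back to float, so training observes the quantization error while all arithmetic stays in floating point. A minimal sketch:

```python
import torch

def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Round onto the integer grid and clamp to the quantized range...
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    # ...then dequantize back to float; in practice gradients flow
    # through the rounding via a straight-through estimator.
    return (q - zero_point) * scale
```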

Summary: Many changes have gone into the full_finetune_distributed recipe but were not reflected in the equivalent qat_distributed recipe. This commit brings the latter up to date, adding features like FSDP2,...

CLA Signed

Currently torchao QAT has two APIs, [tensor subclasses](https://github.com/pytorch/ao/blob/a4221df5e10ff8c33854f964fe6b4e00abfbe542/torchao/quantization/prototype/qat/api.py#L41) and [module swap](https://github.com/pytorch/ao/blob/a4221df5e10ff8c33854f964fe6b4e00abfbe542/torchao/quantization/prototype/qat/_module_swap_api.py#L39). The original plan was to deprecate and eventually remove the old module swap API in favor of the tensor...

rfc

**Summary:** Following https://github.com/pytorch/ao/issues/987, this commit makes module swap the main QAT flow today. We remove all tensor subclass fake quantize injection logic since this is not needed in both the...

CLA Signed
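
A hedged sketch of the module-swap QAT flow (the import path matches the prototype location linked above and may have moved in later torchao releases):

```python
import torch
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

model = torch.nn.Sequential(torch.nn.Linear(256, 256))

quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=32)
# prepare: swap nn.Linear modules for linears that fake-quantize
# weights and activations during finetuning
model = quantizer.prepare(model)
# ... finetune as usual ...
# convert: swap the fake-quantized linears for actually quantized ones
model = quantizer.convert(model)
```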