Xia Weiwen issues

Results: 13 issues of Xia Weiwen

[Quant] Vectorize scalar remainder in quantized kernel for normalization

## Description This PR improves the performance of the quantized normalization kernel by vectorizing the scalar remainder. In the current implementation [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp), the computation is vectorized while the scalar remainder is handled...

triaged

open source

cla signed

intel priority

intel

[Quant] Support lowering of channel shuffle in FX

## Description Support lowering of channel shuffle in FX by adding its module and functional op to `is_copy_node` list in `torch/ao/quantization/fx/_lower_to_native_backend.py` ## Validation UTs added to test - correctness of...

triaged

open source

cla signed

intel priority

release notes: quantization

intel
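Channel shuffle only reorders channels and leaves every value unchanged, which is presumably why it can be lowered as a copy-like node. The sketch below shows that permutation on a flat list of channels in plain Python. The helper name `channel_shuffle` is illustrative only; it is not the PyTorch implementation and does not touch the FX lowering code.

```python
def channel_shuffle(channels, groups):
    """Reorder a flat list of channels the way a channel shuffle does.

    Illustrative helper (an assumption, not PyTorch code): the channels are
    viewed as a (groups, per_group) grid, transposed, and flattened again.
    """
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Walk the grid column by column: this is the transpose-and-flatten step.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
# The output is a permutation of the input; no value is modified.
```

Because the result contains exactly the input values in a different order, a quantized tensor's scale and zero point would stay valid after such an op.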

[FX][Quant] Enable FX quant for patterns like x.view(x.size(...), ...)

**Summary** This work continues with https://github.com/pytorch/pytorch/pull/83784 by @vkuzo and includes all the changes in that PR. Quote from https://github.com/pytorch/pytorch/pull/83784: > Issue #83658 reports that ops followed by a certain pattern...

open source

release notes: quantization

intel

Upgrade submodule oneDNN to v3.4

## Improvements This upgrade fixes the following issues: - https://github.com/pytorch/pytorch/issues/120982 This upgrade brings the following new features: - Introduced memory descriptor serialization API. This API is needed to support freezing...

module: mkldnn

open source

topic: not user facing

intel

ciflow/xpu

[Quant][PT2E] enable qlinear post op fusion for dynamic quant & qat

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #122667 * #122593 * #122387 * #123240 **Description** Add fusion path for dynamic quant and for QAT. The following patterns can be...

open source

release notes: quantization

module: inductor

ciflow/inductor

Support 4bit on CPU backend

Adds implementation for the following ops on CPU backend: - quantize_4bit - dequantize_4bit - gemv_4bit Limitations: - `quant_storage` must be torch.uint8 - `compress_statistics` is not supported yet (`bnb_4bit_use_double_quant` must be...
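To illustrate why byte-sized storage suits 4-bit data, here is a minimal sketch that packs two 4-bit codes into each byte and unpacks them again. It is not the implementation of `quantize_4bit` or `dequantize_4bit`: the helper names, the high-nibble-first order, and the zero padding are all assumptions made for the example.

```python
def pack_4bit(codes):
    """Pack 4-bit codes (ints in 0..15) into bytes, two codes per byte.

    Hypothetical helper: the first code of each pair goes into the high
    nibble, and an odd-length input is padded with a trailing zero code.
    """
    codes = list(codes)
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("every code must fit in 4 bits (0..15)")
    if len(codes) % 2:
        codes.append(0)  # pad so the codes pair up evenly
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def unpack_4bit(packed, count):
    """Recover the first `count` 4-bit codes from packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)    # high nibble
        codes.append(byte & 0x0F)  # low nibble
    return codes[:count]


packed = pack_4bit([1, 2, 3])
restored = unpack_4bit(packed, 3)
```

The caller has to remember the original element count, since padding makes the packed length ambiguous for odd-sized inputs.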

CPU/XPU: disable torch.compile if g++ is not available

`torch.compile` requires `g++`. On platforms like Windows, `g++` is not available. In that case, `torch.compile` is disabled to avoid runtime errors. With this patch, the CPU/XPU backend works on Windows....
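The guard described above can be sketched with the standard library alone. This is a hedged illustration, not the code from the patch: `compiler_available` and `maybe_compile` are hypothetical names, and `compile_fn` stands in for whatever compile step the caller would otherwise apply.

```python
import shutil


def compiler_available(name="g++"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def maybe_compile(fn, compile_fn, compiler="g++"):
    """Apply `compile_fn` to `fn` only when the compiler is present.

    Hypothetical helper: when the compiler is missing, `fn` is returned
    unchanged so callers fall back to the uncompiled path instead of
    failing at runtime.
    """
    if compiler_available(compiler):
        return compile_fn(fn)
    return fn


def double(x):
    return 2 * x


# With a compiler name that does not exist, the function is left as-is.
fallback = maybe_compile(double, lambda f: f, compiler="no-such-compiler-xyz")
```

Checking once at import or setup time keeps the fallback decision out of the hot path.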

[Inductor][mkldnn] Bug fix: incorrect codegen arg order for qconv

Fixes #133448. The arg order for mkldnn qconv IR became incorrect after PR #132367. This PR fixes the bug. **Test plan** `python test/inductor/test_mkldnn_pattern_matcher.py -k qconv` `python test/inductor/test_cpu_cpp_wrapper.py -k qconv`...

open source

intel

module: inductor

ciflow/inductor

[WIP] SmoothQuant using tensor subclassing

Still WIP. The implementation of SmoothQuant with tensor subclassing (AffineQuantizedTensor) is similar to that of AWQ, with the following differences: - SmoothQuant supports both static and dynamic quantization of activation...

CLA Signed

[CPU] Enable DA8W4 on CPU

**Summary** This PR enables DA8W4 on CPU. - It adds a new layout `Int8DynamicActInt4WeightCPULayout` and its implementation - It adds two custom ops: `da8w4_linear_prepack_cpu` for weight packing and `da8w4_linear_cpu` for...

CLA Signed

cpu

quantize

topic: new feature