Xia Weiwen
## Description This PR improves the performance of the quantized kernel for normalize by vectorizing the scalar remainder. In the current implementation [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp), the computation is vectorized while the scalar remainder is handled...
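The idea of vectorizing the remainder can be sketched in plain Python, with lists standing in for SIMD registers. This is a minimal illustration, not the ATen kernel: `VEC_SIZE`, `vec_apply` and `apply_vectorized` are hypothetical names. In ATen, a partial vector is typically handled with the count-taking `loadu`/`store` overloads of `Vectorized<T>`. Here the tail is padded with zeros, run through the same vector operation, and only the valid lanes are kept.

```python
from typing import Callable, List

# Hypothetical SIMD width (e.g. 8 fp32 lanes); an assumption for this sketch.
VEC_SIZE = 8


def vec_apply(fn: Callable[[float], float], lanes: List[float]) -> List[float]:
    """Stand-in for one SIMD operation: always processes exactly VEC_SIZE lanes."""
    assert len(lanes) == VEC_SIZE
    return [fn(v) for v in lanes]


def apply_vectorized(fn: Callable[[float], float], data: List[float]) -> List[float]:
    n = len(data)
    main_end = n - n % VEC_SIZE
    out: List[float] = []
    # Main loop: full vectors only.
    for i in range(0, main_end, VEC_SIZE):
        out.extend(vec_apply(fn, data[i:i + VEC_SIZE]))
    # Remainder: rather than a scalar loop, load a partial vector padded with
    # zeros, run the same vector operation, and keep only the valid lanes.
    rem = n - main_end
    if rem:
        padded = list(data[main_end:]) + [0.0] * (VEC_SIZE - rem)
        out.extend(vec_apply(fn, padded)[:rem])
    return out


# Usage: a normalize-style elementwise step, (x - mean) * rstd, on 11 values
# (one full vector of 8 plus a remainder of 3).
values = [float(v) for v in range(11)]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
rstd = 1.0 / (var + 1e-5) ** 0.5
normalized = apply_vectorized(lambda x: (x - mean) * rstd, values)
```

The padded lanes are computed and then discarded. This is harmless for elementwise math, but a real kernel must make sure those lanes never reach memory or a reduction.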
## Description Support lowering of channel shuffle in FX by adding its module and functional op to the `is_copy_node` list in `torch/ao/quantization/fx/_lower_to_native_backend.py` ## Validation UTs are added to test: - correctness of...
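Channel shuffle only reorders channels, which is presumably why it fits the copy-node category: the quantized integer values can move as-is without a dequantize/requantize round trip. The sketch below assumes per-tensor quantization (per-channel parameters would have to be permuted too). It shows the index mapping: view the channels as `(groups, C // groups)`, transpose, and flatten. The function names are hypothetical.

```python
from typing import List, Sequence


def channel_shuffle_indices(num_channels: int, groups: int) -> List[int]:
    """Source input channel for each output channel of a channel shuffle."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Output channel o = j * groups + k reads input channel k * per_group + j.
    return [(o % groups) * per_group + o // groups for o in range(num_channels)]


def channel_shuffle_quantized(q_channels: Sequence[Sequence[int]], groups: int) -> List[List[int]]:
    """Shuffle the channels of an int8 tensor given as per-channel value lists.

    Only data movement happens, so the integer values and a per-tensor
    (scale, zero_point) pair carry over unchanged.
    """
    idx = channel_shuffle_indices(len(q_channels), groups)
    return [list(q_channels[i]) for i in idx]
```

For example, 6 channels in 2 groups come out in the order `[0, 3, 1, 4, 2, 5]`.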
**Summary** This work continues https://github.com/pytorch/pytorch/pull/83784 by @vkuzo and includes all the changes in that PR. Quote from https://github.com/pytorch/pytorch/pull/83784: > Issue #83658 reports that ops followed by a certain pattern...
## Improvements This upgrade fixes the following issues: - https://github.com/pytorch/pytorch/issues/120982 This upgrade brings the following new features: - Introduced a memory descriptor serialization API. This API is needed to support freezing...
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #122667 * #122593 * #122387 * #123240 **Description** Add fusion path for dynamic quant and for QAT. The following patterns can be...
Adds implementations of the following ops on the CPU backend: - quantize_4bit - dequantize_4bit - gemv_4bit Limitations: - `quant_storage` must be torch.uint8 - `compress_statistics` is not supported yet (`bnb_4bit_use_double_quant` must be...
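For readers new to these ops, here is a rough reference of what blockwise 4-bit quantize, dequantize and gemv compute. Two points go beyond the snippet above. First, bitsandbytes' NF4/FP4 types use lookup-table codebooks, and this sketch swaps in a plain symmetric linear 4-bit code for clarity. Second, `BLOCK_SIZE` and the `*_sketch` function names are hypothetical.

What the sketch does reflect is the listed limitations. Two 4-bit codes are packed into each uint8 byte, matching `quant_storage` = torch.uint8. The per-block absmax statistics are kept uncompressed, with no double quantization.

```python
from typing import List, Tuple

BLOCK_SIZE = 64  # hypothetical block size for this sketch


def quantize_4bit_sketch(x: List[float], block_size: int = BLOCK_SIZE) -> Tuple[bytes, List[float]]:
    """Blockwise absmax quantization to signed codes in [-7, 7], two codes per uint8 byte."""
    codes: List[int] = []
    absmax: List[float] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        m = max(abs(v) for v in block) or 1.0  # avoid dividing by zero for all-zero blocks
        absmax.append(m)
        for v in block:
            codes.append(round(v / m * 7) + 8)  # shift signed [-7, 7] to unsigned nibble [1, 15]
    if len(codes) % 2:
        codes.append(8)  # pad with the nibble that encodes 0.0
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, absmax


def dequantize_4bit_sketch(packed: bytes, absmax: List[float], n: int,
                           block_size: int = BLOCK_SIZE) -> List[float]:
    """Unpack high/low nibbles, drop padding, and rescale by each block's absmax."""
    signed = [code - 8 for byte in packed for code in (byte >> 4, byte & 0x0F)][:n]
    return [q / 7 * absmax[i // block_size] for i, q in enumerate(signed)]


def gemv_4bit_sketch(packed_rows: List[bytes], absmax_rows: List[List[float]],
                     x: List[float], block_size: int = BLOCK_SIZE) -> List[float]:
    """Reference y = W @ x with each row of W stored as packed 4-bit codes."""
    y = []
    for packed, absmax in zip(packed_rows, absmax_rows):
        w = dequantize_4bit_sketch(packed, absmax, len(x), block_size)
        y.append(sum(wi * xi for wi, xi in zip(w, x)))
    return y
```

With this linear code, the round-trip error per element is at most `absmax / 14` of its block. An optimized gemv would dequantize inside the dot-product loop rather than materializing each row.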
`torch.compile` requires `g++`. On platforms like Windows, `g++` is not available. In that case, `torch.compile` is disabled to avoid runtime errors. With this patch, the CPU/XPU backend works on Windows....
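One way to express the "disable `torch.compile` when `g++` is missing" rule is a small guard that checks `PATH` before compiling. This is a hypothetical helper, not the patch's actual check, which the snippet does not show. It only relies on `shutil.which`. The compile function is passed in, so callers would use `maybe_compile(fn, torch.compile)`.

```python
import shutil
from typing import Callable


def maybe_compile(fn: Callable, compile_fn: Callable[[Callable], Callable],
                  compiler: str = "g++") -> Callable:
    """Apply compile_fn (e.g. torch.compile) only if `compiler` is found on PATH.

    Otherwise return fn unchanged, so it keeps running eagerly instead of
    failing later when the compiled path needs the C++ toolchain.
    """
    if shutil.which(compiler) is None:
        return fn
    return compile_fn(fn)
```

Checking once, at wrapping time, keeps the fallback decision in one place. A function returned unchanged behaves exactly like the original eager code.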
Fixes #133448 The argument order for the mkldnn qconv IR became incorrect after PR #132367. This PR fixes the bug. **Test plan** `python test/inductor/test_mkldnn_pattern_matcher.py -k qconv` `python test/inductor/test_cpu_cpp_wrapper.py -k qconv`...
Still WIP. The implementation of SmoothQuant with tensor subclassing (AffineQuantizedTensor) is similar to that of AWQ, with the following differences: - SmoothQuant supports both static and dynamic quantization of activation...
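As background, the core of SmoothQuant is a per-input-channel factor that migrates quantization difficulty from activations to weights:

`s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`

Activations are then divided by `s` and weight columns multiplied by `s`, so the linear output is mathematically unchanged. The sketch below shows only that transformation in plain Python. It is not the AffineQuantizedTensor-based implementation. The weight layout follows `torch.nn.Linear` (`[out_features, in_features]`). The `1e-5` clamp and the function names are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def smoothing_factors(x: Matrix, w: Matrix, alpha: float = 0.5) -> List[float]:
    """x: [tokens, in_features]; w: [out_features, in_features]."""
    s = []
    for j in range(len(w[0])):
        x_max = max(abs(row[j]) for row in x)
        w_max = max(abs(row[j]) for row in w)
        # Clamp to avoid division by zero for all-zero channels (assumption).
        s.append(max(x_max, 1e-5) ** alpha / max(w_max, 1e-5) ** (1 - alpha))
    return s


def smooth(x: Matrix, w: Matrix, s: List[float]):
    """Return (x / s, w * s), scaling input channel j of both by s[j]."""
    x_s = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * s[j] for j, v in enumerate(row)] for row in w]
    return x_s, w_s


def linear(x: Matrix, w: Matrix) -> Matrix:
    """y = x @ w.T without bias."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in w] for row in x]
```

With `alpha = 0.5`, the smoothed activation and weight columns end up with the same maximum magnitude. An outlier channel in `x` (e.g. values around 100) is thus tamed before quantization, whether the activation is then quantized statically or dynamically.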
**Summary** This PR enables DA8W4 on CPU. - It adds a new layout `Int8DynamicActInt4WeightCPULayout` and its implementation - It adds two custom ops: `da8w4_linear_prepack_cpu` for weight packing and `da8w4_linear_cpu` for...
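To show what a DA8W4 linear computes (int8 activations quantized dynamically at runtime, int4 weights), here is a numerical reference in plain Python. The snippet does not describe the exact schemes used by `da8w4_linear_cpu`, so this sketch assumes the following:

- per-row asymmetric uint8 activations
- per-output-channel symmetric int4 weights
- integer accumulation followed by a single rescale

It ignores the packed weight layout produced by `da8w4_linear_prepack_cpu`. All function names are hypothetical.

```python
from typing import List, Tuple


def dynamic_quantize_row_u8(row: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric uint8 quantization with qparams computed at runtime from this row."""
    lo, hi = min(min(row), 0.0), max(max(row), 0.0)  # range must include 0.0
    scale = (hi - lo) / 255 or 1.0  # all-zero row: any nonzero scale works
    zero_point = round(-lo / scale)
    q = [min(255, max(0, round(v / scale) + zero_point)) for v in row]
    return q, scale, zero_point


def quantize_weight_int4(w_row: List[float]) -> Tuple[List[int], float]:
    """Symmetric int4 quantization of one output channel, codes in [-8, 7]."""
    scale = max(abs(v) for v in w_row) / 7 or 1.0
    return [min(7, max(-8, round(v / scale))) for v in w_row], scale


def da8w4_linear_reference(x: List[List[float]], w: List[List[float]]) -> List[List[float]]:
    """y = x @ w.T with dynamically quantized uint8 activations and int4 weights."""
    qw = [quantize_weight_int4(r) for r in w]  # weights are quantized once, ahead of time
    out = []
    for row in x:
        qx, sx, zx = dynamic_quantize_row_u8(row)  # activations: per call, per row
        out_row = []
        for codes, sw in qw:
            acc = sum((a - zx) * b for a, b in zip(qx, codes))  # integer accumulation
            out_row.append(acc * sx * sw)  # one float rescale per output element
        out.append(out_row)
    return out
```

Subtracting the zero point inside the integer sum keeps the arithmetic exact until the final rescale. Optimized kernels usually fold the zero-point term into a precomputed weight-sum correction instead.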