neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
For llama, two patterns are not detected: `mlp.down_proj` -> `mlp.up_proj` and `.self_attn.o_proj` -> `module.self_attn.v_proj`; for opt: `self_attn.out_proj` -> `self_attn.v_proj`.
## Type of Change
feature; API changed: `get_woq_tuning_config`

## Description
Add torch WOQ tuning: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#woq-algorithms-tuning

## How has this PR been tested?
Pre-CI

## Dependency Change?
None
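WOQ algorithm tuning, as described in the linked doc, amounts to trying candidate weight-only quantization configs and keeping the best one by an accuracy metric. A minimal conceptual sketch of that loop, assuming hypothetical names (`tune_woq`, `evaluate`, the config dicts) rather than the actual neural-compressor API:

```python
# Conceptual sketch of a WOQ tuning loop: try candidate configs in
# order, track the best score, and stop early once an accuracy goal
# is met. All names here are hypothetical illustrations, not the
# real neural-compressor 3.x API.

def tune_woq(model, candidate_configs, evaluate, accuracy_goal):
    """Return (best_config, best_score) over the candidate configs."""
    best_cfg, best_score = None, float("-inf")
    for cfg in candidate_configs:
        score = evaluate(model, cfg)  # e.g. accuracy of the quantized model
        if score > best_score:
            best_cfg, best_score = cfg, score
        if score >= accuracy_goal:    # early stop once the goal is met
            break
    return best_cfg, best_score

# Toy usage with a made-up accuracy table keyed by (algo, bits).
configs = [
    {"algo": "RTN", "bits": 4},
    {"algo": "GPTQ", "bits": 4},
    {"algo": "RTN", "bits": 8},
]
fake_accuracy = {("RTN", 4): 0.70, ("GPTQ", 4): 0.74, ("RTN", 8): 0.78}
best_cfg, best_acc = tune_woq(
    None,
    configs,
    lambda model, cfg: fake_accuracy[(cfg["algo"], cfg["bits"])],
    accuracy_goal=0.75,
)
```

The early-stop check mirrors the usual accuracy-driven tuning pattern: cheaper configs are tried first and tuning halts at the first config that satisfies the goal.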
## Type of Change
Modify `op_type` for `set_local` in the 3.x API in unit tests and examples

## Description
According to the changes in PR https://github.com/intel/neural-compressor/pull/1745

## Expected Behavior & Potential Risk
...
## Type of Change
API changed or not: None

## How has this PR been tested?
Pre-CI

## Dependency Change?
None
## Type of Change
feature

## Description
Support MX quant

## Expected Behavior & Potential Risk
the expected behavior that triggered by this PR

## How has this PR been...
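MX (microscaling) formats quantize small blocks of elements with one shared power-of-two scale per block. A minimal sketch of that idea, assuming a symmetric integer element type and a block size of 4; this illustrates the block-scaling concept only, not the exact OCP MX encoding or the implementation in this PR:

```python
# Block-wise quantization with a shared power-of-two scale per block,
# as in microscaling (MX) formats. elem_max=7 approximates a signed
# 4-bit integer element; block_size and elem_max are illustrative.
import math

def mx_quantize_dequantize(values, block_size=4, elem_max=7):
    out = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        amax = max(abs(v) for v in block) or 1.0
        # One shared scale per block, rounded up to a power of two.
        scale = 2.0 ** math.ceil(math.log2(amax / elem_max))
        q = [max(-elem_max, min(elem_max, round(v / scale))) for v in block]
        out.extend(x * scale for x in q)  # dequantized values
    return out

vals = [0.1, -0.2, 0.4, 3.0]
deq = mx_quantize_dequantize(vals)
```

Note how the single large element (3.0) dictates the block's scale, so the small elements in the same block lose precision; that trade-off is why MX blocks are kept small.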
I was wondering if there is a way to resume quantization from history.snapshot? I am using ONNX and onnxrt_cuda_ep. I can quantize the model, but before saving the model, ...
## Type of Change
example; API not changed

## Description
Added SDXL smooth quant example.

## Expected Behavior & Potential Risk
the expected behavior that triggered by this PR

...
Hi, I want to convert and quantize a PyTorch model to an ONNX model. I referred to this example: https://github.com/intel/neural-compressor/blob/master/examples/pytorch/image_recognition/torchvision_models/export/fx/main.py When calling the export function, there is an error: "'q_config' is needed when export...
Bumps [ejs](https://github.com/mde/ejs) from 3.1.9 to 3.1.10.

Release notes sourced from ejs's releases: v3.1.10.

Commits:
- d3f807d Version 3.1.10
- 9ee26dd Mocha TDD
- e469741 Basic pollution protection
- 715e950 Merge pull request...
Hello community, I've tried the smoothquant flow on an OPT-125m model with the default settings. Unsurprisingly, the activations are quantized per tensor and the weights per channel. According to the...
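The question above contrasts per-tensor activation quantization with per-channel weight quantization. A minimal sketch of the two scale computations for symmetric int8, assuming absmax calibration (the granularities described in the issue, not the exact smoothquant implementation):

```python
# Per-tensor vs per-channel scale computation for symmetric int8.
# Per-tensor: one scale for the whole tensor (typical for activations).
# Per-channel: one scale per output row (typical for weights).

def per_tensor_scale(x, qmax=127):
    """Single scale from the absolute max over the whole 2-D tensor."""
    return max(abs(v) for row in x for v in row) / qmax

def per_channel_scales(w, qmax=127):
    """One scale per row, from each row's absolute max."""
    return [max(abs(v) for v in row) / qmax for row in w]

acts = [[0.5, -2.0], [1.0, 0.25]]   # toy activation tensor
wts = [[0.1, -0.4], [2.0, 0.05]]    # toy weight matrix
s_act = per_tensor_scale(acts)      # one scalar for all activations
s_wt = per_channel_scales(wts)      # one scale per weight row
```

Per-channel scales let a row with small weights keep a fine-grained scale instead of inheriting the tensor-wide maximum, which is why weights are usually quantized per channel while activations, whose ranges vary per input, get a single calibrated per-tensor scale.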