neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Results: 155 neural-compressor issues

For Llama, two patterns have not been detected: mlp.down_proj->mlp.up_proj and .self_attn.o_proj->module.self_attn.v_proj; for OPT, self_attn.out_proj->self_attn.v_proj.

## Type of Change
feature; API changed: adds `get_woq_tuning_config`
## Description
Add torch WOQ tuning: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#woq-algorithms-tuning
## How has this PR been tested?
Pre-CI
## Dependency Change?
None
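As context, a minimal sketch of how the new WOQ tuning entry point might be used with the 3.x PyTorch API. The import path, `TuningConfig`, `autotune`, and the `eval_fn` signature follow the linked documentation but are assumptions, not verified against this PR; `float_model` and `evaluate_accuracy` are placeholders.

```python
# Hypothetical sketch: tune over the built-in weight-only quantization config set.
# Import path and keyword names are assumptions based on the 3.x torch docs.
from neural_compressor.torch.quantization import TuningConfig, autotune, get_woq_tuning_config

def eval_fn(model) -> float:
    # User-supplied metric; autotune keeps the config with the best score.
    return evaluate_accuracy(model)  # placeholder evaluation routine

tune_config = TuningConfig(config_set=get_woq_tuning_config())
best_model = autotune(model=float_model, tune_config=tune_config, eval_fn=eval_fn)
```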

## Type of Change
Modify op_type for set_local in the 3.x API in unit tests and the example.
## Description
According to changes in PR https://github.com/intel/neural-compressor/pull/1745
## Expected Behavior & Potential Risk
##...

## Type of Change
API changed or not: None
## How has this PR been tested?
Pre-CI
## Dependency Change?
None

bug fix
2.x

## Type of Change
feature
## Description
Support MX quantization.
## Expected Behavior & Potential Risk
The expected behavior triggered by this PR.
## How has this PR been...

new feature
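
As a rough illustration of what MX (microscaling) quantization might look like through the 3.x PyTorch entry points; `MXQuantConfig` defaults and the prepare/convert flow shown here are a sketch and may not match the final API of this PR. `float_model` is a user-supplied placeholder.

```python
# Hypothetical sketch of MX quantization with the 3.x torch API; config fields
# and the two-step prepare/convert flow are assumptions, not confirmed by the PR.
from neural_compressor.torch.quantization import MXQuantConfig, prepare, convert

quant_config = MXQuantConfig()               # default MX data types and block size
model = prepare(float_model, quant_config)   # float_model: user-supplied nn.Module
model = convert(model)                       # produce the MX-quantized model
```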

I was wondering if there is a way to resume quantization from history.snapshot? I am using ONNX and onnxrt_cuda_ep. I can quantize the model, but before saving the model,...

help wanted
aitce

## Type of Change
example; API not changed
## Description
Added SDXL smooth quant example.
## Expected Behavior & Potential Risk
The expected behavior triggered by this PR.
##...

Hi, I want to convert and quantize a PyTorch model to an ONNX model. I am referring to this example: https://github.com/intel/neural-compressor/blob/master/examples/pytorch/image_recognition/torchvision_models/export/fx/main.py When calling the export function, there is an error: "'q_config' is needed when export...

aitce
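
For reference, a hedged sketch of the 2.x export path: quantize with INC first, then export the resulting q_model, which is what carries the q_config the error refers to. The `Torch2ONNXConfig` fields, `float_model`, and `calib_loader` are assumptions/placeholders, not taken from the original post.

```python
# Hypothetical 2.x-style flow: quantize with INC, then export the q_model to ONNX.
# The q_config mentioned in the error is attached to the model by quantization.fit;
# exporting a plain float model without it can trigger that error.
import torch
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.config import Torch2ONNXConfig

q_model = quantization.fit(float_model, PostTrainingQuantConfig(), calib_dataloader=calib_loader)

export_conf = Torch2ONNXConfig(
    dtype="int8",
    opset_version=14,
    quant_format="QDQ",
    example_inputs=torch.randn(1, 3, 224, 224),  # shape assumed for an image model
    input_names=["input"],
    output_names=["output"],
)
q_model.export("int8-model.onnx", export_conf)
```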

Bumps [ejs](https://github.com/mde/ejs) from 3.1.9 to 3.1.10.
Release notes (sourced from ejs's releases): v3.1.10
Commits: d3f807d Version 3.1.10; 9ee26dd Mocha TDD; e469741 Basic pollution protection; 715e950 Merge pull request...

dependencies
javascript

Hello community, I've tried the smoothquant flow on an OPT-125m model with the default settings. Unsurprisingly, the activations are quantized per tensor and the weights per channel. According to the...
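
By way of illustration, a hedged 2.x-style smooth-quant recipe for this kind of run; the alpha value, `float_model`, and `calib_dataloader` are placeholders, not settings from the original post.

```python
# Hypothetical 2.x smooth-quant setup; with defaults, INC then quantizes
# activations per tensor and weights per channel, as observed above.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
)
q_model = quantization.fit(float_model, conf, calib_dataloader=calib_dataloader)
```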