
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Results: 155 neural-compressor issues, sorted by recently updated

Hi all, I have been trying to apply **post-training quantization** to a custom vision model (a pretrained VGG16) which I have already fine-tuned using "xpu" (Intel GPU Max Series). I have...

https://github.com/intel/neural-compressor/blob/4372a762585189accc65196e081a0a7a85f5af9e/neural_compressor/torch/algorithms/weight_only/utility.py#L69

```python
FP4_BNB = [-12.0, -8.0, -6.0, -4.0, -3.0, -2.0, -0.0625, 0, 0.0625, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0]
FP4_E2M1 = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.0625, 0, 0.0625, 1.0, ...]
```
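These tables enumerate every value a 4-bit code can represent; weight-only quantization then maps each (scaled) weight to its nearest grid entry. A minimal sketch of that rounding step, using the FP4_BNB values quoted above (`round_to_grid` is a hypothetical helper for illustration, not part of neural-compressor's API):

```python
# FP4_BNB grid copied from the utility.py excerpt above.
FP4_BNB = [-12.0, -8.0, -6.0, -4.0, -3.0, -2.0, -0.0625, 0,
           0.0625, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0]

def round_to_grid(x, grid=FP4_BNB):
    """Map a (pre-scaled) weight to the closest representable FP4 value."""
    return min(grid, key=lambda g: abs(g - x))

weights = [0.03, -2.4, 5.1, 11.0]
print([round_to_grid(w) for w in weights])  # [0, -2.0, 6.0, 12.0]
```

In the real flow each weight group is first divided by a per-group scale so that it falls inside the grid's range; only the 4-bit index of the chosen entry is stored.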

## Type of Change
example
## Description
Add an SDXL model example to INC 3.x.
## Expected Behavior & Potential Risk
## How has this PR been tested?
Local test.
##...

When loading the quantized model (SmoothQuant) with

```python
from neural_compressor.utils.pytorch import load
qmodel = load(qmodel_path, model_fp)
```

I got `RecursiveScriptModule(original_name=QuantizationDispatchModule)`. I'd like to extract the quantized int8 weight matrices, together...
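For eager-mode quantized modules, the raw integer codes are reachable via `int_repr()` on the quantized tensor; the sketch below uses stock PyTorch dynamic quantization as a self-contained stand-in (the SmoothQuant path above instead yields a TorchScript module, whose tensors can be inspected through `qmodel.state_dict()`):

```python
import torch

# Demo model: a single Linear layer, dynamically quantized to int8.
fp_model = torch.nn.Sequential(torch.nn.Linear(4, 2))
qmodel = torch.ao.quantization.quantize_dynamic(
    fp_model, {torch.nn.Linear}, dtype=torch.qint8
)

w_q = qmodel[0].weight()         # quantized weight tensor
codes = w_q.int_repr()           # raw int8 codes
print(codes.dtype, codes.shape)  # torch.int8 torch.Size([2, 4])
print(w_q.dequantize())          # float reconstruction for comparison
```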


## Type of Change
feature; API not changed
## Description
Add support for the xpu device for 3.x IPEX static quantization.
## Expected Behavior & Potential Risk
## How has this PR...

## Type of Change
others
## Description
1. Remove deprecated modules
2. Bump version to v3.0
## Expected Behavior & Potential Risk
CI pass
## How has this PR been...

https://github.com/intel/neural-compressor/blob/master/docs/source/validated_model_list.md/#pytorch-models-with-torch-201cpu-in-woq-mode shows the accuracy of `int4` compared to `fp32`. Is there any data for 4-bit floating-point types (e.g. `nf4`, `fp4`, etc.) and their performance? Thanks!

## Type of Change
feature or bug fix or documentation or validation or others
API changed or not
## Description
Usage:
```bash
None  # default value, Autodetect (client is True)...
```

I am trying to run the int4 quantization examples from `examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only`, but a package is missing from requirements.txt: the `numba` package needs to be added. ```...

## Type of Change
Update WOQ int4 recipes
Bump INC version to 3.2
## Description
detail description
## Expected Behavior & Potential Risk
the expected behavior that triggered by this...