
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Results: 155 neural-compressor issues

The Reference Kit at https://github.com/oneapi-src/visual-quality-inspection/blob/main/src/intel_neural_compressor/neural_compressor_inference.py does the following import: `from neural_compressor.experimental import Benchmark`. It then runs `evaluator = Benchmark(config_path)` and `evaluator.model = int8_model`, and creates a benchmark dataloader like...

Signed-off-by: Sun, Xuehao
## Type of Change
Modify NAS `n_worker` as a variable
## Description
Change NAS `n_worker` to be a variable instead of a constant
## Expected Behavior &...

Labels: enhancement, review

Hi, when I use IPEX quantization with INC, I hit a problem: the quantized model can't be loaded after saving. To save, I just call `quantized.save(path)`, and I get...

Hi, is it possible to run this with WSL2? The ONNX ResNet example breaks under Windows 11 WSL2: ![image](https://user-images.githubusercontent.com/25264037/198390108-73e881f5-a018-454f-b2b7-e9699165b593.png)

Hi, below is my config file:

version: 1.0
model:
  name: bert
  framework: pytorch # Mandatory: possible values are pytorch and pytorch_fx.
device: cpu
quantization:
  approach: quant_aware_training # Mandatory: Quantization approach,...

## Type of Change
bug fix
## Description
Fix onnxrt calibration issue
## Expected Behavior & Potential Risk
the expected behavior triggered by this PR
## How has this...

## Type of Change
bug fix
## Description
Fix code_detection example export issue
## Expected Behavior & Potential Risk
the expected behavior triggered by this PR
## How has...

## Type of Change
validation
## Description
Enable 3x WOQ example in CI
## Expected Behavior & Potential Risk
CI passes and the accuracy gap is checked.
## How has this PR...

## Type of Change
feature
## Description
- [x] support per-channel quantization for higher accuracy
- [x] add observer registry for easy extension
- [x] dump scale_inv from observer...
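Per-channel quantization, mentioned in the PR above, computes one scale per output channel instead of a single scale for the whole tensor, which preserves accuracy when channel magnitudes differ widely. A minimal pure-Python sketch of symmetric INT8 per-channel quantization (illustrative only, not INC's implementation; all function names here are made up):

```python
def quantize_per_channel(weights):
    """Symmetric INT8 quantization with one scale per output channel (row)."""
    qweights, scales = [], []
    for row in weights:
        amax = max(abs(v) for v in row) or 1.0  # guard against an all-zero row
        scale = amax / 127.0                    # map [-amax, amax] onto [-127, 127]
        scales.append(scale)
        qweights.append([max(-127, min(127, round(v / scale))) for v in row])
    return qweights, scales

def dequantize_per_channel(qweights, scales):
    """Recover approximate float weights from INT8 values and per-channel scales."""
    return [[q * s for q in row] for row, s in zip(qweights, scales)]
```

With a per-tensor scheme, a row of small weights (e.g. ~0.01) sharing a scale with a row containing ±1.0 would collapse to only a few integer levels; per-channel scales keep each row's full 8-bit resolution.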

Hello, I am trying to run the following example: https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm I use the script below:
```
OMP_NUM_THREADS=32 python run_clm_no_trainer.py --model facebook/opt-1.3b --quantize --sq --alpha 0.5 --ipex --output_dir "saved_results" --int8_bf16_mixed
```
...
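The `--sq --alpha 0.5` flags in the command above enable SmoothQuant, which migrates quantization difficulty from activations to weights using per-channel smoothing factors s_j = max|X_j|^alpha / max|W_j|^(1-alpha). A minimal sketch of the factor computation (illustrative only, not INC's code; the function name is made up):

```python
def smoothquant_factors(act_absmax, wgt_absmax, alpha=0.5):
    """Per-channel SmoothQuant factors: s_j = max|X_j|^alpha / max|W_j|^(1-alpha).

    Activations are divided by s_j and the matching weight columns multiplied
    by s_j, so the product X @ W is unchanged while activation outliers shrink.
    alpha balances how much difficulty moves to the weight side (0.5 = equal).
    """
    return [(a ** alpha) / (w ** (1.0 - alpha))
            for a, w in zip(act_absmax, wgt_absmax)]
```

For example, a channel whose activation peak is 4.0 against a weight peak of 1.0 gets s = 2.0 at alpha = 0.5, halving the activation outlier while only doubling the well-behaved weights.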