neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
## Type of Change
feature or bug fix or documentation or validation or others
API changed or not
## Description
detail description
## Expected Behavior & Potential Risk
the expected...
Hi, I am trying to reproduce some of the examples, but it looks like they are outdated. I am not able to load `from neural_compressor.experimental import Quantization, common`, which appears...
## Type of Change
A feature: UniformQDQ
## Description
A feature of UniformQDQ - support CV/NLP model ops, including Conv, DepthwiseConv2D, MatMul, etc. Additional op support will be added upon request....
## Type of Change
# What does this PR do?
Support FP8 static quantization for optimum-habana DeepSeek V3/R1 models using Intel Neural Compressor (INC). This feature needs changes in: -...
Fixes sw-219134. Removes the legacy import in init.
Recover PatchedVLLMKVCache
As examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/mx_quant/Readme suggests, I run `python run_clm_no_trainer.py --model ./Qwen2-1.5B-Instruct --quantize --accuracy --tasks lambada_openai --w_dtype fp4 --woq`, but it returns an error: ``` 2025-02-12 13:21:16 [WARNING][auto_accelerator.py:418] Auto detect accelerator: CPU_Accelerator. 2025-02-12 13:21:16... ```
Hi, I wonder whether Neural Compressor supports vision-language models that accept both images and text as inputs?
Is it compatible with FlexAttention from PyTorch 2.6.0?
When quantizing with MX FP, the quantization scales of subnormal and normal values should be different. Why does L394 clip to min_exp? I understand that it should clip to 1. Looking...
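To make the question concrete, here is a minimal, hedged sketch of how an MX-style per-block shared exponent is typically derived and clamped. This is illustrative only, not the library's actual code at L394; the names `ELEM_EMAX` and `MIN_EXP` and their values are assumptions.

```python
import math

# Illustrative constants (assumptions, not values from neural-compressor):
ELEM_EMAX = 2    # largest exponent an FP4 (E2M1) element can represent
MIN_EXP = -127   # lower clamp applied to the shared block exponent

def shared_exponent(block):
    """Shared exponent for one MX block:
    floor(log2(max|x|)) - ELEM_EMAX, clamped below at MIN_EXP."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return MIN_EXP  # all-zero block: any scale works
    exp = math.floor(math.log2(amax)) - ELEM_EMAX
    # This clamp is the point the issue asks about: tiny (subnormal-range)
    # blocks get the same floor MIN_EXP instead of a separate scale rule,
    # which keeps 2**exp representable but treats subnormals uniformly.
    return max(exp, MIN_EXP)

block = [0.5, -1.5, 0.25, 3.0]
exp = shared_exponent(block)       # floor(log2(3.0)) - 2 = 1 - 2 = -1
scale = 2.0 ** exp                 # 0.5
scaled = [x / scale for x in block]  # values then rounded to FP4 elements
```

Under this sketch, whether the clamp should be `MIN_EXP` or a value near 1 changes only how blocks whose maximum magnitude falls in the subnormal range are scaled; normal-range blocks are unaffected.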