neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
### Remove 1.x related code

Some folders and files include both 1.x and 2.x UTs. I've removed the 1.x UTs that import anything from `experimental` or `conf`. Please help...
## Type of Change

feature

## Description

- [x] Implement the `incbench` command as an entrypoint for ease-of-use benchmarking
- [x] Automatically check NUMA/socket info and dump it as a table for ease of understanding...
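The NUMA/socket dump described above might be sketched roughly as follows. This is a minimal illustration: `detect_topology` and `dump_table` are hypothetical names, and a real implementation would parse `/sys` or `numactl --hardware` output to get true NUMA node counts rather than assuming a single node.

```python
import os

def detect_topology():
    """Best-effort CPU topology probe (hypothetical helper; the real tool
    would read platform info for the actual NUMA/socket layout)."""
    logical = os.cpu_count() or 1
    # Assume a single NUMA node when no platform-specific info is available.
    return {"numa_nodes": 1, "logical_cpus": logical}

def dump_table(info):
    """Render the topology dict as a simple aligned two-column text table."""
    rows = [("key", "value")] + [(k, str(v)) for k, v in info.items()]
    width = max(len(k) for k, _ in rows)
    return "\n".join(f"{k:<{width}}  {v}" for k, v in rows)

print(dump_table(detect_topology()))
```

Dumping the detected topology up front lets users sanity-check instance-to-core binding before a benchmark run.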
## Type of Change

feature

## Description

Add new features for layer-wise quantization, including `get_weight`, `get_bias`, `update`, and save/load. This makes it easier to use, like a normal model....
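A rough sketch of what such per-layer accessors could look like, purely for illustration: `LayerWiseStore` and its shard format are made up here and are not the actual neural-compressor API.

```python
# Illustrative layer-wise accessors: weights live in an external store
# (a dict here; on disk in practice) and are fetched one layer at a time,
# so the full model never has to be resident in memory.
class LayerWiseStore:
    def __init__(self, shards):
        # shards: layer name -> {"weight": ..., "bias": ...}
        self._shards = shards

    def get_weight(self, name):
        return self._shards[name]["weight"]

    def get_bias(self, name):
        return self._shards[name].get("bias")

    def update(self, name, weight):
        # Write a (e.g. quantized) weight back for the given layer.
        self._shards[name]["weight"] = weight

    def save(self):
        # Return a plain snapshot; a real save would serialize to disk.
        return {k: dict(v) for k, v in self._shards.items()}

store = LayerWiseStore({"fc1": {"weight": [1.0, 2.0], "bias": [0.1]}})
store.update("fc1", [0.5, 1.0])
print(store.get_weight("fc1"))  # [0.5, 1.0]
```

The point of the interface is that callers interact with it "like a normal model" while only one layer's tensors are materialized at a time.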
## Type of Change

bug fix

## Description

Fix a bf16 `symbolic_trace` bug that 1. causes abnormal recursive calls and 2. drops necessary attributes, by moving the BF16 fallback ahead of quantization and removing...
## Type of Change

feature

## Description

Usage:

```python
from neural_compressor.torch.algorithms.layer_wise import load_empty_model
from neural_compressor.torch.quantization import GPTQConfig, prepare, convert

model = load_empty_model("hf-internal-testing/tiny-random-GPTJForCausalLM")
quant_config = GPTQConfig(
    use_layer_wise=True,
    model_path="hf-internal-testing/tiny-random-GPTJForCausalLM",
)
model = prepare(model, quant_config)
run_fn(model)
model = convert(model)
```
...
Updates:
- [github.com/psf/black.git: 24.3.0 → 24.4.2](https://github.com/psf/black.git/compare/24.3.0...24.4.2)
- [github.com/asottile/blacken-docs: 1.16.0 → 1.18.0](https://github.com/asottile/blacken-docs/compare/1.16.0...1.18.0)
- [github.com/codespell-project/codespell: v2.2.6 → v2.3.0](https://github.com/codespell-project/codespell/compare/v2.2.6...v2.3.0)
- [github.com/astral-sh/ruff-pre-commit: v0.3.5 → v0.5.0](https://github.com/astral-sh/ruff-pre-commit/compare/v0.3.5...v0.5.0)
## Type of Change

## Description

Port auto-detect absorb layers for TEQ:

```bash
pytest -sv test/3x/torch/algorithms/weight_only/test_teq_quantizer.py -k test_teq_detect_absorb_layers
```

## Expected Behavior & Potential Risk

PRE-CI

## Dependency Change?

None
## Type of Change

feature

API changed or not: no

## Description

Use a different `WeightOnlyLinear` module according to the device:

- Abstract `WeightOnlyLinear` class, with inherited classes `INCWeightOnlyLinear` and `HPUWeightOnlyLinear`
- Load...
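The device-based dispatch could be organized along these lines. This is a framework-free sketch under stated assumptions: the class bodies and the `create_weight_only_linear` factory are simplified stand-ins, not the real modules.

```python
from abc import ABC, abstractmethod

class WeightOnlyLinear(ABC):
    """Abstract weight-only-quantized linear layer (simplified sketch)."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

    @abstractmethod
    def backend(self):
        """Name of the kernel backend this implementation targets."""

class INCWeightOnlyLinear(WeightOnlyLinear):
    """Default implementation for CPU/CUDA devices."""
    def backend(self):
        return "inc"

class HPUWeightOnlyLinear(WeightOnlyLinear):
    """Implementation backed by Habana/HPU kernels."""
    def backend(self):
        return "hpu"

def create_weight_only_linear(device, in_features, out_features):
    # Pick the concrete class from the target device string.
    cls = HPUWeightOnlyLinear if device == "hpu" else INCWeightOnlyLinear
    return cls(in_features, out_features)

layer = create_weight_only_linear("hpu", 16, 16)
print(layer.backend())  # hpu
```

Keeping the abstract base device-agnostic means save/load and config code can target `WeightOnlyLinear` without caring which concrete subclass is in use.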
## Type of Change

Update example for PyTorch 3x mixed precision

## Description

- [x] Add the Torchvision resnet18 model as an example
- [x] Update the document

## How has this...