neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Results: 155 neural-compressor issues, sorted by recently updated

I'm curious whether you will support Arc; Neural Compressor would particularly benefit those platforms. Thanks!

https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-cpu/issues/2404

If a quantized model contains fake quantization nodes, how can such a model be parallelized, and how can its accuracy be validated on a dataset?
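
Not an authoritative answer, but a fake-quantized model is still an ordinary PyTorch module, so it can typically be wrapped for data parallelism and evaluated exactly like the float model. A minimal sketch, where `model` and `eval_loader` are hypothetical placeholders for the user's own objects:

```python
import torch

# `model` (the fake-quantized module) and `eval_loader` are placeholders.
model = torch.nn.DataParallel(model)  # replicate across all visible GPUs
model.eval()

correct, total = 0, 0
with torch.no_grad():
    for inputs, labels in eval_loader:
        preds = model(inputs).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"top-1 accuracy: {correct / total:.4f}")
```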

I think there is a bug in [ORTSmoothQuant._adjust_weights()](https://github.com/intel/neural-compressor/blob/de385a432acff1bc0384086c8c35b3442b860fc8/neural_compressor/adaptor/ox_utils/smooth_quant.py#L669). Part of this method is presented below:

```python
def _adjust_weights(self, scales):
    """Adjust the weights with scale.

    Args:
        scales (dict): The input scales
    """
    ...
```
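
For context, a SmoothQuant-style weight adjustment multiplies each input channel of a weight by its smoothing scale so that the matching division applied to the activations cancels out. Below is a minimal NumPy sketch of that invariant; the `weight` and `scales` values are made up for illustration and are not taken from the method above.

```python
import numpy as np

# Hypothetical values, for illustration only.
rng = np.random.default_rng(0)
weight = rng.standard_normal((4, 8)).astype(np.float32)    # (in_channels, out_channels)
scales = np.array([0.5, 1.0, 2.0, 4.0], dtype=np.float32)  # one scale per input channel

# Scale each input-channel row of the weight.
adjusted = weight * scales[:, None]

# The output is unchanged when activations are divided by the same scales:
x = rng.standard_normal((2, 4)).astype(np.float32)
assert np.allclose(x @ weight, (x / scales) @ adjusted, atol=1e-5)
```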

This RFC proposes a Hugging Face-compatible yet flexible Weight-Only Quantization (WOQ) format in INC, so that a model quantized by INC can be loaded by IPEX for...
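
For reference, weight-only quantization can be driven through the INC 2.x post-training API roughly as sketched below; the exact field values are assumptions, and the on-disk layout of the saved model is precisely what this RFC is about.

```python
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# `float_model` is a placeholder for your PyTorch model.
conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply to all matching ops
            "weight": {
                "bits": 4,         # INT4 weights
                "group_size": 32,  # per-group scales
                "scheme": "sym",
                "algorithm": "RTN",
            },
        },
    },
)
q_model = fit(model=float_model, conf=conf)
q_model.save("./woq_output")  # this saved format is what the RFC would standardize
```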

Hello, I'm attempting to train a model for a microcontroller that only supports 8-bit precision or lower. This works perfectly when training with your `QuantizationAwareTrainingConfig`. In addition to this we...
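
For anyone landing here, the INC 2.x QAT flow looks roughly like the sketch below; `model` and `train_one_epoch` are hypothetical placeholders, and details may differ by version.

```python
from neural_compressor import QuantizationAwareTrainingConfig
from neural_compressor.training import prepare_compression

# `model` and `train_one_epoch` are placeholders for the user's own code.
conf = QuantizationAwareTrainingConfig()
compression_manager = prepare_compression(model, conf)

compression_manager.callbacks.on_train_begin()
model = compression_manager.model
train_one_epoch(model)  # the usual training loop, now with fake-quant inserted
compression_manager.callbacks.on_train_end()

compression_manager.save("./qat_output")
```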

I want to use the sparsity feature of neural-compressor to prune model weights at block-wise granularity. Unlike traditional pruning approaches that zero out pruned weights, I...
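
For reference, the block-wise granularity itself is configured through the `pattern` field of `WeightPruningConfig` in INC 2.x. The sketch below shows the standard flow, which does zero out pruned blocks, so it covers only the granularity part of this request; `model`, `optimizer`, and `train_loader` are placeholders.

```python
from neural_compressor import WeightPruningConfig
from neural_compressor.training import prepare_compression

# `model`, `optimizer`, and `train_loader` are placeholders.
conf = WeightPruningConfig(
    pruning_type="snip_momentum",
    pattern="4x1",          # block-wise granularity: prune in 4x1 blocks
    target_sparsity=0.8,
    start_step=0,
    end_step=1000,
)
compression_manager = prepare_compression(model, conf)
compression_manager.callbacks.on_train_begin()

for step, batch in enumerate(train_loader):
    compression_manager.callbacks.on_step_begin(step)
    loss = model(**batch).loss
    loss.backward()
    compression_manager.callbacks.on_before_optimizer_step()
    optimizer.step()
    optimizer.zero_grad()
    compression_manager.callbacks.on_after_optimizer_step()
    compression_manager.callbacks.on_step_end()

compression_manager.callbacks.on_train_end()
```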

Several models, such as LaMini-GPT, use this layer, but unfortunately most of our algorithms do not currently support it. W8A8 (SQ) and weight-only (RTN, TEQ) should better support `transformers.conv1d` and `torch.conv1d`...
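
A common interim workaround (an assumption here, not an official INC path) is to swap Hugging Face's `Conv1D` for an equivalent `nn.Linear`, which the existing algorithms already handle; `Conv1D` stores its weight as `(in_features, out_features)`, the transpose of `nn.Linear`.

```python
import torch
from transformers.pytorch_utils import Conv1D  # transformers.modeling_utils in older releases

def conv1d_to_linear(conv: Conv1D) -> torch.nn.Linear:
    """Build an nn.Linear that computes the same function as HF's Conv1D."""
    in_features, out_features = conv.weight.shape         # Conv1D weight is (in, out)
    linear = torch.nn.Linear(in_features, out_features)
    linear.weight.data = conv.weight.data.T.contiguous()  # Linear weight is (out, in)
    linear.bias.data = conv.bias.data.clone()
    return linear
```

Applied recursively over a model's modules before quantization, this lets the W8A8 and weight-only algorithms treat such layers as ordinary linears.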

Hello, I have been attempting to quantize the t5-small model using the t5-small topology, despite making changes to hyperparameters such as `tune = True` and `save_strategy="epoch"`. I have already created...

Hi, the quantisation function `neural_compressor.quantization.fit` returns a `PyTorchFXModel` object, which contains two members, `fp32_model` and `model`. Could you please let me know the correct way of evaluating...
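
Not an official answer, but the class layout suggests that `fp32_model` retains the original float module while `model` holds the quantized FX graph module, so the latter is the one to benchmark. A minimal sketch, with `float_model`, `calib_loader`, and `eval_loader` as hypothetical placeholders:

```python
import torch
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# `float_model`, `calib_loader`, and `eval_loader` are placeholders.
q_model = fit(model=float_model,
              conf=PostTrainingQuantConfig(),
              calib_dataloader=calib_loader)

quantized = q_model.model       # quantized torch.fx module: evaluate this one
reference = q_model.fp32_model  # original float model, kept for comparison

quantized.eval()
correct, total = 0, 0
with torch.no_grad():
    for inputs, labels in eval_loader:
        preds = quantized(inputs).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"INT8 top-1 accuracy: {correct / total:.4f}")
```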