neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Results 155 neural-compressor issues

If I define a parameter named "layer_scale" in a PyTorch nn.Module, as shown in the following code, a ValueError occurs. ``` class ConvEncoder(nn.Module): """ Implementation of...

Hi! I tried to prune my model (mistralai/Mistral-7B-v0.1) with the following config:

```python
pruning_config = WeightPruningConfig(
    pruning_type="snip_momentum_progressive",
    start_step=0,
    end_step=15,
    sparsity_decay_type='exp',
    pruning_op_types=["Linear"],
    op_names=['.*.self_attn'],
    excluded_op_names=["lm_head", "embed_tokens"],
    max_sparsity_ratio_per_op=0.98,
    pruning_scope="global",
)
```

However when...

Encountering an issue while running static PTQ on a PyTorch model. The process uses PyTorch metrics such as ['Accuracy','F1'] for benchmarking. The workflow is executed within the VS Code Jupyter extension....

Aims to clarify the HW deployment capabilities of models optimized with Neural Compressor. - Specifically, I'd like to know if these models are optimized for specific architectures such as x86 or...

**OS:** Ubuntu **Hardware:** CPU Intel(R) Xeon(R) Platinum 8468V I have installed the required dependencies listed in the GitHub repository using the latest versions, as specific versions were not specified transformers...