neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
If I define a parameter with the same name as "layer_scale" in a PyTorch nn.Module, as shown in the following code, a ValueError occurs. ``` class ConvEncoder(nn.Module): """ Implementation of...
Hi! I tried to prune my model (mistralai/Mistral-7B-v0.1) with the following config:
```python
pruning_config = WeightPruningConfig(
    pruning_type="snip_momentum_progressive",
    start_step=0,
    end_step=15,
    sparsity_decay_type='exp',
    pruning_op_types=["Linear"],
    op_names=['.*.self_attn'],
    excluded_op_names=["lm_head", "embed_tokens"],
    max_sparsity_ratio_per_op=0.98,
    pruning_scope="global",
)
```
However when...
I am encountering an issue while running static PTQ on a PyTorch model. The process uses PyTorch metrics such as ['Accuracy', 'F1'] for benchmarking. The workflow is executed within the VS Code Jupyter extension....
This issue aims to clarify the hardware deployment capabilities of models optimized with Neural Compressor. - Specifically, I'd like to know if these models are optimized for specific architectures such as x86 or...
**OS:** Ubuntu **Hardware:** CPU Intel(R) Xeon(R) Platinum 8468V. I have installed the latest versions of the required dependencies listed in the GitHub repository, as no specific versions were given transformers...