neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
The PostTrainingQuantConfig below produces fp32 ops for the NPU with 2.4.1. Models with int8 and fp16 ops would be preferred for the NPU. `conf = PostTrainingQuantConfig(quant_level='auto', device='npu', backend="onnxrt_dml_ep", quant_format="QOperator", approach="static", excluded_precisions=['bf16'])`
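For reference, a minimal sketch of how that config is usually driven end to end through INC's post-training entry point; the model path and calibration dataloader are placeholders, not taken from the report:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Config from the report above: static QOperator quantization targeting the
# NPU through the ONNX Runtime DirectML EP, with bf16 excluded.
conf = PostTrainingQuantConfig(
    quant_level="auto",
    device="npu",
    backend="onnxrt_dml_ep",
    quant_format="QOperator",
    approach="static",
    excluded_precisions=["bf16"],
)

# "model.onnx" and calib_dataloader are placeholders; static quantization
# needs a calibration dataloader to collect activation ranges.
q_model = quantization.fit(
    model="model.onnx",
    conf=conf,
    calib_dataloader=calib_dataloader,
)
q_model.save("model-int8.onnx")
```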
I'm not sure if I'm missing an option somewhere, but AWQ quantization of large ONNX models is very slow. When quantizing a 7B LLaMA model, the following four `np.matmul` calls...
Hi team, I am having an issue quantizing a network consisting of Conv and Linear layers with **int8** weights and activations in ONNX. I have tried setting this via op_type_dict, however...
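For context, op_type_dict entries typically take the dtype-constraint form below; this is a minimal sketch, not the reporter's actual config:

```python
from neural_compressor import PostTrainingQuantConfig

# Constrain Conv and MatMul to int8 weights and int8 activations.
op_type_dict = {
    "Conv": {
        "weight": {"dtype": ["int8"]},
        "activation": {"dtype": ["int8"]},
    },
    "MatMul": {
        "weight": {"dtype": ["int8"]},
        "activation": {"dtype": ["int8"]},
    },
}

conf = PostTrainingQuantConfig(approach="static", op_type_dict=op_type_dict)
```

One frequent pitfall worth noting: PyTorch Linear layers usually export to ONNX as MatMul (+ Add) or Gemm nodes, so op_type_dict keys must name the ONNX op types rather than "Linear".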
Hi all, I'm attempting to follow the SmoothQuant tutorial for the LLaMA2-7b model: https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/ptq_static System configuration: OS: Windows 11, Python: 3.10.11. My steps: 1. Create project folder: neural-compressor-tutorial...
Hi, I want to write a script to print layer_mappings for distillation. My script looks like this: `for name, module in model.named_modules(): print(name)` but the printed names are far from the default layer_mapping....
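As a point of comparison, a minimal runnable version of that script; the model here is a placeholder, and the relevant point is only that layer_mapping entries must match these `named_modules` names exactly:

```python
import torchvision.models as models

# Placeholder model; substitute the actual student/teacher model.
model = models.resnet18()

# Distillation layer_mapping entries reference modules by these dotted
# names, so printing them is the usual way to discover valid keys.
for name, module in model.named_modules():
    print(name, "->", type(module).__name__)
```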
Hello, the [awq_quantize](https://github.com/intel/neural-compressor/blob/42c2def02e128818f19d8342052ab0544e9623f7/neural_compressor/adaptor/ox_utils/weight_only.py#L703) function [collects the names of input tensors to each MatMul node](https://github.com/intel/neural-compressor/blob/42c2def02e128818f19d8342052ab0544e9623f7/neural_compressor/adaptor/ox_utils/weight_only.py#L758-L764), and later [looks up the parent node that produces the named tensor](https://github.com/intel/neural-compressor/blob/42c2def02e128818f19d8342052ab0544e9623f7/neural_compressor/adaptor/ox_utils/weight_only.py#L783). This assumes the tensors...
Dear all, to make Intel Neural Compressor easy to use in our team, and because we use PyTorch Lightning, I am building Lightning Callbacks that call your hooks...
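A minimal sketch of what such a bridge might look like, assuming INC 2.x's CompressionManager hook names and Lightning 2.x's import path; the reporter's actual wiring is not shown in the issue:

```python
import lightning.pytorch as pl

class INCCompressionCallback(pl.Callback):
    """Bridges Lightning training events to INC compression hooks.

    `compression_manager` is assumed to come from
    neural_compressor.training.prepare_compression(model, confs).
    """

    def __init__(self, compression_manager):
        self.cb = compression_manager.callbacks

    def on_train_start(self, trainer, pl_module):
        self.cb.on_train_begin()

    def on_train_epoch_start(self, trainer, pl_module):
        self.cb.on_epoch_begin(trainer.current_epoch)

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
        self.cb.on_step_begin(batch_idx)

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        self.cb.on_step_end()

    def on_train_epoch_end(self, trainer, pl_module):
        self.cb.on_epoch_end()

    def on_train_end(self, trainer, pl_module):
        self.cb.on_train_end()
```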
Hi team, I converted a standard T5-small model to ONNX using onnxruntime 1.15.1 and Python 3.10.12, and got different responses on an Intel processor and an AMD processor! Please let me...
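Small numeric drift between CPUs is common with floating-point kernels, so one useful first check is whether the two machines' outputs differ only within tolerance; a minimal sketch, with the model path and saved tensors as placeholders:

```python
import numpy as np
import onnxruntime as ort

# Placeholders: the exported T5 model, one saved input batch, and the
# output produced by the same inputs on the other machine.
sess = ort.InferenceSession("t5-small.onnx", providers=["CPUExecutionProvider"])
inputs = np.load("inputs.npz")
reference = np.load("outputs_other_machine.npy")

outputs = sess.run(None, {k: inputs[k] for k in inputs.files})[0]

# Different SIMD code paths on Intel vs AMD can produce tiny fp32 drift;
# a tolerance check separates that from a real correctness bug.
print("max abs diff:", np.abs(outputs - reference).max())
print("allclose:", np.allclose(outputs, reference, rtol=1e-4, atol=1e-5))
```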
See https://github.com/pytorch/tutorials/issues/2690: it looks like there's a problem with the tutorial at https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html, where Neural Compressor is causing a seg fault. It looks like contributor @ftian1...