
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Results: 155 neural-compressor issues, sorted by recently updated.

When I tried SmoothQuant with the sample code:

```python
from neural_compressor.torch.quantization import SmoothQuantConfig, convert, prepare

def run_fn(model):
    model(example_inputs)

quant_config = SmoothQuantConfig(alpha=0.5)
prepared_model = prepare(fp32_model, quant_config=quant_config, example_inputs=example_inputs)
run_fn(prepared_model)
q_model = convert(prepared_model)...
```
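For context, a minimal runnable version of that snippet, with a hypothetical toy `fp32_model` and `example_inputs` standing in for whatever model the issue actually used (the 3.x torch SmoothQuant path also assumes the IPEX backend is available):

```python
import torch
from neural_compressor.torch.quantization import SmoothQuantConfig, convert, prepare

# Hypothetical toy model and calibration input, for illustration only.
fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4))
example_inputs = torch.randn(1, 8)

def run_fn(model):
    # Calibration: run representative inputs through the prepared model.
    model(example_inputs)

quant_config = SmoothQuantConfig(alpha=0.5)
prepared_model = prepare(fp32_model, quant_config=quant_config, example_inputs=example_inputs)
run_fn(prepared_model)
q_model = convert(prepared_model)
```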

I want to run distillation for a text-similarity model using the following script:

```
python run_glue_no_trainer_distillation.py \
    --max_seq_length 128 --model_name_or_path ./student_model \
    --teacher_model_name_or_path BAAI/bge-small-zh-v1.5 --do_distillation \
    --per_device_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs...
```
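For reference, a minimal sketch of the distillation API that scripts like this wrap, assuming the INC 2.x `prepare_compression` entry point and toy stand-in models (the real script uses HF transformer models):

```python
import torch
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig
from neural_compressor.training import prepare_compression

# Toy stand-ins for the student and teacher models.
student = torch.nn.Linear(16, 4)
teacher = torch.nn.Linear(16, 4)

loss_conf = KnowledgeDistillationLossConfig(
    temperature=2.0, loss_types=["CE", "KL"], loss_weights=[0.5, 0.5]
)
conf = DistillationConfig(teacher_model=teacher, criterion=loss_conf)
manager = prepare_compression(student, conf)
optimizer = torch.optim.SGD(student.parameters(), lr=1e-5)

manager.callbacks.on_train_begin()
for _ in range(2):  # two toy epochs on random data
    x = torch.randn(8, 16)
    y = torch.randint(0, 4, (8,))
    out = manager.model(x)
    loss = torch.nn.functional.cross_entropy(out, y)
    # Blend the task loss with the distillation loss against the teacher.
    loss = manager.callbacks.on_after_compute_loss(x, out, loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
manager.callbacks.on_train_end()
```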

my code:

```python
class NewDataloader:
    def __init__(self, batch_size, **kwargs):
        self.batch_size = batch_size

    def __iter__(self):
        yield torch.tensor([1986, 374, 279, 2086, 11652, 13, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643,...
```
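A runnable completion of that class, assuming the truncated tensor was a single padded token-id sequence (151643 looks like a pad token id; that reading is an assumption). INC calibration dataloaders only need a `batch_size` attribute and an `__iter__` that yields input batches:

```python
import torch

class NewDataloader:
    """Minimal calibration dataloader: a batch_size attribute plus
    an __iter__ that yields input batches is all INC requires."""

    def __init__(self, batch_size, **kwargs):
        self.batch_size = batch_size

    def __iter__(self):
        # One token-id sequence padded out with 151643 (assumed pad id).
        ids = [1986, 374, 279, 2086, 11652, 13] + [151643] * 10
        yield torch.tensor([ids])

dataloader = NewDataloader(batch_size=1)
for batch in dataloader:
    print(batch.shape)  # torch.Size([1, 16])
```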

I tried to run this example https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/cv/static_quant/main.py and got an error at https://github.com/intel/neural-compressor/blob/09d4f2d6fb1a6aa91874a0b87a967067800462cb/examples/3.x_api/pytorch/cv/static_quant/main.py#L220. Error message:

```
q_model.save(example_inputs=example_inputs, output_dir="./saved_results")
  File "C:\Users\JackTC_Li\AppData\Roaming\Python\Python38\site-packages\neural_compressor\torch\algorithms\pt2e_quant\save_load.py", line 38, in save
    quantized_ep = torch.export.export(model, example_inputs, dynamic_shapes=dynamic_shapes)...
```
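One thing worth checking, since the traceback is truncated (so this is only an assumption about the cause): `torch.export.export` takes the positional example inputs as a tuple, and passing a bare tensor raises. A minimal sketch:

```python
import torch

model = torch.nn.Conv2d(3, 8, 3)
x = torch.randn(1, 3, 32, 32)

# torch.export.export expects args as a tuple:
# torch.export.export(model, x) raises; (x,) works.
exported = torch.export.export(model.eval(), (x,))
print(exported.graph_module)
```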

https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples — how do I set eval_func? Also, https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/run_clm_no_trainer.py seems to have no AWQ quantization, just RTN and GPTQ. And since the README says weight-only is fake quantization, why save the quantized model (user_model.save(args.output_dir))?
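On the eval_func question, a minimal sketch assuming the INC 2.x `quantization.fit` entry point: eval_func just takes a model and returns a scalar metric that INC tries to maximize. The toy model and metric here are hypothetical placeholders:

```python
import torch
from neural_compressor import PostTrainingQuantConfig, quantization

def eval_func(model):
    # Replace with a real evaluation loop (e.g. perplexity or accuracy);
    # this toy metric is for illustration only.
    with torch.no_grad():
        out = model(torch.randn(1, 8))
    return float(out.mean())

fp32_model = torch.nn.Linear(8, 8)
conf = PostTrainingQuantConfig(approach="weight_only")
q_model = quantization.fit(fp32_model, conf, eval_func=eval_func)
```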

https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/weight_only

```
bash run_quant.sh --input_model=./Meta-Llama-3.1-8B --output_model=./Meta-Llama-3.1-8B_AWQ --batch_size=1 --dataset=NeelNanda/pile-10k --tokenizer=meta-llama/Meta-Llama-3.1-8B --algorithm=AWQ
```

```
/usr/local/lib/python3.10/dist-packages/diffusers/models/vq_model.py:20: FutureWarning: `VQEncoderOutput` is deprecated and will be removed in version 0.31. Importing `VQEncoderOutput` from `diffusers.models.vq_model` is deprecated and this...
```

I was looking for an example or documentation on how to both load and quantize a HF embedding model on Intel Gaudi2. Are there any examples available? I don't want to...
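For what it's worth, a sketch of what FP8 quantization looks like in the 3.x torch API on Gaudi, assuming the `FP8Config` entry point and a Gaudi software stack with the Habana frameworks installed; untested here, and the embedding model is a hypothetical stand-in:

```python
import torch
from neural_compressor.torch.quantization import FP8Config, convert, prepare

# Hypothetical stand-in for a HF embedding model.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())

config = FP8Config(fp8_config="E4M3")
model = prepare(model, config)
model(torch.randn(1, 16))  # calibration pass
model = convert(model)
```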


When I use this config to quantize a YOLOv3 model into fp8:

```yaml
version: 1.0

model:                   # mandatory. used to specify model specific information.
  name: yolo_v3
  framework: pytorch     # mandatory. possible...
```

## Type of Change

- Bug Fix, with no API change.

## Description

- The Measure component generates stats with original module names from the model, as opposed to the...

The usage at https://github.com/intel/neural-compressor/blob/4eaef0fab6c738bb461742fc0e920e66638dc84e/neural_compressor/adaptor/ox_utils/quantizer.py#L969-L978 relies on a deprecated API. Use `helper.tensor_dtype_to_np_dtype` instead.
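A minimal sketch of the suggested replacement, assuming the linked code was using the deprecated `onnx.mapping.TENSOR_TYPE_TO_NP_TYPE` lookup table (`onnx.mapping` is deprecated as of onnx 1.13):

```python
import numpy as np
from onnx import TensorProto, helper

dtype = TensorProto.FLOAT

# Deprecated style (warns on recent onnx releases):
# np_dtype = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[dtype]

# Replacement:
np_dtype = helper.tensor_dtype_to_np_dtype(dtype)
assert np_dtype == np.dtype("float32")
```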