neural-compressor
How to evaluate AWQ?
https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples
how to set eval_func?
https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/run_clm_no_trainer.py
It seems there is no AWQ quantization, just RTN and GPTQ. Also, as the readme.md says, weight-only is fake quantization, so why save the qmodel (user_model.save(args.output_dir))?
Hello @chunniunai220ml, thanks for your interest in Intel(R) Neural Compressor. https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples This document describes the 2.x API. The 2.x example link is https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm
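For reference, a minimal sketch of how an eval_func can be passed to the 2.x quantization.fit API for AWQ weight-only quantization (the op_type_dict keys follow the linked document; user_model, calib_dataloader, and evaluate_accuracy are illustrative placeholders, not names from the example):

from neural_compressor import PostTrainingQuantConfig, quantization

# Weight-only AWQ config, as described in quantization_weight_only.md (2.x API).
conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # match all supported ops
            "weight": {
                "bits": 4,
                "group_size": 128,
                "scheme": "asym",
                "algorithm": "AWQ",
            },
        },
    },
)

def eval_func(model):
    # User-defined evaluation; must return a single float (higher is better).
    # evaluate_accuracy is a hypothetical helper you would implement yourself.
    return evaluate_accuracy(model)

q_model = quantization.fit(
    user_model,                         # the FP32 HuggingFace model
    conf,
    calib_dataloader=calib_dataloader,  # calibration data used by AWQ
    eval_func=eval_func,
)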
Thanks for your reply. I followed the 2.x example link; the bash script is as follows:
python -u run_clm_no_trainer.py \
    --model $model_path \
    --dataset ${DATASET_NAME} \
    --approach weight-only \
    --output_dir ${tuned_checkpoint} \
    --quantize \
    --batch_size ${batch_size} \
    --woq_algo AWQ \
    --calib_iters 128 \
    --woq_group_size 128 \
    --woq_bits 4 \
    --tasks hellaswag \
    --accuracy
https://github.com/intel/neural-compressor/blob/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run_clm_no_trainer.py#L355 — it seems to just evaluate the original model instead of the qmodel.
If I want to evaluate the qmodel, can I just modify #L355 as follows?
q_model.eval()
eval_args = LMEvalParser(
    model="hf",
    user_model=q_model,  # instead of user_model
    tokenizer=tokenizer,
    batch_size=args.batch_size,
    tasks=args.tasks,
)
As the readme.md says, weight-only quantization is based on fake quantization, so why save the qmodel in #L338? I think the qmodel weights' dtype is not INT4 in storage. Also, run_clm_no_trainer.py only supports CPU; where is the multi-GPU support code?
Sure, the q_model needs to be exported as a compressed model: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#export-compressed-model
You can refer to https://github.com/intel/intel-extension-for-transformers/tree/v1.5/examples/huggingface/pytorch/text-generation/quantization (v1.5) to quantize an INT4 model; it has this compressed-model export integrated. It also includes GPU scripts.
3.x API: stay tuned.
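For illustration, a rough sketch of that export step with the 2.x API (the available keyword arguments of export_compressed_model are listed in the linked document; this only shows the overall flow):

import torch

# q_model is the object returned by quantization.fit(...).
# export_compressed_model packs the fake-quantized FP32 weights into a real
# low-bit representation; see the linked doc for the supported kwargs.
compressed_model = q_model.export_compressed_model()
torch.save(compressed_model.state_dict(), "compressed_model.pt")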
Does it work well on an NVIDIA V100? The readme.md seems to describe only the Intel GPU installation.
Besides, when running on CPU, it is strange that the process always gets killed for no apparent reason after processing several blocks.
I suggest you try using the 3.x API; there the q_model is already the exported compressed model.
We will soon update the 3.x example, which supports auto-device detection: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg But we haven't tested the performance on NVIDIA GPUs.
On the dev branch: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only
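For orientation, a rough sketch of the AWQ flow in the 3.x torch API, following the prepare/convert pattern used in that example (exact parameter names and the calibration hook may differ on the dev branch, so please check the linked example; calib_func is a user-supplied calibration loop and user_model an already-loaded FP32 model):

import torch
from neural_compressor.torch.quantization import AWQConfig, prepare, convert

quant_config = AWQConfig(bits=4, group_size=128)
example_inputs = torch.ones([1, 512], dtype=torch.long)  # dummy input ids

# Insert calibration hooks, run calibration data through the model, then convert.
user_model = prepare(model=user_model, quant_config=quant_config, example_inputs=example_inputs)
calib_func(user_model)         # run calibration batches through the model
q_model = convert(user_model)  # q_model holds the compressed weights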
I checked out the kaihui/woq_3x_eg branch and ran:
CUDA_VISIBLE_DEVICES="2" python run_clm_no_trainer.py \
    --model $model_path \
    --woq_algo AWQ \
    --woq_bits 4 \
    --woq_group_size 128 \
    --calib_iters 128 \
    --woq_scheme asym \
    --quantize \
    --batch_size 1 \
    --tasks wikitext \
    --accuracy
With AutoModelForCausalLM.from_pretrained(device='cuda'), in neural-compressor/neural_compressor/torch/algorithms/weight_only/awq.py, line 240, in block_calibration:
model(*args, **kwargs) — the inputs' device is CPU, so this error is reported:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
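(A minimal sketch of one way to keep devices consistent during calibration, assuming the model is loaded and quantized on CPU and only moved to CUDA afterwards for evaluation; model_path and quantize_with_awq are placeholders, not names from the example:)

from transformers import AutoModelForCausalLM

# Load on CPU so the model matches the CPU calibration inputs used by AWQ.
user_model = AutoModelForCausalLM.from_pretrained(model_path)    # stays on CPU
user_model = quantize_with_awq(user_model)   # placeholder for the AWQ calibration/quantization step
user_model = user_model.to("cuda")           # move to GPU only for evaluation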
But there is another bug in eval:
from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser
File "/*/anaconda3/lib/python3.11/site-packages/intel_extension_for_transformers/transformers/__init__.py", line 19, in
Also, how can I load saved_results/quantmodel.pt to evaluate?
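(On the loading question, a sketch of how a weight-only checkpoint saved with user_model.save(args.output_dir) is typically reloaded with the 2.x utilities; the 3.x branch may expose a different load API, so treat this as illustrative, with orig_model being a freshly instantiated FP32 model:)

from neural_compressor.utils.pytorch import load

# Rebuild the quantized model from the saved checkpoint directory.
q_model = load(args.output_dir, orig_model, weight_only=True)
q_model.eval()
# q_model can then be passed to LMEvalParser via user_model=q_model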
Hi @chunniunai220ml, trying an older version such as 2.6 may solve this issue:
ModuleNotFoundError: No module named 'neural_compressor.conf'
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.