
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Results: 155 neural-compressor issues, sorted by recently updated.

When I tried SmoothQuant with the sample code:

```python
from neural_compressor.torch.quantization import SmoothQuantConfig, convert, prepare

def run_fn(model):
    model(example_inputs)

quant_config = SmoothQuantConfig(alpha=0.5)
prepared_model = prepare(fp32_model, quant_config=quant_config, example_inputs=example_inputs)
run_fn(prepared_model)
q_model = convert(prepared_model)...
```
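For context, a minimal runnable version of that snippet, with a hypothetical toy `fp32_model` and `example_inputs` standing in for whatever model the issue actually used (the 3.x torch SmoothQuant path also assumes the IPEX backend is available):

```python
import torch
from neural_compressor.torch.quantization import SmoothQuantConfig, convert, prepare

# Hypothetical toy model and calibration input, for illustration only.
fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4))
example_inputs = torch.randn(1, 8)

def run_fn(model):
    # Calibration: run representative inputs through the prepared model.
    model(example_inputs)

quant_config = SmoothQuantConfig(alpha=0.5)
prepared_model = prepare(fp32_model, quant_config=quant_config, example_inputs=example_inputs)
run_fn(prepared_model)
q_model = convert(prepared_model)
```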

I want to run distillation for a text-similarity model using the following script:

```
python run_glue_no_trainer_distillation.py \
    --max_seq_length 128 --model_name_or_path ./student_model \
    --teacher_model_name_or_path BAAI/bge-small-zh-v1.5 --do_distillation \
    --per_device_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs...
```
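For reference, a minimal sketch of the distillation API that scripts like this wrap, assuming the INC 2.x `prepare_compression` entry point and toy stand-in models (the real script uses HF transformer models):

```python
import torch
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig
from neural_compressor.training import prepare_compression

# Toy stand-ins for the student and teacher models.
student = torch.nn.Linear(16, 4)
teacher = torch.nn.Linear(16, 4)

loss_conf = KnowledgeDistillationLossConfig(
    temperature=2.0, loss_types=["CE", "KL"], loss_weights=[0.5, 0.5]
)
conf = DistillationConfig(teacher_model=teacher, criterion=loss_conf)
manager = prepare_compression(student, conf)
optimizer = torch.optim.SGD(student.parameters(), lr=1e-5)

manager.callbacks.on_train_begin()
for _ in range(2):  # two toy epochs on random data
    x = torch.randn(8, 16)
    y = torch.randint(0, 4, (8,))
    out = manager.model(x)
    loss = torch.nn.functional.cross_entropy(out, y)
    # Blend the task loss with the distillation loss against the teacher.
    loss = manager.callbacks.on_after_compute_loss(x, out, loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
manager.callbacks.on_train_end()
```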

my code:

```python
class NewDataloader:
    def __init__(self, batch_size, **kwargs):
        self.batch_size = batch_size

    def __iter__(self):
        yield torch.tensor([1986, 374, 279, 2086, 11652, 13, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643,...
```
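A runnable completion of that class, assuming the truncated tensor was a single padded token-id sequence (151643 looks like a pad token id; that reading is an assumption). INC calibration dataloaders only need a `batch_size` attribute and an `__iter__` that yields input batches:

```python
import torch

class NewDataloader:
    """Minimal calibration dataloader: a batch_size attribute plus
    an __iter__ that yields input batches is all INC requires."""

    def __init__(self, batch_size, **kwargs):
        self.batch_size = batch_size

    def __iter__(self):
        # One token-id sequence padded out with 151643 (assumed pad id).
        ids = [1986, 374, 279, 2086, 11652, 13] + [151643] * 10
        yield torch.tensor([ids])

dataloader = NewDataloader(batch_size=1)
for batch in dataloader:
    print(batch.shape)  # torch.Size([1, 16])
```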

I tried to run this example https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/cv/static_quant/main.py and got an error at https://github.com/intel/neural-compressor/blob/09d4f2d6fb1a6aa91874a0b87a967067800462cb/examples/3.x_api/pytorch/cv/static_quant/main.py#L220. Error message:

```
q_model.save(example_inputs=example_inputs, output_dir="./saved_results")
  File "C:\Users\JackTC_Li\AppData\Roaming\Python\Python38\site-packages\neural_compressor\torch\algorithms\pt2e_quant\save_load.py", line 38, in save
    quantized_ep = torch.export.export(model, example_inputs, dynamic_shapes=dynamic_shapes)...
```
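One thing worth checking, since the traceback is truncated (so this is only an assumption about the cause): `torch.export.export` takes the positional example inputs as a tuple, and passing a bare tensor raises. A minimal sketch:

```python
import torch

model = torch.nn.Conv2d(3, 8, 3)
x = torch.randn(1, 3, 32, 32)

# torch.export.export expects args as a tuple:
# torch.export.export(model, x) raises; (x,) works.
exported = torch.export.export(model.eval(), (x,))
print(exported.graph_module)
```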

https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples — how do I set eval_func? Also, https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/run_clm_no_trainer.py seems to have no AWQ quantization, just RTN and GPTQ. And since the README says weight-only is fake quantization, why save the quantized model (user_model.save(args.output_dir))?
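On the eval_func question, a minimal sketch assuming the INC 2.x `quantization.fit` entry point: eval_func just takes a model and returns a scalar metric that INC tries to maximize. The toy model and metric here are hypothetical placeholders:

```python
import torch
from neural_compressor import PostTrainingQuantConfig, quantization

def eval_func(model):
    # Replace with a real evaluation loop (e.g. perplexity or accuracy);
    # this toy metric is for illustration only.
    with torch.no_grad():
        out = model(torch.randn(1, 8))
    return float(out.mean())

fp32_model = torch.nn.Linear(8, 8)
conf = PostTrainingQuantConfig(approach="weight_only")
q_model = quantization.fit(fp32_model, conf, eval_func=eval_func)
```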

https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/weight_only

```
bash run_quant.sh --input_model=./Meta-Llama-3.1-8B --output_model=./Meta-Llama-3.1-8B_AWQ --batch_size=1 --dataset=NeelNanda/pile-10k --tokenizer=meta-llama/Meta-Llama-3.1-8B --algorithm=AWQ
```

```
/usr/local/lib/python3.10/dist-packages/diffusers/models/vq_model.py:20: FutureWarning: `VQEncoderOutput` is deprecated and will be removed in version 0.31. Importing `VQEncoderOutput` from `diffusers.models.vq_model` is deprecated and this...
```

I was looking for an example or documentation on how to both load and quantize a HF embedding model on Intel Gaudi2. Are there any examples available? I don't want to...
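For what it's worth, a sketch of what FP8 quantization looks like in the 3.x torch API on Gaudi, assuming the `FP8Config` entry point and a Gaudi software stack with the Habana frameworks installed; untested here, and the embedding model is a hypothetical stand-in:

```python
import torch
from neural_compressor.torch.quantization import FP8Config, convert, prepare

# Hypothetical stand-in for a HF embedding model.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())

config = FP8Config(fp8_config="E4M3")
model = prepare(model, config)
model(torch.randn(1, 16))  # calibration pass
model = convert(model)
```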


When I use this config to quantize a YOLOv3 model into fp8:

```yaml
version: 1.0

model:                   # mandatory. used to specify model specific information.
  name: yolo_v3
  framework: pytorch     # mandatory. possible...
```

## Type of Change

- Bug Fix, with no API change.

## Description

- The Measure component generates stats with original module names from the model, as opposed to the...

The usage at https://github.com/intel/neural-compressor/blob/4eaef0fab6c738bb461742fc0e920e66638dc84e/neural_compressor/adaptor/ox_utils/quantizer.py#L969-L978 relies on a deprecated API. Use `helper.tensor_dtype_to_np_dtype` instead.
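A minimal sketch of the suggested replacement, assuming the linked code was using the deprecated `onnx.mapping.TENSOR_TYPE_TO_NP_TYPE` lookup table (`onnx.mapping` is deprecated as of onnx 1.13):

```python
import numpy as np
from onnx import TensorProto, helper

dtype = TensorProto.FLOAT

# Deprecated style (warns on recent onnx releases):
# np_dtype = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[dtype]

# Replacement:
np_dtype = helper.tensor_dtype_to_np_dtype(dtype)
assert np_dtype == np.dtype("float32")
```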