Add INC config and quantization.
What does this PR do?
This PR improves the weight-only quantization config and the quantize API using the INC 3.0 API. Status: WIP.
- Remove the config imported from intel_extension_for_transformers and define `INCWeightQuantizationConfig` in optimum-intel (see the sketch after this list).
- Upstream `convert_to_quantized_model` from intel_extension_for_transformers to optimum-intel. It still imports intel_extension_for_transformers for now; we are working on pushing the kernel code to IPEX, and once that kernel code is merged into IPEX we will switch the import over.
- Upstream `save_low_bit` from intel_extension_for_transformers to optimum-intel.

The INC 3.0 release is coming soon, so we are raising the PR first; once INC 3.0 is released, we plan to push it to merge.
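For reference, here is a minimal sketch of the shape `INCWeightQuantizationConfig` could take; the field names and defaults below are illustrative assumptions for discussion, not the final API:

```python
# Hypothetical sketch of INCWeightQuantizationConfig -- field names and
# defaults are illustrative assumptions, not the final API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class INCWeightQuantizationConfig:
    quant_method: str = "rtn"          # e.g. "rtn", "gptq", "awq"
    bits: int = 4                      # weight bit width
    group_size: int = 32               # per-group quantization granularity
    tokenizer: Optional[str] = None    # tokenizer used to build calibration data
    dataset: Optional[str] = None      # calibration dataset name

    def to_dict(self) -> dict:
        # serialized alongside the model so it can be restored at load time
        return self.__dict__.copy()
```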
@echarlaix Here are 2 proposals; the code below is example code. Could you share your comments? Which one do you prefer?
Usage example:
Current optimum-intel usage:
```python
# quantize
from optimum.intel.neural_compressor import INCModelForCausalLM, INCQuantizer
from intel_extension_for_transformers.transformers import GPTQConfig

quantization_config = GPTQConfig(tokenizer=tokenizer_name, dataset=dataset_name)
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(
    quantization_config=quantization_config,
    save_directory=training_args.output_dir,
)

# loading
model = INCModelForCausalLM.from_pretrained(training_args.output_dir)
```
INC proposal 1:
```python
# quantize
from optimum.intel.neural_compressor import INCModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(tokenizer=tokenizer_name, dataset=dataset_name)
model = INCModelForCausalLM.from_pretrained(model_name_or_path, quantization_config=quantization_config)
model.save_pretrained("output_dir")

# loading
model = INCModelForCausalLM.from_pretrained("output_dir")
```
INC proposal 2:
```python
# quantize
from optimum.intel.neural_compressor import INCModelForCausalLM, INCWeightQuantizationConfig

quantization_config = INCWeightQuantizationConfig(quant_method="GPTQ")
model = INCModelForCausalLM.from_pretrained(model_name_or_path, quantization_config=quantization_config)
model.save_pretrained("output_dir")

# loading
model = INCModelForCausalLM.from_pretrained("output_dir")
```
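In proposal 2, GPTQ-specific calibration inputs (tokenizer and dataset, as in the current usage above) would presumably be passed through the same config object. A hedged example, assuming the config accepts those parameters:

```python
# Assumed usage for proposal 2 -- the tokenizer/dataset parameters mirror the
# current GPTQConfig(tokenizer=..., dataset=...) usage and are not confirmed API.
from optimum.intel.neural_compressor import INCModelForCausalLM, INCWeightQuantizationConfig

quantization_config = INCWeightQuantizationConfig(
    quant_method="GPTQ",
    tokenizer=tokenizer_name,
    dataset=dataset_name,
)
model = INCModelForCausalLM.from_pretrained(model_name_or_path, quantization_config=quantization_config)
```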
cc @PenghuiCheng @thuang6 @ftian1
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?