# Support mixed `INT8` + `FP16` in one model
## Type of Change

- Type: feature
- API changed or not
## Description

- [x] Support converting unquantized `Linear` modules into `FP16` (see the conceptual sketch below).
- [ ] Extend the `FP16` ops list to align with https://pytorch.org/docs/stable/amp.html#cpu-ops-that-can-autocast-to-bfloat16 in a separate PR.
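Conceptually, the fallback runs an FP32 `Linear` in half precision with dtype casts around it. The eager-mode sketch below only illustrates the numeric effect; it is not the PR's actual FX-graph rewrite in `half_precision_rewriter.py`:

```python
import torch

# Illustration only: run an unquantized Linear in FP16, cast the result back.
linear = torch.nn.Linear(4, 4).half()  # FP16 weights and bias
x = torch.randn(2, 4)                  # FP32 activation
y = linear(x.half()).float()           # cast in, compute in FP16, cast out
print(y.dtype)                         # torch.float32
```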
## Usage

```python
import torch
from neural_compressor.torch.export import export
from neural_compressor.torch.quantization import StaticQuantConfig, convert, get_default_static_config, prepare

# Export the eager model to an FX graph for PT2E quantization.
model = export(model, example_inputs=example_inputs)

# Default INT8 static config, with `torch.nn.Linear` falling back to FP16.
quant_config = get_default_static_config()
quant_config.set_local(torch.nn.Linear, StaticQuantConfig(w_dtype="fp16", act_dtype="fp16"))

# Prepare
prepare_model = prepare(model, quant_config)

# Calibrate
for i in range(2):
    prepare_model(*example_inputs)

# Convert
converted_model = convert(prepare_model)
```
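The snippet above assumes `model` and `example_inputs` already exist. A minimal, hypothetical stand-in (any FP32 model containing `Linear` layers would do):

```python
import torch

# Hypothetical placeholders for `model` and `example_inputs` used above.
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(8, 8)
        self.fc2 = torch.nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = ToyModel().eval()
example_inputs = (torch.randn(1, 8),)
```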
## Expected Behavior & Potential Risk
## How has this PR been tested?

Pre-CI. Some extended tests will be added later.
## Dependency Change?

Yes: `expecttest` is added to the test requirements (`test/3x/torch/requirements.txt`).
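`expecttest` is PyTorch's snapshot-testing helper: inline expected strings can be regenerated by rerunning with `EXPECTTEST_ACCEPT=1`. A minimal usage sketch (not this PR's actual tests):

```python
import unittest

import expecttest

class ExampleTest(expecttest.TestCase):
    def test_inline_snapshot(self):
        # The triple-quoted string is the recorded snapshot; running with
        # EXPECTTEST_ACCEPT=1 rewrites it in place when the output changes.
        self.assertExpectedInline(str(2 + 2), """4""")

if __name__ == "__main__":
    unittest.main()
```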
⚡ Required checks status: All passing 🟢
Groups summary
🟢 Code Scan Tests workflow
| Check ID | Status | Error details |
|---|---|---|
| Code-Scan | success | ✅ |
| Code-Scan (Bandit Code Scan Bandit) | success | ✅ |
| Code-Scan (DocStyle Code Scan DocStyle) | success | ✅ |
| Code-Scan (Pylint Code Scan Pylint) | success | ✅ |
These checks are required after the changes to neural_compressor/torch/algorithms/pt2e_quant/core.py, neural_compressor/torch/algorithms/pt2e_quant/half_precision_rewriter.py, neural_compressor/torch/utils/utility.py.
🟢 Model Tests 3x workflow
| Check ID | Status | Error details |
|---|---|---|
| Model-Test-3x | success | ✅ |
| Model-Test-3x (Generate Report GenerateReport) | success | ✅ |
| Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4) | success | ✅ |
| Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4_dq_bnb) | success | ✅ |
| Model-Test-3x (Run PyTorch Model opt_125m_woq_gptq_int4_dq_ggml) | success | ✅ |
These checks are required after the changes to neural_compressor/torch/algorithms/pt2e_quant/core.py, neural_compressor/torch/algorithms/pt2e_quant/half_precision_rewriter.py, neural_compressor/torch/utils/utility.py.
🟢 Unit Tests 3x-PyTorch workflow
| Check ID | Status | Error details |
|---|---|---|
| UT-3x-Torch | success | ✅ |
| UT-3x-Torch (Coverage Compare CollectDatafiles) | success | ✅ |
| UT-3x-Torch (Unit Test 3x Torch Unit Test 3x Torch) | success | ✅ |
| UT-3x-Torch (Unit Test 3x Torch baseline Unit Test 3x Torch baseline) | success | ✅ |
These checks are required after the changes to neural_compressor/torch/algorithms/pt2e_quant/core.py, neural_compressor/torch/algorithms/pt2e_quant/half_precision_rewriter.py, neural_compressor/torch/utils/utility.py, test/3x/torch/algorithms/pt2e_quant/test_half_precision_rewriter.py, test/3x/torch/quantization/test_pt2e_quant.py, test/3x/torch/requirements.txt.
Thank you for your contribution! 💜
Note: This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact chensuyue or XuehaoSun for help.
Shall we add a `logger.info` message to tell users that IPEX doesn't support FP16?

```python
quant_config.set_local(torch.nn.Linear, StaticQuantConfig(w_dtype="fp16", act_dtype="fp16"))
```
Thanks for the suggestion; I'll refine it in a separate PR.
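A minimal sketch of what such a guard might look like (entirely hypothetical: the helper name, its arguments, and the message wording are illustrative, and the actual change is deferred to the follow-up PR):

```python
import logging

logger = logging.getLogger(__name__)

def warn_if_fp16_on_ipex(backend: str, w_dtype: str, act_dtype: str) -> None:
    # Hypothetical helper: emit an informational message when an FP16
    # dtype is requested together with the IPEX backend.
    if backend == "ipex" and "fp16" in (w_dtype, act_dtype):
        logger.info("IPEX does not support FP16; the FP16 setting will be ignored.")

warn_if_fp16_on_ipex("ipex", "fp16", "fp16")
```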