neural-compressor
Support Habana FP8 per-channel quantization
Type of Change
feature
Description
- [x] Support per-channel quantization for higher accuracy
- [x] Add an observer registry for easy extension
- [x] Dump `scale_inv` from the observer to align with the Habana Quantization Toolkit
- [x] Move the observer device to CPU to avoid program hangs
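The checklist items above can be illustrated together. The sketch below is a hypothetical, simplified illustration (not the actual neural-compressor implementation): a registry decorator lets new observers be added by name, and a per-channel min-max observer tracks the absolute maximum of each output channel on CPU, then computes per-channel scales along with their inverses (`scale_inv`) for dumping. The `E4M3_MAX = 240.0` constant assumes the Gaudi FP8 E4M3 format's clamped dynamic range; verify against the hardware documentation.

```python
# Hypothetical sketch of an observer registry plus per-channel FP8 scale
# computation. Names (OBSERVERS, register_observer, MinMaxPerChannelObserver)
# are illustrative, not the library's actual API.

OBSERVERS = {}

def register_observer(name):
    """Registry decorator: maps a string name to an observer class,
    so new observer types can be added without touching core code."""
    def wrap(cls):
        OBSERVERS[name] = cls
        return cls
    return wrap

@register_observer("minmax_per_channel")
class MinMaxPerChannelObserver:
    # Assumed FP8 E4M3 maximum magnitude on Gaudi (clamped to +/-240
    # rather than the OCP E4M3 maximum of 448) -- an assumption here.
    E4M3_MAX = 240.0

    def __init__(self, out_channels):
        # One running amax per output channel, kept as plain CPU floats
        # (mirroring the idea of keeping observer state off the device).
        self.amax = [0.0] * out_channels

    def observe(self, weight):
        """Update per-channel absolute maxima.
        `weight` is a 2-D list: [out_channels][in_features]."""
        for i, row in enumerate(weight):
            self.amax[i] = max(self.amax[i], max(abs(v) for v in row))

    def calculate_scale(self):
        """Return (scale, scale_inv) per channel; scale maps each
        channel's amax onto the FP8 range, scale_inv is the value a
        toolkit would dump for de-quantization."""
        scale = [(a or 1.0) / self.E4M3_MAX for a in self.amax]  # guard all-zero channels
        scale_inv = [1.0 / s for s in scale]
        return scale, scale_inv
```

Usage: `OBSERVERS["minmax_per_channel"](2)` builds an observer for a 2-output-channel weight; after one or more `observe` calls, `calculate_scale()` yields the per-channel scale and `scale_inv` pairs. Per-channel scales give each output channel its own dynamic range, which is where the accuracy gain over per-tensor scaling comes from.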
Expected Behavior & Potential Risk
Unit tests (UTs) pass.
How has this PR been tested?
Local test.
⛈️ Required checks status: Has failure 🔴
Warning: If you do not have access to re-run the Probot, please contact XuehaoSun for help. If you push a new commit, all of the workflows will be re-triggered.
Groups summary
🟢 Code Scan Tests workflow
| Check ID | Status | Error details |
|---|---|---|
| Code-Scan | success | ✅ |
| Code-Scan (Bandit Code Scan Bandit) | success | ✅ |
| Code-Scan (DocStyle Code Scan DocStyle) | success | ✅ |
| Code-Scan (Pylint Code Scan Pylint) | success | ✅ |
These checks are required after the changes to neural_compressor/torch/algorithms/habana_fp8/fp8_quant.py, neural_compressor/torch/algorithms/habana_fp8/modules.py, neural_compressor/torch/algorithms/habana_fp8/observer.py, neural_compressor/torch/algorithms/habana_fp8/save_load.py, neural_compressor/torch/algorithms/habana_fp8/scale.py, neural_compressor/torch/amp/fp8/functions.py, neural_compressor/torch/quantization/config.py.
🔴 Unit Tests 3x-PyTorch workflow
| Check ID | Status | Error details |
|---|---|---|
| UT-3x-Torch | failure | ❌ |
| UT-3x-Torch (Coverage Compare CollectDatafiles) | no_status | ❓ |
| UT-3x-Torch (Unit Test 3x Torch Unit Test 3x Torch) | success | ✅ |
| UT-3x-Torch (Unit Test 3x Torch baseline Unit Test 3x Torch baseline) | failure | download ❌ |
These checks are required after the changes to neural_compressor/torch/algorithms/habana_fp8/fp8_quant.py, neural_compressor/torch/algorithms/habana_fp8/modules.py, neural_compressor/torch/algorithms/habana_fp8/observer.py, neural_compressor/torch/algorithms/habana_fp8/save_load.py, neural_compressor/torch/algorithms/habana_fp8/scale.py, neural_compressor/torch/amp/fp8/functions.py, neural_compressor/torch/quantization/config.py, test/3x/torch/quantization/habana_fp8/test_fp8.py.
Thank you for your contribution! 💜
Note: This comment is automatically generated and updated every 180 seconds for 360 minutes. If you have any other questions, contact chensuyue or XuehaoSun for help.
Will add UTs later.
Local test result: 12 tests passed.