neural-compressor
Support Habana FP8 per-channel quantization
Type of Change
feature
Description
- [x] Support per-channel quantization for higher accuracy
- [x] Add an observer registry for easy extension
- [x] Dump `scale_inv` from the observer to align with the Habana Quantization Toolkit
- [x] Move the observer device to CPU to avoid program hangs
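The checklist items above can be illustrated together. The sketch below is a hypothetical, simplified illustration (not the actual neural-compressor implementation): a registry decorator lets new observers be added by name, and a per-channel min-max observer tracks the absolute maximum of each output channel on CPU, then computes per-channel scales along with their inverses (`scale_inv`) for dumping. The `E4M3_MAX = 240.0` constant assumes the Gaudi FP8 E4M3 format's clamped dynamic range; verify against the hardware documentation.

```python
# Hypothetical sketch of an observer registry plus per-channel FP8 scale
# computation. Names (OBSERVERS, register_observer, MinMaxPerChannelObserver)
# are illustrative, not the library's actual API.

OBSERVERS = {}

def register_observer(name):
    """Registry decorator: maps a string name to an observer class,
    so new observer types can be added without touching core code."""
    def wrap(cls):
        OBSERVERS[name] = cls
        return cls
    return wrap

@register_observer("minmax_per_channel")
class MinMaxPerChannelObserver:
    # Assumed FP8 E4M3 maximum magnitude on Gaudi (clamped to +/-240
    # rather than the OCP E4M3 maximum of 448) -- an assumption here.
    E4M3_MAX = 240.0

    def __init__(self, out_channels):
        # One running amax per output channel, kept as plain CPU floats
        # (mirroring the idea of keeping observer state off the device).
        self.amax = [0.0] * out_channels

    def observe(self, weight):
        """Update per-channel absolute maxima.
        `weight` is a 2-D list: [out_channels][in_features]."""
        for i, row in enumerate(weight):
            self.amax[i] = max(self.amax[i], max(abs(v) for v in row))

    def calculate_scale(self):
        """Return (scale, scale_inv) per channel; scale maps each
        channel's amax onto the FP8 range, scale_inv is the value a
        toolkit would dump for de-quantization."""
        scale = [(a or 1.0) / self.E4M3_MAX for a in self.amax]  # guard all-zero channels
        scale_inv = [1.0 / s for s in scale]
        return scale, scale_inv
```

Usage: `OBSERVERS["minmax_per_channel"](2)` builds an observer for a 2-output-channel weight; after one or more `observe` calls, `calculate_scale()` yields the per-channel scale and `scale_inv` pairs. Per-channel scales give each output channel its own dynamic range, which is where the accuracy gain over per-tensor scaling comes from.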
Expected Behavior & Potential Risk
Unit tests (UTs) pass.
How has this PR been tested?
Local test.
⛈️ Required checks status: Has failure 🔴
Warning: If you do not have access to re-run the Probot, please contact XuehaoSun for help. If you push a new commit, all of the workflows will be re-triggered.
Groups summary
🟢 Code Scan Tests workflow
| Check ID | Status | Error details |
|---|---|---|
| Code-Scan | success | ✅ |
| Code-Scan (Bandit Code Scan Bandit) | success | ✅ |
| Code-Scan (DocStyle Code Scan DocStyle) | success | ✅ |
| Code-Scan (Pylint Code Scan Pylint) | success | ✅ |
These checks are required after the changes to neural_compressor/torch/algorithms/habana_fp8/fp8_quant.py, neural_compressor/torch/algorithms/habana_fp8/modules.py, neural_compressor/torch/algorithms/habana_fp8/observer.py, neural_compressor/torch/algorithms/habana_fp8/save_load.py, neural_compressor/torch/algorithms/habana_fp8/scale.py, neural_compressor/torch/amp/fp8/functions.py, neural_compressor/torch/quantization/config.py.
🔴 Unit Tests 3x-PyTorch workflow
| Check ID | Status | Error details |
|---|---|---|
| UT-3x-Torch | failure | ❌ |
| UT-3x-Torch (Coverage Compare CollectDatafiles) | no_status | ❓ |
| UT-3x-Torch (Unit Test 3x Torch Unit Test 3x Torch) | success | ✅ |
| UT-3x-Torch (Unit Test 3x Torch baseline Unit Test 3x Torch baseline) | failure | download ❌ |
These checks are required after the changes to neural_compressor/torch/algorithms/habana_fp8/fp8_quant.py, neural_compressor/torch/algorithms/habana_fp8/modules.py, neural_compressor/torch/algorithms/habana_fp8/observer.py, neural_compressor/torch/algorithms/habana_fp8/save_load.py, neural_compressor/torch/algorithms/habana_fp8/scale.py, neural_compressor/torch/amp/fp8/functions.py, neural_compressor/torch/quantization/config.py, test/3x/torch/quantization/habana_fp8/test_fp8.py.
Thank you for your contribution! 💜
Note: This comment is automatically generated and updated every 180 seconds for 360 minutes. If you have any other questions, contact chensuyue or XuehaoSun for help.
Will add UTs later.
Local test result: 12 tests passed.