intel-extension-for-transformers
Removed fallback for lm_head op
Type of Change
Feature (no API change)
Description
Removed the fallback of the lm_head op for weight-only quantization (WOQ).
Expected Behavior & Potential Risk
The lm_head op no longer falls back during weight-only quantization; it is quantized along with the rest of the model.
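For illustration, here is a minimal sketch of the behavior change. The function and parameter names below are hypothetical and are not the actual code in `utils.py`:

```python
from torch import nn


def collect_woq_targets(model: nn.Module, quantize_lm_head: bool = True):
    """Collect the Linear layers to replace with weight-only-quantized ops.

    Hypothetical sketch: previously, lm_head was placed on a skip list and
    fell back to the original (unquantized) Linear; with this change it is
    quantized like every other Linear layer.
    """
    skip_names = set()
    if not quantize_lm_head:
        # Old behavior: keep lm_head as a plain (unquantized) Linear.
        skip_names.add("lm_head")

    targets = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and name not in skip_names:
            targets[name] = module
    return targets
```

With the new default (`quantize_lm_head=True`), `lm_head` is returned alongside the other Linear layers and quantized instead of falling back.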
How has this PR been tested?
Tested locally.
⛈️ Required checks status: Has failure 🔴
Warning: If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all of the workflows will be re-triggered.
Groups summary
🟢 Format Scan Tests workflow
| Check ID | Status | Error details |
|---|---|---|
| format-scan (pylint) | success | ✅ |
| format-scan (bandit) | success | ✅ |
| format-scan (cloc) | success | ✅ |
| format-scan (cpplint) | success | ✅ |

These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/quantization/utils.py`.
🔴 Optimize Unit Test workflow
| Check ID | Status | Error details |
|---|---|---|
| optimize-unit-test-baseline | success | ✅ |
| optimize-unit-test-PR-test | failure | download ❌ |
| Generate-OptimizeUT-Report | skipped | ❓ |

These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/quantization/utils.py`.
🟢 NeuralChat Unit Test
| Check ID | Status | Error details |
|---|---|---|
| neuralchat-unit-test-baseline | success | ✅ |
| neuralchat-unit-test-PR-test | success | ✅ |
| Generate-NeuralChat-Report | success | ✅ |

These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/quantization/utils.py`.
🟢 Engine Unit Test workflow
| Check ID | Status | Error details |
|---|---|---|
| engine-unit-test-baseline | success | ✅ |
| engine-unit-test-PR-test | success | ✅ |
| Generate-Engine-Report | success | ✅ |

These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/quantization/utils.py`.
🟡 Chat Bot Test workflow
| Check ID | Status | Error details |
|---|---|---|
| call-inference-llama-2-7b-chat-hf / inference test | queued | ⌛ |
| call-inference-mpt-7b-chat / inference test | queued | ⌛ |

These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/quantization/utils.py`.
Thank you for your contribution! 💜
Note: This comment is automatically generated and will be updated every 180 seconds for the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.