intel-extension-for-transformers
Removed fallback for lm_head op
Type of Change
feature, no API changed
Description
Removed the fallback of the lm_head op for weight-only quantization (WOQ).
Expected Behavior & Potential Risk
The lm_head op is no longer excluded from weight-only quantization; it is quantized like the other ops (see the sketch below).
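For illustration, a minimal sketch of what dropping the fallback amounts to, assuming a hypothetical helper `_modules_to_skip_for_woq` and config attribute `modules_to_not_convert` rather than the exact code in `intel_extension_for_transformers/transformers/llm/quantization/utils.py`:

```python
# A minimal sketch of the behavior change, not the actual utils.py code;
# the helper name and config attribute are illustrative assumptions.

def _modules_to_skip_for_woq(quantization_config):
    """Collect module names that stay unquantized during weight-only quantization."""
    skip = list(getattr(quantization_config, "modules_to_not_convert", None) or [])
    # Before this PR, the output projection was unconditionally excluded:
    #     skip.append("lm_head")
    # After this PR, lm_head is quantized like any other op, unless the
    # user explicitly lists it in the quantization config.
    return skip
```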
How has this PR been tested?
Tested locally.
⛈️ Required checks status: Has failure 🔴
Warning: If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all of the workflows will be re-triggered.
Groups summary
🟢 Format Scan Tests workflow
| Check ID | Status | Error details |
|---|---|---|
| format-scan (pylint) | success | ✅ |
| format-scan (bandit) | success | ✅ |
| format-scan (cloc) | success | ✅ |
| format-scan (cpplint) | success | ✅ |
These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.
🔴 Optimize Unit Test workflow
| Check ID | Status | Error details |
|---|---|---|
| optimize-unit-test-baseline | success | ✅ |
| optimize-unit-test-PR-test | failure | download ❌ |
| Genreate-OptimizeUT-Report | skipped | ❓ |
These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.
🟢 NeuralChat Unit Test
| Check ID | Status | Error details |
|---|---|---|
| neuralchat-unit-test-baseline | success | ✅ |
| neuralchat-unit-test-PR-test | success | ✅ |
| Generate-NeuralChat-Report | success | ✅ |
These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.
🟢 Engine Unit Test workflow
| Check ID | Status | Error details |
|---|---|---|
| engine-unit-test-baseline | success | ✅ |
| engine-unit-test-PR-test | success | ✅ |
| Genreate-Engine-Report | success | ✅ |
These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.
🟡 Chat Bot Test workflow
| Check ID | Status | Error details |
|---|---|---|
| call-inference-llama-2-7b-chat-hf / inference test | queued | ⌛ |
| call-inference-mpt-7b-chat / inference test | queued | ⌛ |
These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.
Thank you for your contribution! 💜
Note: This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.