intel-extension-for-transformers

Removed fallback for lm_head op

Open • PenghuiCheng opened this issue 10 months ago • 1 comment

Type of Change

Feature. No API changed.

Description

Removed the fallback of the lm_head op for weight-only quantization (WOQ).

Expected Behavior & Potential Risk

The lm_head op no longer falls back when weight-only quantization is applied.
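
To illustrate the intended behavior, here is a minimal sketch that assumes "fallback" means leaving a module out of weight-only quantization. The function `select_modules_for_woq`, its `fallback` parameter, and the module names are hypothetical and chosen only for illustration; they are not the code in intel_extension_for_transformers/transformers/llm/quantization/utils.py.

```python
from typing import Iterable, List


def select_modules_for_woq(
    linear_module_names: Iterable[str],
    fallback: Iterable[str] = (),
) -> List[str]:
    """Return the module names that would be weight-only quantized.

    Hypothetical helper: any name listed in `fallback` is skipped,
    i.e. it keeps its original precision.
    """
    skipped = set(fallback)
    return [name for name in linear_module_names if name not in skipped]


# Illustrative module names, not taken from a real model dump.
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

# With a fallback entry, lm_head is left unquantized.
before = select_modules_for_woq(names, fallback=["lm_head"])

# Without the fallback, lm_head is quantized like the other linear ops.
after = select_modules_for_woq(names)

print("before:", before)
print("after:", after)
```

The potential risk is that the output projection becomes subject to quantization error, so accuracy may be worth re-checking after the change.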

How has this PR been tested?

Tested locally.

PenghuiCheng • Apr 15 '24 02:04

⛈️ Required checks status: Has failure 🔴

Warning: If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

🟢 Format Scan Tests workflow
| Check ID | Status | Error details |
|---|---|---|
| format-scan (pylint) | success | |
| format-scan (bandit) | success | |
| format-scan (cloc) | success | |
| format-scan (cpplint) | success | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🔴 Optimize Unit Test workflow
| Check ID | Status | Error details |
|---|---|---|
| optimize-unit-test-baseline | success | |
| optimize-unit-test-PR-test | failure | download |
| Genreate-OptimizeUT-Report | skipped | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🟢 NeuralChat Unit Test
| Check ID | Status | Error details |
|---|---|---|
| neuralchat-unit-test-baseline | success | |
| neuralchat-unit-test-PR-test | success | |
| Generate-NeuralChat-Report | success | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🟢 Engine Unit Test workflow
| Check ID | Status | Error details |
|---|---|---|
| engine-unit-test-baseline | success | |
| engine-unit-test-PR-test | success | |
| Genreate-Engine-Report | success | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🟡 Chat Bot Test workflow
| Check ID | Status | Error details |
|---|---|---|
| call-inference-llama-2-7b-chat-hf / inference test | queued | |
| call-inference-mpt-7b-chat / inference test | queued | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.


Thank you for your contribution! 💜

Note: This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

github-actions[bot] • Apr 15 '24 02:04