intel-extension-for-transformers
H2O for KV cache compression
Type of Change
feature
Description
Adds KV cache compression based on the paper "H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models".
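For readers unfamiliar with H2O: the paper observes that a small set of tokens (the "heavy hitters") receives most of the attention mass, so the KV cache can be bounded by keeping those tokens plus a window of the most recent tokens and evicting the rest. Below is a minimal, framework-free sketch of that eviction policy for a single attention head. It assumes attention weights arrive as plain Python lists, and the helpers `accumulate_attention` and `h2o_keep_indices` and the parameters `heavy_budget` and `recent_budget` are hypothetical names for illustration only; this is not the API added under `kv_cache_compression/prune/h2o.py`.

```python
from typing import List, Sequence


def accumulate_attention(acc_scores: Sequence[float], attn_row: Sequence[float]) -> List[float]:
    """Add one decoding step's attention weights to the per-token running totals.

    `attn_row` is the new query's attention over every cached token plus the
    token just appended, so it must be exactly one element longer than `acc_scores`.
    """
    if len(attn_row) != len(acc_scores) + 1:
        raise ValueError("attn_row must cover all cached tokens plus the new one")
    padded = list(acc_scores) + [0.0]  # the new token starts with a zero total
    return [total + weight for total, weight in zip(padded, attn_row)]


def h2o_keep_indices(acc_scores: Sequence[float], heavy_budget: int, recent_budget: int) -> List[int]:
    """Return sorted cache indices to keep: top-scoring older tokens plus a recent window."""
    n = len(acc_scores)
    if n <= heavy_budget + recent_budget:
        return list(range(n))  # the cache still fits within the budget; evict nothing
    recent_start = n - recent_budget
    # Heavy-hitter candidates are the older tokens outside the recent window,
    # ranked by accumulated attention (highest first).
    ranked = sorted(range(recent_start), key=lambda i: acc_scores[i], reverse=True)
    heavy = sorted(ranked[:heavy_budget])
    return heavy + list(range(recent_start, n))


if __name__ == "__main__":
    print(accumulate_attention([0.5, 0.25], [0.25, 0.25, 0.5]))  # [0.75, 0.5, 0.5]
    print(h2o_keep_indices([0.9, 0.1, 0.5, 0.05, 0.3, 0.2], heavy_budget=2, recent_budget=2))
    # [0, 2, 4, 5]
```

Ranking only the tokens outside the recent window means newly generated tokens, which have had little time to accumulate attention, are not evicted immediately; a real implementation would apply this per head and per layer on tensors rather than lists.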
NTD
- [x] example
- [ ] refactor code to a consistent style
- [ ] add a sequence-length API
- [ ] support more models, in both sim and real modes
Expected Behavior & Potential Risk
None
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed
⚡ Required checks status: All passing 🟢
Groups summary
🟢 Format Scan Tests workflow
Check ID | Status | Error details
---|---|---
format-scan (pylint) | success | ✅
format-scan (bandit) | success | ✅
format-scan (cloc) | success | ✅
format-scan (cpplint) | success | ✅
These checks are required after the changes to:
- `intel_extension_for_transformers/transformers/kv_cache_compression/__init__.py`
- `intel_extension_for_transformers/transformers/kv_cache_compression/models/__init__.py`
- `intel_extension_for_transformers/transformers/kv_cache_compression/models/modeling_gaudi_llama.py`
- `intel_extension_for_transformers/transformers/kv_cache_compression/models/modeling_llama.py`
- `intel_extension_for_transformers/transformers/kv_cache_compression/prune/__init__.py`
- `intel_extension_for_transformers/transformers/kv_cache_compression/prune/base.py`
- `intel_extension_for_transformers/transformers/kv_cache_compression/prune/h2o.py`
🟢 Optimize Unit Test workflow
Check ID | Status | Error details
---|---|---
optimize-unit-test-baseline | success | ✅
optimize-unit-test-PR-test | success | ✅
Genreate-OptimizeUT-Report | success | ✅
These checks are required after the changes to the same files as above.
🟢 Engine Unit Test workflow
Check ID | Status | Error details
---|---|---
engine-unit-test-baseline | success | ✅
engine-unit-test-PR-test | success | ✅
Genreate-Engine-Report | success | ✅
These checks are required after the changes to the same files as above.
Thank you for your contribution! 💜
Note: This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.
pre-commit.ci autofix
Could we add a document introducing what h2o is?
Format scan improved by https://github.com/intel/intel-extension-for-transformers/pull/1647, which has been merged.
> Could we add a document introducing what h2o is?
Added it to the example README.