
h2o for kv cache compression

Open n1ck-guo opened this issue 10 months ago • 1 comments

Type of Change

feature

Description

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models paper
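For readers unfamiliar with the paper, the eviction idea can be sketched roughly as follows. This is a simplified illustration and not the code added in this PR: the function name `h2o_keep_indices` and the parameters `heavy_budget` and `recent_budget` are hypothetical. It also assumes the attention rows are given up front, whereas a real implementation computes attention only over the tokens still in the cache. The cache keeps the most recent tokens plus the "heavy hitters", meaning the older tokens with the largest accumulated attention scores.

```python
def h2o_keep_indices(attn_rows, heavy_budget, recent_budget):
    """Return the indices of tokens left in the KV cache after all steps.

    attn_rows[t] is the attention distribution of query t over tokens 0..t
    (an assumption of this sketch; names are illustrative only).
    """
    acc = {}  # token index -> accumulated attention score
    for t, row in enumerate(attn_rows):
        acc[t] = 0.0  # the new token enters the cache
        for idx in acc:
            acc[idx] += row[idx]  # accumulate attention received so far
        if len(acc) > heavy_budget + recent_budget:
            # Tokens inside the recent window are never evicted.
            recent_start = t - recent_budget + 1
            candidates = [i for i in acc if i < recent_start]
            # Evict the older token with the lowest accumulated score.
            victim = min(candidates, key=lambda i: (acc[i], i))
            del acc[victim]
    return sorted(acc)


rows = [
    [1.0],
    [0.9, 0.1],
    [0.8, 0.1, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.1, 0.1, 0.1, 0.1],
]
kept = h2o_keep_indices(rows, heavy_budget=1, recent_budget=2)
print(kept)  # [0, 3, 4]
```

In this toy input, token 0 receives most of the attention and survives as the heavy hitter, while tokens 3 and 4 are kept because they fall inside the recent window.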

NTD

  • [x] example
  • [ ] refactor code to same style
  • [ ] add seq len api
  • [ ] support for more models, both sim and real

Expected Behavior & Potential Risk

None

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

n1ck-guo avatar Apr 10 '24 07:04 n1ck-guo

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| format-scan (pylint) | success | |
| format-scan (bandit) | success | |
| format-scan (cloc) | success | |
| format-scan (cpplint) | success | |

These checks are required after the changes to intel_extension_for_transformers/transformers/kv_cache_compression/__init__.py, intel_extension_for_transformers/transformers/kv_cache_compression/models/__init__.py, intel_extension_for_transformers/transformers/kv_cache_compression/models/modeling_gaudi_llama.py, intel_extension_for_transformers/transformers/kv_cache_compression/models/modeling_llama.py, intel_extension_for_transformers/transformers/kv_cache_compression/prune/__init__.py, intel_extension_for_transformers/transformers/kv_cache_compression/prune/base.py, intel_extension_for_transformers/transformers/kv_cache_compression/prune/h2o.py.

🟢 Optimize Unit Test workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| optimize-unit-test-baseline | success | |
| optimize-unit-test-PR-test | success | |
| Genreate-OptimizeUT-Report | success | |


🟢 Engine Unit Test workflow
| Check ID | Status | Error details |
| --- | --- | --- |
| engine-unit-test-baseline | success | |
| engine-unit-test-PR-test | success | |
| Genreate-Engine-Report | success | |



Thank you for your contribution! 💜

Note: This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

github-actions[bot] avatar Apr 10 '24 07:04 github-actions[bot]

pre-commit.ci autofix

n1ck-guo avatar May 21 '24 07:05 n1ck-guo

Could we add a document introducing what h2o is?

PenghuiCheng avatar Jul 02 '24 05:07 PenghuiCheng

Format scan improved by https://github.com/intel/intel-extension-for-transformers/pull/1647. Merged.

changwangss avatar Jul 02 '24 09:07 changwangss

> Could we add a document introducing what h2o is?

add in the example/readme

n1ck-guo avatar Jul 15 '24 02:07 n1ck-guo