Zhenyu (Allen) Zhang

Results: 1 issue by Zhenyu (Allen) Zhang

This adds an implementation of the H2O algorithm for efficient long-context inference with Llama models. The current implementation is based on Hugging Face transformers and is tested on summarization tasks, including...

cla signed