[Feature]: Integrate `flash-infer` FP8 KV Cache Chunked-Prefill (Append Attention)
🚀 The feature, motivation and pitch
From the new FlashInfer release: https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.1.4
cc @comaniac
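A rough sketch of what the integration could look like: quantize the paged KV cache to FP8 (e4m3) with per-tensor scales and run FlashInfer's paged prefill/append attention over it. This is only an illustration under assumptions, not the planned vLLM implementation; the `k_scale`/`v_scale` kwargs on `forward` in particular are an assumption about the FlashInfer API, and the released kernel may expect pre-scaled inputs or a different entry point.

```python
# Hypothetical sketch: FlashInfer paged append/prefill attention over an FP8 KV cache.
# Shapes, scale handling, and the k_scale/v_scale kwargs are assumptions.
import torch
import flashinfer

num_pages, page_size = 64, 16
num_qo_heads = num_kv_heads = 8
head_dim = 128
device = "cuda"

workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device=device)
wrapper = flashinfer.BatchPrefillWithPagedKVCacheWrapper(workspace, kv_layout="NHD")

# One request appending 32 new query tokens against 3 already-filled KV pages
# (the chunked-prefill / append-attention case).
qo_indptr = torch.tensor([0, 32], dtype=torch.int32, device=device)
kv_indptr = torch.tensor([0, 3], dtype=torch.int32, device=device)
kv_indices = torch.tensor([0, 1, 2], dtype=torch.int32, device=device)
kv_last_page_len = torch.tensor([16], dtype=torch.int32, device=device)

# FP16 paged KV cache (NHD layout: pages x {K,V} x page_size x heads x head_dim),
# quantized per-tensor to FP8 e4m3; scales kept in higher precision.
kv_fp16 = torch.randn(num_pages, 2, page_size, num_kv_heads, head_dim,
                      dtype=torch.float16, device=device)
k_scale = kv_fp16[:, 0].abs().amax().float() / 448.0  # 448 ~= e4m3 max representable value
v_scale = kv_fp16[:, 1].abs().amax().float() / 448.0
kv_fp8 = torch.empty_like(kv_fp16, dtype=torch.float8_e4m3fn)
kv_fp8[:, 0] = (kv_fp16[:, 0].float() / k_scale).to(torch.float8_e4m3fn)
kv_fp8[:, 1] = (kv_fp16[:, 1].float() / v_scale).to(torch.float8_e4m3fn)

q = torch.randn(32, num_qo_heads, head_dim, dtype=torch.float16, device=device)

wrapper.begin_forward(qo_indptr, kv_indptr, kv_indices, kv_last_page_len,
                      num_qo_heads, num_kv_heads, head_dim, page_size)
# Assumed API: passing the FP8 cache plus dequantization scales directly to forward.
out = wrapper.forward(q, kv_fp8, causal=True,
                      k_scale=k_scale.item(), v_scale=v_scale.item())
wrapper.end_forward()
```

In vLLM this would presumably hook into the existing FlashInfer attention backend, mirroring how the FP8 KV cache is already wired up for decode in the PRs linked below.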
Additional context
Follow-up to: https://github.com/vllm-project/vllm/pull/7208, https://github.com/vllm-project/vllm/pull/7185