[Feature]: Integrate `flash-infer` FP8 KV Cache Chunked-Prefill (Append Attention)
🚀 The feature, motivation and pitch
From the new FlashInfer release: https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.1.4
cc @comaniac
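A rough sketch of what the integration could look like: quantize the paged KV cache to FP8 (e4m3) with per-tensor scales and run FlashInfer's paged prefill/append attention over it. This is only an illustration under assumptions, not the planned vLLM implementation; the `k_scale`/`v_scale` kwargs on `forward` in particular are an assumption about the FlashInfer API, and the released kernel may expect pre-scaled inputs or a different entry point.

```python
# Hypothetical sketch: FlashInfer paged append/prefill attention over an FP8 KV cache.
# Shapes, scale handling, and the k_scale/v_scale kwargs are assumptions.
import torch
import flashinfer

num_pages, page_size = 64, 16
num_qo_heads = num_kv_heads = 8
head_dim = 128
device = "cuda"

workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device=device)
wrapper = flashinfer.BatchPrefillWithPagedKVCacheWrapper(workspace, kv_layout="NHD")

# One request appending 32 new query tokens against 3 already-filled KV pages
# (the chunked-prefill / append-attention case).
qo_indptr = torch.tensor([0, 32], dtype=torch.int32, device=device)
kv_indptr = torch.tensor([0, 3], dtype=torch.int32, device=device)
kv_indices = torch.tensor([0, 1, 2], dtype=torch.int32, device=device)
kv_last_page_len = torch.tensor([16], dtype=torch.int32, device=device)

# FP16 paged KV cache (NHD layout: pages x {K,V} x page_size x heads x head_dim),
# quantized per-tensor to FP8 e4m3; scales kept in higher precision.
kv_fp16 = torch.randn(num_pages, 2, page_size, num_kv_heads, head_dim,
                      dtype=torch.float16, device=device)
k_scale = kv_fp16[:, 0].abs().amax().float() / 448.0  # 448 ~= e4m3 max representable value
v_scale = kv_fp16[:, 1].abs().amax().float() / 448.0
kv_fp8 = torch.empty_like(kv_fp16, dtype=torch.float8_e4m3fn)
kv_fp8[:, 0] = (kv_fp16[:, 0].float() / k_scale).to(torch.float8_e4m3fn)
kv_fp8[:, 1] = (kv_fp16[:, 1].float() / v_scale).to(torch.float8_e4m3fn)

q = torch.randn(32, num_qo_heads, head_dim, dtype=torch.float16, device=device)

wrapper.begin_forward(qo_indptr, kv_indptr, kv_indices, kv_last_page_len,
                      num_qo_heads, num_kv_heads, head_dim, page_size)
# Assumed API: passing the FP8 cache plus dequantization scales directly to forward.
out = wrapper.forward(q, kv_fp8, causal=True,
                      k_scale=k_scale.item(), v_scale=v_scale.item())
wrapper.end_forward()
```

In vLLM this would presumably hook into the existing FlashInfer attention backend, mirroring how the FP8 KV cache is already wired up for decode in the PRs linked below.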
Additional context
Follow-up to: https://github.com/vllm-project/vllm/pull/7208, https://github.com/vllm-project/vllm/pull/7185