nano-vllm Support for Chunked Prefill of new seqs while decoding old seqs

Open ved27 opened this issue 5 months ago • 4 comments

Hi,

To improve serving throughput (GPU utilization) without compromising TTFT, is chunked prefill supported? That is, can one engine step run a forward pass that includes a chunk of each new sequence's prompt (still in the prefill phase) while the other sequences in the same batch are in the decode phase?

Jul 08 '25 02:07 ved27
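To make the question concrete, here is a minimal sketch of what such a mixed prefill/decode step could schedule. It is not nano-vllm code: `Seq`, `schedule_step`, `apply_step`, `token_budget` and `chunk_size` are hypothetical names, and the policy (decode sequences first, one token each, then fill the remaining per-step token budget with prompt chunks, oldest sequence first) is just one possible choice. It covers only scheduling; the model runner would also have to accept a batch in which each sequence contributes a different number of query tokens.

```python
from dataclasses import dataclass


@dataclass
class Seq:
    """Hypothetical per-request state (not nano-vllm's Sequence class)."""
    seq_id: int
    prompt_len: int
    num_computed: int = 0  # prompt tokens whose KV entries are already cached

    @property
    def in_decode(self) -> bool:
        # Once the whole prompt is prefilled, the sequence emits one token per step.
        return self.num_computed >= self.prompt_len


def schedule_step(seqs: list[Seq], token_budget: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return (seq_id, num_query_tokens) pairs for one mixed prefill/decode step.

    `seqs` is assumed to be in admission order (oldest first).
    """
    batch: list[tuple[int, int]] = []
    budget = token_budget
    # 1. Decode sequences go first, one query token each, so they are not stalled
    #    behind a long prompt.
    for s in seqs:
        if s.in_decode and budget > 0:
            batch.append((s.seq_id, 1))
            budget -= 1
    # 2. Spend the leftover budget on prompt chunks, oldest sequence first.
    for s in seqs:
        if s.in_decode or budget == 0:
            continue
        n = min(chunk_size, s.prompt_len - s.num_computed, budget)
        batch.append((s.seq_id, n))
        budget -= n
    return batch


def apply_step(seqs: list[Seq], batch: list[tuple[int, int]]) -> None:
    """Record prefill progress after the forward pass for `batch` has run."""
    by_id = {s.seq_id: s for s in seqs}
    for seq_id, n in batch:
        s = by_id[seq_id]
        if not s.in_decode:
            s.num_computed += n  # this chunk's KV entries are now cached


# Two sequences already decoding, two new prompts (100 and 30 tokens) arriving.
seqs = [Seq(0, 10, 10), Seq(1, 12, 12), Seq(2, 100), Seq(3, 30)]
for step in range(4):
    batch = schedule_step(seqs, token_budget=64, chunk_size=48)
    print(step, batch)
    apply_step(seqs, batch)
# 0 [(0, 1), (1, 1), (2, 48), (3, 14)]
# 1 [(0, 1), (1, 1), (2, 48), (3, 14)]
# 2 [(0, 1), (1, 1), (2, 4), (3, 2)]
# 3 [(0, 1), (1, 1), (2, 1), (3, 1)]
```

The fixed token budget bounds the work per step, so the already-running sequences keep producing one token every step while the long prompts are spread over several steps; a new sequence's first output token comes from the step that processes its last prompt chunk.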
