nano-vllm

Support for Chunked Prefill of new seqs while decoding old seqs

Open ved27 opened this issue 5 months ago • 4 comments

Hi,

To improve serving throughput and GPU utilization without compromising TTFT, is there support for chunked prefill? That is, can a single engine step run a forward pass that processes a chunk of the prompt for new seqs (prefill phase) while the other seqs in the same batch are in the decode phase?
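To make the question concrete, here is a minimal sketch of the kind of scheduling I mean. It assumes a hypothetical per-step token budget; `Seq`, `schedule_step`, and `token_budget` are illustrative names and not nano-vllm APIs. Decoding seqs each take one token slot, and whatever budget is left goes to chunks of the prompts still being prefilled.

```python
from dataclasses import dataclass


@dataclass
class Seq:
    """Hypothetical sequence state, not a nano-vllm class."""
    seq_id: int
    prompt_len: int
    num_prefilled: int = 0  # prompt tokens already processed

    @property
    def is_decoding(self) -> bool:
        return self.num_prefilled >= self.prompt_len


def schedule_step(seqs, token_budget):
    """Return [(seq_id, num_tokens)] for one mixed prefill/decode step."""
    batch = []
    budget = token_budget
    # Decoding seqs go first: each consumes exactly one token slot.
    for s in seqs:
        if s.is_decoding and budget > 0:
            batch.append((s.seq_id, 1))
            budget -= 1
    # The remaining budget is spent on prompt chunks of prefilling seqs.
    for s in seqs:
        if not s.is_decoding and budget > 0:
            chunk = min(s.prompt_len - s.num_prefilled, budget)
            batch.append((s.seq_id, chunk))
            s.num_prefilled += chunk
            budget -= chunk
    return batch
```

With a budget of 5 tokens, one seq already decoding and one new seq with a 10-token prompt, the prompt would be processed over several steps in chunks while the old seq keeps producing one token per step.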

ved27 · Jul 08 '25 02:07