nano-vllm
Support for Chunked Prefill of new seqs while decoding old seqs
Hi,
To improve serving throughput and utilization without compromising TTFT, is chunked prefill supported? That is, can a single engine step run a forward pass over a chunk of the prompt of new seqs (prefill phase) while the other seqs in the same batch are in the decode phase?
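To make the request concrete, here is a rough sketch of the scheduling behaviour I have in mind. It is not nano-vllm code: `Seq`, `schedule_step`, and `token_budget` are hypothetical names made up for illustration, and the policy (decoding seqs first, leftover budget spent on prompt chunks) is just one possible choice.

```python
from dataclasses import dataclass


@dataclass
class Seq:
    # Hypothetical sequence record, not nano-vllm's Sequence class.
    seq_id: int
    prompt_len: int
    num_prefilled: int = 0  # prompt tokens already processed

    @property
    def is_decoding(self) -> bool:
        return self.num_prefilled >= self.prompt_len


def schedule_step(seqs, token_budget):
    """Return [(seq_id, num_tokens)] to run in one engine step."""
    batch = []
    budget = token_budget
    # Decoding seqs go first: each one costs a single token of the budget.
    for s in seqs:
        if s.is_decoding and budget > 0:
            batch.append((s.seq_id, 1))
            budget -= 1
    # Whatever budget is left is spent on prompt chunks of seqs still prefilling.
    for s in seqs:
        if budget == 0:
            break
        if not s.is_decoding:
            chunk = min(s.prompt_len - s.num_prefilled, budget)
            batch.append((s.seq_id, chunk))
            s.num_prefilled += chunk
            budget -= chunk
    return batch
```

With a budget of 6 tokens per step, one seq already decoding and one new seq with a 10-token prompt, the new prompt would be prefilled over several steps (5 tokens, then 5 more) while the old seq keeps producing one token per step.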