nano-vllm Optimize block management in decode phase

Open · xiaohajiayou opened this issue 5 months ago · 2 comments

In #71, #66, #65, and #30, questions were raised about when can_append and may_append are applied to request new blocks. To improve readability, this PR separates two pieces of logic: appending a new block when the last block has just been filled, and the hash check performed when the block is not yet full. Key changes:

Call check_and_update_hash before processing each sequence
Replace may_append with append for clarity
Simplify conditional logic for better readability

(Addresses: Decouple block management and hash computation)
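To illustrate the split described above, here is a minimal Python sketch. It assumes a nano-vllm-style block manager that detects block boundaries with len(seq) % block_size. The method names can_append, append, and check_and_update_hash come from the PR description. Their bodies, the Block, Sequence, and BlockManager classes, the schedule_decode helper, the hash-before-allocation ordering, and the use of Python's built-in hash in place of a real content hash are all hypothetical. This is not the PR's actual code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: int
    ref_count: int = 0
    hash: int = -1  # -1 marks a block that has not been hashed yet
    token_ids: list[int] = field(default_factory=list)


@dataclass
class Sequence:
    token_ids: list[int]
    block_table: list[int] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.token_ids)


class BlockManager:
    """Hypothetical, simplified block manager for the decode phase."""

    def __init__(self, num_blocks: int, block_size: int):
        assert block_size > 1, "the modulus checks below assume block_size > 1"
        self.block_size = block_size
        self.blocks = [Block(i) for i in range(num_blocks)]
        self.free_block_ids = deque(range(num_blocks))
        self.hash_to_block_id: dict[int, int] = {}

    @staticmethod
    def compute_hash(token_ids: list[int], prefix: int = -1) -> int:
        # Stand-in for a real content hash; chains the previous block's hash.
        return hash((prefix, tuple(token_ids)))

    def _allocate_block(self) -> int:
        block_id = self.free_block_ids.popleft()
        block = self.blocks[block_id]
        block.ref_count, block.hash, block.token_ids = 1, -1, []
        return block_id

    def can_append(self, seq: Sequence) -> bool:
        # The newest token needs a fresh block only if it is the first token
        # of a block, i.e. the previous block has just been filled.
        needs_block = len(seq) % self.block_size == 1
        return len(self.free_block_ids) >= int(needs_block)

    def append(self, seq: Sequence) -> None:
        # Allocation only: no hashing happens here.
        if len(seq) % self.block_size == 1:
            seq.block_table.append(self._allocate_block())

    def check_and_update_hash(self, seq: Sequence) -> None:
        # Hashing only: register the last block once it holds block_size tokens.
        if len(seq) % self.block_size != 0:
            return  # last block is not full yet
        last = self.blocks[seq.block_table[-1]]
        if last.hash != -1:
            return  # already hashed in an earlier call
        token_ids = seq.token_ids[-self.block_size:]
        prefix = (self.blocks[seq.block_table[-2]].hash
                  if len(seq.block_table) > 1 else -1)
        last.hash = self.compute_hash(token_ids, prefix)
        last.token_ids = token_ids
        self.hash_to_block_id[last.hash] = last.block_id


def schedule_decode(bm: BlockManager, running: list[Sequence]) -> list[Sequence]:
    scheduled = []
    for seq in running:
        bm.check_and_update_hash(seq)  # hash check first (assumed ordering)
        if not bm.can_append(seq):
            break  # a real scheduler would preempt a sequence here instead
        bm.append(seq)
        scheduled.append(seq)
    return scheduled


# Usage: decode nine tokens one at a time with block_size=4.
bm = BlockManager(num_blocks=8, block_size=4)
seq = Sequence(token_ids=[])
for t in range(9):
    seq.token_ids.append(t)
    schedule_decode(bm, [seq])
print(seq.block_table)  # [0, 1, 2]
```

With the two steps separated, append only allocates and check_and_update_hash only hashes, so neither needs the three-way branch on len(seq) % block_size that a combined method would carry.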

Jul 04 '25 07:07 xiaohajiayou
