nano-vllm
Optimize block management in decode phase
In #71, #66, #65, and #30, there were questions about the timing of `can_append` and `may_append` when requesting new blocks. To improve readability, this PR separates two pieces of logic: appending a new block once the current block has just been filled, and the hash check performed while the block is not yet fully filled.
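The two conditions can be expressed as two independent predicates instead of one combined branch. This is a minimal sketch that assumes `seq_len` already counts the token being decoded in the current step. The helper names `needs_new_block`, `last_block_just_filled`, and `decode_actions` are hypothetical and do not come from nano-vllm.

```python
def needs_new_block(seq_len: int, block_size: int) -> bool:
    """True when the newest token (position seq_len - 1) starts a fresh block."""
    return (seq_len - 1) % block_size == 0


def last_block_just_filled(seq_len: int, block_size: int) -> bool:
    """True when the newest token completes its block, so the block can be hashed."""
    return seq_len % block_size == 0


def decode_actions(seq_len: int, block_size: int) -> list:
    # The two checks are evaluated separately, so each can be read
    # (and tested) without reasoning about the other.
    actions = []
    if needs_new_block(seq_len, block_size):
        actions.append("append_block")
    if last_block_just_filled(seq_len, block_size):
        actions.append("hash_last_block")
    return actions


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, decode_actions(n, block_size=4))
```

Because the checks are independent in this sketch, both can fire in the same step. For example, with `block_size=1` every token opens a new block that is immediately full.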
Key Changes:
- Call `check_and_update_hash` before processing each sequence
- Replace `may_append` with `append` for clarity
- Simplify conditional logic for better readability
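The key changes above can be illustrated with a toy, self-contained model of the decode-phase loop. Most of it is hypothetical. `Block`, `Sequence`, `BlockManager`, `allocate`, and `schedule_decode` are simplified stand-ins and are not nano-vllm's actual classes. Only the method names `check_and_update_hash`, `can_append`, and `append` come from this PR, and their bodies here are assumptions. Python's built-in `hash` stands in for a real content hash. The scheduler does no preemption.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: int
    hash: int = -1          # -1 = not yet full, so not registered for reuse
    token_ids: tuple = ()


@dataclass
class Sequence:
    token_ids: list         # includes the token processed in the current step
    block_table: list = field(default_factory=list)

    def __len__(self):
        return len(self.token_ids)


class BlockManager:
    """Hypothetical toy manager illustrating the split; not nano-vllm's code."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.blocks = [Block(i) for i in range(num_blocks)]
        self.free_block_ids = deque(range(num_blocks))
        self.hash_to_block_id = {}

    def allocate(self, seq: Sequence) -> None:
        # Prefill: reserve enough blocks to hold the prompt (ceiling division).
        num_needed = -(-len(seq) // self.block_size)
        for _ in range(num_needed):
            seq.block_table.append(self.free_block_ids.popleft())

    def check_and_update_hash(self, seq: Sequence) -> None:
        # Hash bookkeeping only: register every completely filled block that
        # has not been hashed yet. Idempotent, and never allocates.
        prefix = -1
        for i, block_id in enumerate(seq.block_table):
            end = (i + 1) * self.block_size
            if end > len(seq):
                break  # this block (and any after it) is not full yet
            block = self.blocks[block_id]
            if block.hash == -1:
                tokens = tuple(seq.token_ids[end - self.block_size:end])
                block.hash = hash((prefix, tokens))  # chained on the previous block
                block.token_ids = tokens
                self.hash_to_block_id[block.hash] = block_id
            prefix = block.hash

    def _needs_new_block(self, seq: Sequence) -> bool:
        # The newest token does not fit in the blocks already assigned.
        return len(seq) > len(seq.block_table) * self.block_size

    def can_append(self, seq: Sequence) -> bool:
        return not self._needs_new_block(seq) or len(self.free_block_ids) > 0

    def append(self, seq: Sequence) -> None:
        # Allocation only: add a block when the newest token no longer fits.
        if self._needs_new_block(seq):
            seq.block_table.append(self.free_block_ids.popleft())


def schedule_decode(bm: BlockManager, running: deque, max_num_seqs: int) -> list:
    scheduled = []
    while running and len(scheduled) < max_num_seqs:
        seq = running.popleft()
        bm.check_and_update_hash(seq)  # 1. hash check, independent of allocation
        if not bm.can_append(seq):     # 2. capacity check
            running.appendleft(seq)    # toy policy: no preemption, just stop
            break
        bm.append(seq)                 # 3. allocate a new block only if needed
        scheduled.append(seq)
    return scheduled
```

In this sketch, hashing lives in its own idempotent method. As a result, `append` answers a single question: does the newest token fit in the existing blocks? That separation matches the readability goal described in this PR.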
(Addresses: Decouple block management and hash computation)