nano-vllm

Optimize block management in decode phase

Open xiaohajiayou opened this issue 5 months ago • 2 comments

In #71, #66, #65, and #30, there were questions about when can_append and may_append are applied to request new blocks. This PR separates two pieces of logic to improve readability: appending a new block when the current block has just been filled, and checking the hash when the block is not fully filled. Key changes:

  1. Call check_and_update_hash before processing each sequence
  2. Replace may_append with append for clarity
  3. Simplify conditional logic for better readability

(Addresses: Decouple block management and hash computation)
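
To illustrate the separation described above, here is a minimal toy sketch. It is not the code from this PR or from nano-vllm. `ToySeq`, `ToyBlockManager`, `decode_step`, and all method signatures are hypothetical, and the rule used here (hash a block once it holds exactly `block_size` tokens, allocate a new block when a token spills into a fresh one) is an assumption about the intended behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ToySeq:
    """Hypothetical stand-in for a sequence: its tokens and its block table."""
    token_ids: list[int] = field(default_factory=list)
    block_table: list[int] = field(default_factory=list)


class ToyBlockManager:
    """Toy manager that keeps block allocation and hash bookkeeping apart.

    Assumes block_size >= 2; all names here are illustrative only.
    """

    def __init__(self, num_blocks: int, block_size: int = 4):
        self.block_size = block_size
        self.free_block_ids = list(range(num_blocks))
        self.block_hash: dict[int, int] = {}  # block_id -> hash of a full block

    def can_append(self, seq: ToySeq) -> bool:
        # A free block is required only when the newest token starts a new block.
        needs_block = len(seq.token_ids) % self.block_size == 1
        return len(self.free_block_ids) >= int(needs_block)

    def check_and_update_hash(self, seq: ToySeq) -> bool:
        # Hash the last block only once it is exactly full and not yet hashed.
        n = len(seq.token_ids)
        if n == 0 or n % self.block_size != 0:
            return False
        last_id = seq.block_table[-1]
        if last_id in self.block_hash:
            return False
        # Chain the previous block's hash so equal prefixes give equal hashes.
        prefix = self.block_hash.get(seq.block_table[-2], -1) if len(seq.block_table) > 1 else -1
        tokens = tuple(seq.token_ids[-self.block_size:])
        self.block_hash[last_id] = hash((prefix, tokens))
        return True

    def append(self, seq: ToySeq) -> bool:
        # Allocate a block only when the newest token does not fit the old ones.
        if len(seq.token_ids) % self.block_size != 1:
            return False
        seq.block_table.append(self.free_block_ids.pop(0))
        return True


def decode_step(manager: ToyBlockManager, seq: ToySeq, new_token: int) -> None:
    """One toy decode step: hash bookkeeping first, then block allocation."""
    seq.token_ids.append(new_token)
    if not manager.can_append(seq):
        raise RuntimeError("no free blocks left")
    manager.check_and_update_hash(seq)
    manager.append(seq)


manager = ToyBlockManager(num_blocks=4, block_size=4)
seq = ToySeq()
for token in range(9):
    decode_step(manager, seq, token)
```

In this sketch the two conditions never fire on the same step, so each method can be read and tested on its own.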

xiaohajiayou, Jul 04 '25 07:07