nano-vllm
Fix: can_append function returns incorrect result
The can_append function in BlockManager returns a boolean indicating whether a sampled token can be stored for the given sequence. The current check, len(seq) % self.block_size == 1, is wrong. Consider a block_size of 16 and a sequence that currently holds 17 tokens. The expression evaluates to True (i.e. 1), so the function requires at least one free block before it allows the append. In reality no extra block is needed, because the second block holds only one token and still has 15 free slots. An additional block is required only when the length is an exact multiple of the block size, since that is when every allocated block is full.
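The arithmetic behind this claim can be checked with a small standalone sketch. It is not the nano-vllm code: the helper names blocks_needed, needs_new_block, and can_append_sketch are hypothetical, and it assumes that the length passed in counts only the tokens already stored, before the sampled token is appended. If the caller appends the token first and then checks, the boundary shifts by one, so the assumption matters.

```python
def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of fixed-size blocks required to hold num_tokens (ceiling division)."""
    return (num_tokens + block_size - 1) // block_size


def needs_new_block(num_stored: int, block_size: int) -> bool:
    """True if storing one more token requires allocating another block.

    Assumption: num_stored excludes the token about to be appended.
    """
    return blocks_needed(num_stored + 1, block_size) > blocks_needed(num_stored, block_size)


def can_append_sketch(num_free_blocks: int, num_stored: int, block_size: int = 16) -> bool:
    """Hypothetical stand-in for the proposed check.

    A free block is required only when every allocated block is full,
    i.e. when num_stored is an exact multiple of block_size.
    """
    required = 1 if num_stored % block_size == 0 else 0
    return num_free_blocks >= required


if __name__ == "__main__":
    # 17 stored tokens with block_size 16: the second block has room.
    print(needs_new_block(17, 16))
    # 16 stored tokens: the only block is full, so the next token needs a new one.
    print(needs_new_block(16, 16))
```

Under the stated assumption, needs_new_block agrees with the modulo-zero condition for every length, which is the behavior this fix proposes for can_append.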