[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching
Summary
Block Manager v2, unlike v1, did not account for LoRA or prompt adapters in the block hash used for prefix caching.
This PR injects the LoRA ID and prompt adapter ID into the block hash function so that prefix caching works correctly with LoRA and prompt adapters under Block Manager v2.
Detail
Block Manager v1 uses the following hash_of_block function to generate a content hash in prefix caching mode: https://github.com/vllm-project/vllm/blob/baa5467547a758af35f442af6edfbc0fb73c83ce/vllm/sequence.py#L460-L468
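Conceptually, the v1 hash covers the prefix token IDs up to and including the block together with the request's LoRA ID, so sequences using different LoRA adapters never share cached blocks. The following is a minimal sketch of that idea, not the exact linked implementation; the parameter names (e.g. `lora_int_id`) mirror attributes on vLLM's Sequence but are used here as assumptions:

```python
from typing import Tuple


def hash_of_block_v1_sketch(prefix_token_ids: Tuple[int, ...],
                            lora_int_id: int) -> int:
    """Simplified sketch of a v1-style content hash.

    The hash combines all tokens up to and including the block with the
    LoRA ID, so blocks produced under different LoRA adapters get
    different hashes and are not reused across adapters.
    """
    return hash((prefix_token_ids, lora_int_id))
```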
However, Block Manager v2 uses only the token IDs, as shown here: https://github.com/vllm-project/vllm/blob/baa5467547a758af35f442af6edfbc0fb73c83ce/vllm/core/block/prefix_caching_block.py#L855
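The fix is to feed the LoRA ID and prompt adapter ID into the v2 block hash alongside the token IDs. The sketch below illustrates one way to do that by passing extra hash keys into a v2-style `hash_block_tokens`; the `extra_keys` parameter and its wiring are assumptions for illustration, and the actual change in this PR may plumb the IDs differently:

```python
from typing import List, Optional, Tuple


def hash_block_tokens_sketch(is_first_block: bool,
                             prev_block_hash: Optional[int],
                             cur_block_token_ids: List[int],
                             extra_keys: Tuple[int, ...] = ()) -> int:
    """Sketch of a v2-style block hash extended with extra keys.

    `extra_keys` would carry (lora_int_id, prompt_adapter_id), so blocks
    generated under different adapters hash differently and the prefix
    cache never returns a block built for another adapter.
    """
    return hash((is_first_block, prev_block_hash,
                 *cur_block_token_ids, *extra_keys))


# Example: the same token IDs hash differently under different LoRA IDs,
# so the prefix cache keeps them separate.
h_base = hash_block_tokens_sketch(True, None, [1, 2, 3], extra_keys=(0, 0))
h_lora = hash_block_tokens_sketch(True, None, [1, 2, 3], extra_keys=(7, 0))
assert h_base != h_lora
```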