
[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching

Open · llsj14 opened this issue 5 months ago · 2 comments

Summary

Unlike v1, Block Manager v2 did not include the LoRA and prompt adapter in the block hash used for prefix caching. I added logic to inject the LoRA ID and prompt adapter ID into the block hash function, so that prefix caching works correctly with LoRA and prompt adapters under Block Manager v2.

Detail

Block Manager v1 uses the following hash_of_block function to generate a content hash in prefix caching mode: https://github.com/vllm-project/vllm/blob/baa5467547a758af35f442af6edfbc0fb73c83ce/vllm/sequence.py#L460-L468
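For illustration, a v1-style content hash folds the adapter identity in alongside the token prefix. The sketch below is a simplified standalone function, not a copy of the linked source; the names `num_hashed_tokens` and `lora_int_id` follow vLLM's conventions but the exact signature should be checked against the linked lines.

```python
from typing import List

def hash_of_block(token_ids: List[int], num_hashed_tokens: int,
                  lora_int_id: int) -> int:
    # Content hash over the token prefix plus the LoRA ID, so sequences
    # running different LoRA adapters never share a cached block.
    # (Illustrative sketch; v1 computes this as a Sequence method.)
    return hash((tuple(token_ids[:num_hashed_tokens]), lora_int_id))
```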

However, Block Manager v2 computes the block hash from token IDs alone, as shown here: https://github.com/vllm-project/vllm/blob/baa5467547a758af35f442af6edfbc0fb73c83ce/vllm/core/block/prefix_caching_block.py#L855
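A minimal sketch of the kind of change described here is to thread the adapter IDs into the v2 chained block hash as extra keys. The `extra_keys` parameter and the call shape below are assumptions for illustration, not the PR's final API:

```python
from typing import List, Optional, Tuple

def hash_block_tokens(is_first_block: bool,
                      prev_block_hash: Optional[int],
                      cur_block_token_ids: List[int],
                      extra_keys: Optional[Tuple[int, ...]] = None) -> int:
    # Chain the previous block's hash with this block's token IDs as v2
    # does today, but also mix in extra_keys (e.g. the hypothetical
    # (lora_int_id, prompt_adapter_id)) so blocks produced under different
    # adapters get distinct hashes and are never wrongly reused.
    return hash((is_first_block, prev_block_hash,
                 tuple(cur_block_token_ids), extra_keys))
```

With this, two sequences that share an identical token prefix but use different LoRA adapters produce distinct block hashes, so prefix caching cannot serve one adapter's cached KV blocks to the other.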

llsj14 · Sep 06 '24 14:09