[Question] Can PagedKVCache support different size of kvcache in different layers?

Open BenchuYee opened this issue 1 year ago • 5 comments

❓ General Questions

I have a pruned model in which some QKV heads were removed (32 to 24) in some layers (10 to 20), and I want to adjust the model code so that I can deploy it. However, I am having trouble with PagedKVCache: I don't know how to give different layers different KV cache sizes. For now I am using nn.KVCache as a makeshift. How can I use PagedKVCache to meet this need? Thank you very much.

BenchuYee, Apr 22 '24 13:04

The current version does not support this yet, and it might be hard to modify it to do so.

Hzfengsy, Apr 24 '24 02:04

Hi @Hzfengsy, so if I want to deploy my pruned model, is the only solution for now to use nn.KVCache in place of PagedKVCache, as the old commit did? Or can I create more than one PagedKVCache and use a different PagedKVCache in different layers? Could you give me some suggestions? Thank you for your reply.

BenchuYee, Apr 24 '24 13:04

KV cache is a common interface. The solution right now would be to create a different KV cache implementation of the same interface and swap it in.

tqchen, Apr 24 '24 13:04
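
To make the "different implementation of the same interface" suggestion concrete, here is a minimal pure-Python sketch. It is not MLC-LLM code and does not use the PagedKVCache or nn.KVCache APIs: `KVCacheLike`, `PerLayerKVCache`, and `pruned_head_counts` are hypothetical names, and the 32-layer depth, the inclusive reading of "layers 10 to 20", and the tiny `head_dim` are assumptions chosen only for illustration. It shows the one idea that matters here: the cache keeps a head count per layer instead of a single global one.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

# Key or value data for one token in one layer: [num_kv_heads][head_dim].
Tensor = List[List[float]]


class KVCacheLike(Protocol):
    """Hypothetical minimal interface that every cache implementation shares."""

    def append(self, layer_id: int, k: Tensor, v: Tensor) -> None: ...

    def get(self, layer_id: int) -> Tuple[List[Tensor], List[Tensor]]: ...


@dataclass
class PerLayerKVCache:
    """Toy cache whose KV head count may differ per layer (not an MLC-LLM class)."""

    num_kv_heads: List[int]  # one entry per layer
    head_dim: int
    _keys: List[List[Tensor]] = field(init=False)
    _values: List[List[Tensor]] = field(init=False)

    def __post_init__(self) -> None:
        # One independent, initially empty token list per layer.
        self._keys = [[] for _ in self.num_kv_heads]
        self._values = [[] for _ in self.num_kv_heads]

    def _check_shape(self, layer_id: int, t: Tensor) -> None:
        expected = self.num_kv_heads[layer_id]
        if len(t) != expected or any(len(head) != self.head_dim for head in t):
            raise ValueError(
                f"layer {layer_id} expects shape [{expected}][{self.head_dim}]"
            )

    def append(self, layer_id: int, k: Tensor, v: Tensor) -> None:
        """Store the key/value data of one new token for the given layer."""
        self._check_shape(layer_id, k)
        self._check_shape(layer_id, v)
        self._keys[layer_id].append(k)
        self._values[layer_id].append(v)

    def get(self, layer_id: int) -> Tuple[List[Tensor], List[Tensor]]:
        return self._keys[layer_id], self._values[layer_id]

    def num_elements(self, layer_id: int) -> int:
        """Number of scalars held for this layer (keys plus values)."""
        tokens = len(self._keys[layer_id])
        return 2 * tokens * self.num_kv_heads[layer_id] * self.head_dim


def pruned_head_counts(
    num_layers: int = 32, full: int = 32, pruned: int = 24,
    first: int = 10, last: int = 20,
) -> List[int]:
    """Head count per layer; layers first..last (inclusive, assumed) are pruned."""
    return [pruned if first <= i <= last else full for i in range(num_layers)]


# Usage: model code only depends on the KVCacheLike interface.
cache: KVCacheLike = PerLayerKVCache(pruned_head_counts(), head_dim=4)
cache.append(0, [[0.0] * 4] * 32, [[0.0] * 4] * 32)   # unpruned layer: 32 heads
cache.append(10, [[0.0] * 4] * 24, [[0.0] * 4] * 24)  # pruned layer: 24 heads
```

Because the attention code would only call `append` and `get`, a cache like this could in principle stand in wherever a uniform cache was used before. A real replacement would also need paging, batching, and device memory management, none of which this sketch attempts.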

@BenchuYee Hi, I want to adjust the KVCache for more flexible usage. Which old commit did you use to build the nn.KVCache model? And by the way, did you observe an obvious performance drop using nn.KVCache instead of PagedKVCache (batched requests not considered)?

DeclK, May 11 '24 07:05

@DeclK Hi, sorry for the late reply. I used the old commit #1746 from Feb 14, and this worked many commits ago (before May): I just replaced the new model code with the code from the old commit. We didn't test the performance, but the model can chat normally. This approach no longer seems to work on the new commits, though. If you want to use it, I suggest checking out an old commit (before May) and replacing the model code with the older version (around #1746, Feb 14).

BenchuYee, Jun 10 '24 07:06