mlc-llm
[Question] Can PagedKVCache support different size of kvcache in different layers?
❓ General Questions
I have a pruned model in which some QKV heads have been removed (32 reduced to 24) in some layers (10 to 20), and I want to adjust the model code so that I can deploy it. However, I am having trouble with PagedKVCache: I don't know how to use it to give different layers different KV cache sizes. For now I am using nn.KVCache as a makeshift. I would like to know how to use PagedKVCache to meet this need. Thank you very much.
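To make the per-layer mismatch concrete, here is a minimal arithmetic sketch of how the KV cache footprint per token differs between the uniform and the pruned layout. Only the head counts (32 and 24) and the pruned layer range come from the question; the total layer count, head dimension, fp16 element size, and the inclusive reading of "10 to 20" are assumptions chosen for illustration, and none of the names below are mlc-llm APIs.

```python
# Hypothetical model shape; only FULL_HEADS, PRUNED_HEADS and the 10..20
# layer range come from the question above.
NUM_LAYERS = 32                 # assumption
HEAD_DIM = 128                  # assumption
BYTES_PER_ELEM = 2              # assumption: fp16 cache entries
FULL_HEADS = 32
PRUNED_HEADS = 24
PRUNED_LAYERS = range(10, 21)   # assumption: layers 10..20 inclusive


def kv_heads_per_layer():
    """Return the number of KV heads for every layer of the pruned model."""
    return [
        PRUNED_HEADS if layer in PRUNED_LAYERS else FULL_HEADS
        for layer in range(NUM_LAYERS)
    ]


def kv_bytes_per_token(heads):
    """Bytes needed to cache one token: keys plus values for every layer."""
    return sum(2 * h * HEAD_DIM * BYTES_PER_ELEM for h in heads)


uniform_bytes = kv_bytes_per_token([FULL_HEADS] * NUM_LAYERS)
pruned_bytes = kv_bytes_per_token(kv_heads_per_layer())
print(uniform_bytes, pruned_bytes)
```

A cache that assumes one head count for all layers would either over-allocate for the pruned layers or mis-index them, which is why the per-layer sizes matter here.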
The current version does not support this yet, and it might be hard to modify it to do so.
Hi @Hzfengsy, so if I want to deploy my pruned model, is the only solution right now to use nn.KVCache in place of PagedKVCache, as the old commit did? Or can I create more than one PagedKVCache and use a different PagedKVCache in different layers? Could you give me some suggestions? Thank you for the reply.
The KV cache is a common interface, so the solution right now would be to create a different KV cache implementation of the same interface and use it as the replacement.
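The idea of swapping in a different implementation behind a common interface can be sketched in plain Python as follows. This is only an illustration of the pattern, not the mlc-llm or TVM API: `KVCacheInterface`, `PerLayerKVCache`, and their methods are hypothetical names, and plain lists stand in for real tensors.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class KVCacheInterface(ABC):
    """Hypothetical common interface that the model code talks to."""

    @abstractmethod
    def append(self, layer_id: int, key: Sequence[float], value: Sequence[float]) -> None:
        """Store the key/value vectors of one new token for one layer."""

    @abstractmethod
    def get(self, layer_id: int) -> Tuple[List[List[float]], List[List[float]]]:
        """Return all cached keys and values for one layer."""


class PerLayerKVCache(KVCacheInterface):
    """Implementation that allows a different number of KV heads per layer."""

    def __init__(self, heads_per_layer: Sequence[int], head_dim: int) -> None:
        self.heads_per_layer = list(heads_per_layer)
        self.head_dim = head_dim
        # One independent key list and value list per layer.
        self._keys: List[List[List[float]]] = [[] for _ in self.heads_per_layer]
        self._values: List[List[List[float]]] = [[] for _ in self.heads_per_layer]

    def append(self, layer_id, key, value):
        # Each token contributes a flattened (num_heads * head_dim) vector,
        # where num_heads is looked up for this specific layer.
        expected = self.heads_per_layer[layer_id] * self.head_dim
        if len(key) != expected or len(value) != expected:
            raise ValueError(
                f"layer {layer_id} expects vectors of length {expected}"
            )
        self._keys[layer_id].append(list(key))
        self._values[layer_id].append(list(value))

    def get(self, layer_id):
        return self._keys[layer_id], self._values[layer_id]

    def seq_len(self, layer_id: int) -> int:
        """Number of tokens cached so far for one layer."""
        return len(self._keys[layer_id])


# Usage: layer 0 keeps 32 heads, layer 1 is pruned to 24 heads.
cache = PerLayerKVCache(heads_per_layer=[32, 24], head_dim=4)
cache.append(0, [0.0] * (32 * 4), [0.0] * (32 * 4))
cache.append(1, [0.0] * (24 * 4), [0.0] * (24 * 4))
```

Because the model code only depends on the interface, the per-layer sizes stay an internal detail of the implementation. Paging, batching, and attention kernels are deliberately left out of this sketch.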
@BenchuYee Hi, I want to adjust the KVCache for more flexible usage. Which old commit did you use to build the nn.KVCache model? By the way, did you observe an obvious performance drop when using nn.KVCache instead of PagedKVCache (without considering batched requests)?
@DeclK Hi, sorry for the late reply. I used the old commit #1746 from Feb 14, and it worked many commits ago (before May): I just replaced the new model code with the code from the old commit. We didn't test the performance, but the model can converse normally. It seems this approach no longer works on newer commits, though. If you want to use it, I suggest checking out an old commit (before May) and replacing the model code with the older version (around #1746 on Feb 14).