TensorRT-LLM
[Feature request] kv cache reuse policy feature request
According to the docs, reusable blocks are evicted based on LRU.
LRU is a good approach in general, but I know in advance that some queries (prompts) will never be reused, and I don't want them to stay in the cache and crowd out the queries I do need.
I think this could be implemented fairly easily by adding a "priority" field to the request.
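To illustrate the idea, here is a minimal, hypothetical sketch (not TensorRT-LLM's actual implementation) of how per-request priority could be combined with the existing LRU eviction: the cache always evicts from the lowest priority present first, and only falls back to recency within that priority level. The class name, `priority` field, and capacity model are all made up for illustration.

```python
from collections import OrderedDict

class PriorityLRUCache:
    """Hypothetical sketch: LRU eviction that evicts low-priority blocks first.

    Blocks tagged priority=0 (e.g. prompts known to be one-off) are evicted
    before any higher-priority block, regardless of recency.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # key -> (priority, value); insertion order tracks recency (oldest first)
        self.blocks = OrderedDict()

    def put(self, key, value, priority=1):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # refresh recency
            self.blocks[key] = (priority, value)
            return
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[key] = (priority, value)

    def get(self, key):
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)  # refresh recency on reuse
        return self.blocks[key][1]

    def _evict(self):
        # Among blocks with the lowest priority present, evict the
        # least-recently-used one (OrderedDict iterates oldest first).
        lowest = min(p for p, _ in self.blocks.values())
        for key, (p, _) in self.blocks.items():
            if p == lowest:
                del self.blocks[key]
                return
```

With this scheme, a request marked priority=0 would never push out a priority=1 block, which is exactly the "don't crowd out the queries I need" behavior requested above.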