TensorRT-LLM [Feature request] kv cache reuse policy feature request

Open akhoroshev opened this issue 9 months ago • 1 comments

According to the docs, reusable KV cache blocks are evicted on an LRU (least recently used) basis.

LRU is a good approach. However, I know in advance that some queries (prompts) will never be reused, and I don't want their blocks to stay in the cache and crowd out the queries I actually need.

I think this could be implemented fairly easily by adding a "priority" field to the request.
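
To illustrate the idea, here is a minimal sketch in plain Python of an eviction policy that evicts the lowest-priority entries first and falls back to LRU among entries of equal priority. This is not TensorRT-LLM code and does not reflect its API: the class `PriorityLRUCache`, the `priority` argument, and the string keys and values are all hypothetical names used for illustration only.

```python
from collections import OrderedDict


class PriorityLRUCache:
    """Toy cache (hypothetical, not TensorRT-LLM): evicts the lowest
    priority first, and the least recently used entry within a priority."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # One OrderedDict per priority level; its order tracks recency
        # (oldest entry first, most recently used entry last).
        self._levels = {}
        # Maps each key to the priority level it is stored under.
        self._priority_of = {}

    def __len__(self):
        return len(self._priority_of)

    def __contains__(self, key):
        return key in self._priority_of

    def get(self, key):
        """Return the cached value and mark it as recently used."""
        if key not in self._priority_of:
            return None
        level = self._levels[self._priority_of[key]]
        level.move_to_end(key)
        return level[key]

    def put(self, key, value, priority=0):
        """Insert or update an entry, evicting others if the cache is full."""
        if key in self._priority_of:
            self._remove(key)
        while len(self) >= self.capacity:
            self._evict_one()
        self._levels.setdefault(priority, OrderedDict())[key] = value
        self._priority_of[key] = priority

    def _remove(self, key):
        priority = self._priority_of.pop(key)
        level = self._levels[priority]
        del level[key]
        if not level:
            del self._levels[priority]

    def _evict_one(self):
        """Evict the oldest entry of the lowest non-empty priority level."""
        lowest = min(self._levels)
        victim, _ = self._levels[lowest].popitem(last=False)
        del self._priority_of[victim]
        if not self._levels[lowest]:
            del self._levels[lowest]
        return victim


cache = PriorityLRUCache(capacity=2)
cache.put("system_prompt", "blocks-A", priority=10)  # expected to be reused
cache.put("one_off_query", "blocks-B", priority=0)   # known not to be reused
cache.get("one_off_query")                           # now the most recently used
cache.put("another_prompt", "blocks-C", priority=10)
```

With plain LRU, the last `put` would evict `system_prompt`, because `one_off_query` was touched more recently. With the priority taken into account, `one_off_query` is evicted instead, which is the behavior I am asking for.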

May 23 '24 08:05 akhoroshev
