[FEAT] Any plans to support TensorRT-LLM?
Are there any plans to support the TRTLLM-Serve framework in the future? TRTLLM-Serve is a high-performance LLM inference framework that, like vLLM and SGLang, is used in many production environments. kvcached is useful for resolving long-tail service issues in such environments, so we would greatly appreciate support for TRTLLM-Serve.
Hi @troycheng, we definitely want to support as many inference engines as possible, including TRTLLM-Serve. We are currently a bit short-handed for supporting different features. Would you be interested, or do you know people who would be interested, in integrating kvcached with TRTLLM-Serve? Thanks!
We're not very familiar with this, but we'll give it a try. If any questions come up, we'll post them here.
Thank you very much! We'd be happy to provide any help you need. Just let us know.
@shpgy-shpgy @Nekofish-L As discussed offline, we can give it a try. Let's take the first step toward solving the long-tail service issue.