
tensorrtllm backend fails when KV cache is disabled

Open ShuaiShao93 opened this issue 11 months ago • 5 comments

Description

The following error is raised when the TRT-LLM engine is built with the KV cache disabled:

model_instance_state.cc:1117] "Failed updating TRT LLM statistics: Internal - Failed to find Max KV cache blocks in metrics."

The inflight_fused_batching batcher also fails to batch requests, probably because of this issue.

Triton Information

What version of Triton are you using? 24.09

Are you using the Triton container or did you build it yourself? Triton container

To Reproduce

  1. build the TRT-LLM engine with trtllm-build --kv_cache_type=disabled
  2. load the model in Triton with batching_strategy:inflight_fused_batching
  3. run inference with batched data; the error above is logged and the effective batch size is always 1
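For reference, the steps above correspond to commands along these lines. This is a sketch, not a verified repro script: the checkpoint/engine paths are placeholders, and the fill_template.py parameter names other than batching_strategy are assumptions based on the tensorrtllm_backend examples rather than taken from this report.

  # 1. Build the engine with the KV cache disabled (paths are placeholders)
  trtllm-build --checkpoint_dir ./ckpt \
               --output_dir ./engine \
               --kv_cache_type=disabled

  # 2. Fill in the model config and select in-flight fused batching
  #    (fill_template.py ships with tensorrtllm_backend; other required
  #    parameters are omitted here)
  python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt \
      batching_strategy:inflight_fused_batching,engine_dir:./engine

  # 3. Start Triton and send batched requests; with the 24.09 container the
  #    statistics error above appears and requests run with batch size 1
  tritonserver --model-repository triton_model_repo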

Expected behavior

Triton should work when the KV cache is disabled.

ShuaiShao93, Nov 13 '24