tensorrtllm_backend
tensorrtllm backend fails when KV cache is disabled
Description
The following error is logged when the TRT-LLM engine was built with the KV cache disabled:

model_instance_state.cc:1117] "Failed updating TRT LLM statistics: Internal - Failed to find Max KV cache blocks in metrics."

In addition, the inflight_fused_batching batcher does not work, most likely because of this issue.
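One way to confirm the symptom is to query Triton's Prometheus metrics endpoint, since the statistics update is failing to find the KV-cache block metrics. A minimal check, assuming the default metrics port 8002 and that the backend's KV-cache gauges contain "kv_cache" in their names (both are assumptions, verify against your deployment):

    # Assumed default metrics port; expect no kv_cache gauges when the
    # engine was built with --kv_cache_type=disabled
    curl -s localhost:8002/metrics | grep -i kv_cache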
Triton Information
What version of Triton are you using? 24.09
Are you using the Triton container or did you build it yourself? Triton container
To Reproduce
- Build the TRT-LLM engine with trtllm-build --kv_cache_type=disabled
- Load the model in Triton with batching_strategy:inflight_fused_batching (a command sketch follows the list)
- Run inference with batched data. The error above is logged and the batch size is always 1.
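For concreteness, a minimal sketch of these steps, assuming the standard tensorrtllm_backend model repository layout and its fill_template.py helper; the checkpoint and engine paths are placeholders, and other template variables are omitted for brevity:

    # Build the engine with the KV cache disabled (placeholder paths)
    trtllm-build --checkpoint_dir ./ckpt --output_dir ./engine --kv_cache_type=disabled

    # Fill the model config with in-flight fused batching enabled
    # (fill_template.py ships in tensorrtllm_backend/tools)
    python3 tools/fill_template.py -i all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt \
        triton_backend:tensorrtllm,engine_dir:./engine,batching_strategy:inflight_fused_batching

    # Serve the model repository
    tritonserver --model-repository=all_models/inflight_batcher_llm

Sending several concurrent requests should then exercise in-flight fused batching; instead, the statistics error above is logged and the reported batch size stays at 1.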
Expected behavior
Triton should serve inference normally, including in-flight fused batching, when the KV cache is disabled.