tensorrtllm_backend
tensorrtllm backend fails when KV cache is disabled
Description
The following error is logged when the TRT-LLM engine was built with the KV cache disabled:

model_instance_state.cc:1117] "Failed updating TRT LLM statistics: Internal - Failed to find Max KV cache blocks in metrics."

In addition, the inflight_fused_batching batcher does not work, most likely because of this issue.
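One way to confirm the symptom is to query Triton's Prometheus metrics endpoint, since the statistics update is failing to find the KV-cache block metrics. A minimal check, assuming the default metrics port 8002 and that the backend's KV-cache gauges contain "kv_cache" in their names (both are assumptions, verify against your deployment):

    # Assumed default metrics port; expect no kv_cache gauges when the
    # engine was built with --kv_cache_type=disabled
    curl -s localhost:8002/metrics | grep -i kv_cache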
Triton Information
What version of Triton are you using? 24.09
Are you using the Triton container or did you build it yourself? Triton container
To Reproduce
- Build the TRT-LLM engine with trtllm-build --kv_cache_type=disabled
- Load the model in Triton with batching_strategy:inflight_fused_batching (a command sketch follows the list)
- Run inference with batched data. The error above is logged and the batch size is always 1.
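For concreteness, a minimal sketch of these steps, assuming the standard tensorrtllm_backend model repository layout and its fill_template.py helper; the checkpoint and engine paths are placeholders, and other template variables are omitted for brevity:

    # Build the engine with the KV cache disabled (placeholder paths)
    trtllm-build --checkpoint_dir ./ckpt --output_dir ./engine --kv_cache_type=disabled

    # Fill the model config with in-flight fused batching enabled
    # (fill_template.py ships in tensorrtllm_backend/tools)
    python3 tools/fill_template.py -i all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt \
        triton_backend:tensorrtllm,engine_dir:./engine,batching_strategy:inflight_fused_batching

    # Serve the model repository
    tritonserver --model-repository=all_models/inflight_batcher_llm

Sending several concurrent requests should then exercise in-flight fused batching; instead, the statistics error above is logged and the reported batch size stays at 1.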
Expected behavior
Triton should serve inference normally, including in-flight fused batching, when the KV cache is disabled.