tensorrtllm_backend
New KV cache metrics are missing: allocTotalBlocks, allocNewBlocks, reusedBlocks
TensorRT-LLM exposes more KV cache stats than the backend currently reports.
Can the missing ones be added in next week's commits?
struct KvCacheStats
{
    SizeType32 maxNumBlocks;
    SizeType32 freeNumBlocks;
    SizeType32 usedNumBlocks;
    SizeType32 toksPerBlock;
    SizeType32 allocTotalBlocks;
    SizeType32 allocNewBlocks;
    SizeType32 reusedBlocks;
};
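
To illustrate the request, here is a minimal, self-contained sketch of how all seven fields could be flattened into name/value pairs for reporting. It is not the backend's actual code: the helper `kvCacheStatsToMetrics`, the snake_case metric names, and the `SizeType32` alias are assumptions made for this example only.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Assumption: stand-in for TensorRT-LLM's 32-bit size type.
using SizeType32 = std::int32_t;

// Copied from the struct quoted above.
struct KvCacheStats
{
    SizeType32 maxNumBlocks;
    SizeType32 freeNumBlocks;
    SizeType32 usedNumBlocks;
    SizeType32 toksPerBlock;
    SizeType32 allocTotalBlocks;
    SizeType32 allocNewBlocks;
    SizeType32 reusedBlocks;
};

// Hypothetical helper: flattens the stats into (name, value) pairs.
// The metric names are illustrative, not the backend's real labels.
std::vector<std::pair<std::string, SizeType32>> kvCacheStatsToMetrics(KvCacheStats const& stats)
{
    return {
        {"max_num_blocks", stats.maxNumBlocks},
        {"free_num_blocks", stats.freeNumBlocks},
        {"used_num_blocks", stats.usedNumBlocks},
        {"tokens_per_block", stats.toksPerBlock},
        // The three fields this issue asks to expose:
        {"alloc_total_blocks", stats.allocTotalBlocks},
        {"alloc_new_blocks", stats.allocNewBlocks},
        {"reused_blocks", stats.reusedBlocks},
    };
}
```

Keeping the mapping in one place means a field added to `KvCacheStats` later only needs one extra line here to be reported.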