
How to calculate the number of blocks in the C++ runtime

Open w066650 opened this issue 1 year ago • 0 comments

Model: Qwen1.5-7B-sft
Engine version: 0.13.0.dev2024082000

Log:
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 39.39 GiB, available: 21.89 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 631

Code:
auto const [blocksInPrimaryPool, blocksInSecondaryPool] = bmkv::KVCacheManager::calculateMaxNumBlocks(
    kvCacheConfig, kvDtype, mModelConfig, mWorldConfig, getBufferManager());

Why does about 21 GiB of available memory yield only 631 blocks? How is the number of blocks calculated in the C++ runtime?
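For reference, here is a rough sketch of how the block count can be estimated by hand. It is not the actual `calculateMaxNumBlocks` implementation. The struct, the function names, and every numeric input are assumptions for illustration: 32 layers, 32 KV heads, head size 128 (the published Qwen1.5-7B configuration), an FP16 KV cache, 64 tokens per block, a free GPU memory fraction of 0.9, and a single GPU. The idea is that a block is not a token: each block holds keys and values for `tokensPerBlock` tokens across all layers, so one block is fairly large.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types and functions, NOT part of the TensorRT-LLM API.
struct KvCacheShape {
    std::int64_t numLayers;      // attention layers held on this GPU (assumed)
    std::int64_t numKvHeads;     // KV heads per layer on this GPU (assumed)
    std::int64_t headSize;       // dimension of each head (assumed)
    std::int64_t bytesPerElem;   // 2 for FP16/BF16, 1 for INT8/FP8 (assumed)
    std::int64_t tokensPerBlock; // tokens stored in one block (assumed)
};

// Bytes needed to store both K and V for one token across all layers.
constexpr std::int64_t bytesPerToken(KvCacheShape const& s) {
    return 2 * s.numLayers * s.numKvHeads * s.headSize * s.bytesPerElem;
}

// Estimate: tokens that fit into the usable share of free memory,
// rounded up to whole blocks.
inline std::int64_t estimateNumBlocks(
    KvCacheShape const& s, double freeBytes, double freeGpuMemFraction) {
    assert(s.tokensPerBlock > 0);
    auto const maxTokens = static_cast<std::int64_t>(
        freeBytes * freeGpuMemFraction / static_cast<double>(bytesPerToken(s)));
    return (maxTokens + s.tokensPerBlock - 1) / s.tokensPerBlock;
}
```

Under these assumptions one token needs 2 × 32 × 32 × 128 × 2 bytes = 0.5 MiB, so a 64-token block is about 32 MiB. Calling `estimateNumBlocks(KvCacheShape{32, 32, 128, 2, 64}, 21.89 * 1024.0 * 1024.0 * 1024.0, 0.9)` gives 631, which matches the log above. If the engine was built with different settings (another `tokens_per_block`, a quantized KV cache, tensor parallelism), the inputs would need to change accordingly.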

w066650 · Sep 29 '24 08:09