TensorRT-LLM
How is the number of KV cache blocks calculated in the C++ runtime?
Model: Qwen1.5-7B-sft
Engine version: 0.13.0.dev2024082000
Log:
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 39.39 GiB, available: 21.89 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 631
Code:
auto const [blocksInPrimaryPool, blocksInSecondaryPool] = bmkv::KVCacheManager::calculateMaxNumBlocks(
    kvCacheConfig, kvDtype, mModelConfig, mWorldConfig, getBufferManager());
Why does about 21.89 GiB of available memory yield only 631 blocks? How does the C++ runtime calculate the number of blocks?
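For reference, here is a rough sketch of how a block count like this can be estimated from the available memory. It is not the actual body of calculateMaxNumBlocks, just the usual paged KV cache arithmetic: bytes per token = 2 (K and V) x layers x KV heads x head size x bytes per element, then the memory budget divided by that, rounded up to whole blocks. The struct, the function name estimateNumBlocks, and all default values are assumptions for illustration: 32 layers, 32 KV heads, head size 128 (what I would expect for a Qwen1.5-7B checkpoint), an fp16 KV cache, 64 tokens per block, and a free GPU memory fraction of 0.9.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical input struct; every default below is an assumption, not a value
// read from the engine. Check your own config.json / KvCacheConfig.
struct KvCacheEstimateInput
{
    double availableGiB = 21.89;        // "available" from the log line
    double freeGpuMemoryFraction = 0.9; // assumed share of free memory given to the KV cache
    std::int64_t numLayers = 32;        // assumed for Qwen1.5-7B
    std::int64_t numKvHeads = 32;       // assumed (no grouped-query attention)
    std::int64_t sizePerHead = 128;     // assumed: hidden size 4096 / 32 heads
    std::int64_t bytesPerElement = 2;   // assumed fp16 KV cache
    std::int64_t tokensPerBlock = 64;   // assumed build-time tokens per block
};

// Estimate how many KV cache blocks fit into the memory budget.
std::int64_t estimateNumBlocks(KvCacheEstimateInput const& in)
{
    assert(in.tokensPerBlock > 0);

    // Each token stores one K and one V vector per layer and per KV head.
    std::int64_t const bytesPerToken
        = 2 * in.numLayers * in.numKvHeads * in.sizePerHead * in.bytesPerElement;
    assert(bytesPerToken > 0);

    // Only a fraction of the free memory is budgeted for the cache.
    double const budgetBytes
        = in.availableGiB * 1024.0 * 1024.0 * 1024.0 * in.freeGpuMemoryFraction;

    // Whole tokens that fit into the budget (truncated).
    auto const maxTokens = static_cast<std::int64_t>(budgetBytes / static_cast<double>(bytesPerToken));

    // Round up to whole blocks.
    return (maxTokens + in.tokensPerBlock - 1) / in.tokensPerBlock;
}
```

Under these assumptions each token costs 512 KiB and each 64-token block costs 32 MiB, so 21.89 GiB x 0.9 is about 19.7 GiB, which gives estimateNumBlocks(KvCacheEstimateInput{}) == 631. That matches the log, so the small block count would come from the large per-block size rather than from lost memory. If your engine uses a different tokens per block value, KV cache dtype, or memory fraction, the result changes accordingly.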