[Runtime][MemoryPool] Memory pool with limited cache size
The commit contains a new memory manager that limits the number of cached buffers.
The problem appeared in a scenario with 4 sequential transformer-based networks that generate outputs of different sizes on each iteration. In this case the default PooledAllocator keeps a large number of buffers; for example, there were about 2000 unallocated buffers (~1.9 GB of memory) at the moment the pool was cleared. The new allocator limits the pool size to 256 entries.
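To illustrate the idea, here is a minimal sketch of a size-bucketed buffer pool that caps how many freed buffers it keeps cached, releasing further buffers immediately once the cap is reached. The class name, the `kMaxCachedBuffers` constant, and the eviction policy are illustrative assumptions and not the actual TVM PooledAllocator implementation:

```cpp
// Sketch only: a buffer pool that caches freed buffers by size, with a hard
// cap on the number of cached entries. Names and policy are hypothetical.
#include <cstdlib>
#include <unordered_map>
#include <vector>

class SimplePool {
 public:
  static constexpr size_t kMaxCachedBuffers = 256;  // cap on cached entries

  void* Alloc(size_t nbytes) {
    auto it = free_list_.find(nbytes);
    if (it != free_list_.end() && !it->second.empty()) {
      void* buf = it->second.back();   // reuse a cached buffer of the same size
      it->second.pop_back();
      --cached_count_;
      return buf;
    }
    return std::malloc(nbytes);        // stand-in for a real device allocation
  }

  void Free(void* buf, size_t nbytes) {
    if (cached_count_ >= kMaxCachedBuffers) {
      std::free(buf);                  // pool is full: release immediately
      return;
    }
    free_list_[nbytes].push_back(buf); // otherwise cache the buffer for reuse
    ++cached_count_;
  }

  ~SimplePool() {
    for (auto& kv : free_list_)
      for (void* buf : kv.second) std::free(buf);
  }

 private:
  std::unordered_map<size_t, std::vector<void*>> free_list_;
  size_t cached_count_ = 0;
};
```

With such a cap, workloads whose output shapes change every iteration can no longer grow the cache without bound, at the cost of occasionally re-allocating a buffer that would otherwise have been reused.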
I think some changes that @MasterJH5574 introduced recently would alleviate this situation if we bound the outputs to a maximum upper bound.
Frankly speaking, this PR covers two cases: the first is the large number of cached buffers, and the second is a CUDA crash after pool deallocation. The logs show that the deallocation works correctly and CUDA reports that it has enough memory to create a new buffer; the buffer is allocated, but inference fails inside the fused_relax_matmul_relax_add_relax_add_cutlass_(DLTensor*, DLTensor*, DLTensor*, DLTensor*, DLTensor*) call.