[Runtime][MemoryPool] Memory pool with limited cache size
The commit contains a new memory manager that limits the number of cached buffers.
The problem appeared in a scenario with 4 sequential transformer-based networks that generate outputs of different sizes on each iteration. In this case the default PooledAllocator keeps a large number of buffers; for example, there were about 2000 unallocated buffers (~1.9 GB of memory) at the moment the pool was cleared. The new allocator limits the pool size to 256 entries.
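To illustrate the idea, here is a minimal sketch of a size-bucketed buffer pool that caps how many freed buffers it keeps cached, releasing further buffers immediately once the cap is reached. The class name, the `kMaxCachedBuffers` constant, and the eviction policy are illustrative assumptions and not the actual TVM PooledAllocator implementation:

```cpp
// Sketch only: a buffer pool that caches freed buffers by size, with a hard
// cap on the number of cached entries. Names and policy are hypothetical.
#include <cstdlib>
#include <unordered_map>
#include <vector>

class SimplePool {
 public:
  static constexpr size_t kMaxCachedBuffers = 256;  // cap on cached entries

  void* Alloc(size_t nbytes) {
    auto it = free_list_.find(nbytes);
    if (it != free_list_.end() && !it->second.empty()) {
      void* buf = it->second.back();   // reuse a cached buffer of the same size
      it->second.pop_back();
      --cached_count_;
      return buf;
    }
    return std::malloc(nbytes);        // stand-in for a real device allocation
  }

  void Free(void* buf, size_t nbytes) {
    if (cached_count_ >= kMaxCachedBuffers) {
      std::free(buf);                  // pool is full: release immediately
      return;
    }
    free_list_[nbytes].push_back(buf); // otherwise cache the buffer for reuse
    ++cached_count_;
  }

  ~SimplePool() {
    for (auto& kv : free_list_)
      for (void* buf : kv.second) std::free(buf);
  }

 private:
  std::unordered_map<size_t, std::vector<void*>> free_list_;
  size_t cached_count_ = 0;
};
```

With such a cap, workloads whose output shapes change every iteration can no longer grow the cache without bound, at the cost of occasionally re-allocating a buffer that would otherwise have been reused.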
I think some changes that @MasterJH5574 introduced recently would alleviate this situation if we bound the outputs to a maximum upper bound.
Frankly speaking, this PR covers two cases: the first is the large number of cached buffers, and the second is a CUDA crash after pool deallocation. The logs show that the deallocation works correctly and CUDA reports that it has enough memory to create a new buffer; the buffer is allocated, but inference fails inside the fused_relax_matmul_relax_add_relax_add_cutlass_(DLTensor*, DLTensor*, DLTensor*, DLTensor*, DLTensor*) call.