
Investigate device memory usage outside memory pool

Open • gaohao95 opened this issue 3 years ago • 1 comment

By default, the memory pool size is the total GPU memory minus 500 MB. In some runs that hit OOM, we observed that using a smaller memory pool resolves the problem. This indicates that the program uses a significant amount of device memory outside the memory pool. Tracking which allocations are made outside the memory pool, and making sure they are allocated within it instead, should fix such issues.
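As a starting point for the investigation, the arithmetic behind the sizing rule can be sketched as below. This is only an illustration, not code from this repository: the helper names `default_pool_bytes` and `outside_pool_bytes` are hypothetical, "500 MB" is assumed to mean 500 MiB, and the pool is assumed to reserve its full size up front. The free and total byte counts would come from a device query such as `cudaMemGetInfo`.

```python
MIB = 1024 * 1024  # assumption: "500 MB" is treated as 500 MiB


def default_pool_bytes(total_bytes, reserve_bytes=500 * MIB):
    """Pool size as described above: total GPU memory minus the reserve."""
    return total_bytes - reserve_bytes


def outside_pool_bytes(total_bytes, free_bytes, pool_bytes):
    """Device memory in use that the pool reservation does not account for.

    Assumes the pool has already reserved `pool_bytes` on the device.
    """
    used_bytes = total_bytes - free_bytes
    return used_bytes - pool_bytes


if __name__ == "__main__":
    # Hypothetical numbers for a 16 GiB device with 100 MiB reported free.
    total = 16 * 1024 * MIB
    free = 100 * MIB
    pool = default_pool_bytes(total)
    outside = outside_pool_bytes(total, free, pool)
    print(f"pool: {pool // MIB} MiB, outside pool: {outside // MIB} MiB")
```

If the value returned by `outside_pool_bytes` approaches the reserve during a run, allocations made outside the pool have nearly exhausted the headroom, which would be consistent with a smaller pool avoiding the OOM.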

gaohao95, Dec 14 '20 22:12