
GPU: lazy memory allocation

Open therault opened this issue 1 year ago • 7 comments

PR #613 made all CI tests initialize the GPU whenever one is available. When running in oversubscribe mode, this can produce spurious test failures that are caused not by a software issue but by a deployment issue: multiple processes each try to allocate 90% of the GPU memory at the same time.

In general, since we don't know whether the GPU will be used, we should not preemptively allocate all of its memory. This PR makes memory allocation lazy: it is delayed until we actually try to use some GPU memory.

The drawback is that the first GPU task will also pay the cost of a large cuda_malloc / zmalloc, etc.
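
To illustrate the idea (this is a minimal sketch, not the code in this PR): the pool pointer stays NULL at init time and the allocation happens on the first request, which is also where the one-time cost lands. All names here (`lazy_gpu_t`, `lazy_gpu_get_pool`, `backend_alloc`) are hypothetical, and plain `malloc` stands in for the real device allocator so the example runs without a GPU. It is not thread-safe as written; a real runtime would need to guard the first allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device descriptor; not PaRSEC's actual structure. */
typedef struct {
    void  *pool;        /* NULL until the first GPU task needs memory */
    size_t pool_size;   /* bytes to reserve once the pool is created */
    int    alloc_calls; /* counts backend allocations, for illustration */
} lazy_gpu_t;

/* Stand-in for the real device allocator; assumption: malloc is used
 * here only so the sketch runs on a machine without a GPU. */
static void *backend_alloc(lazy_gpu_t *gpu, size_t bytes)
{
    gpu->alloc_calls++;
    return malloc(bytes);
}

/* Initialization records the desired size but allocates nothing. */
static void lazy_gpu_init(lazy_gpu_t *gpu, size_t pool_size)
{
    gpu->pool = NULL;
    gpu->pool_size = pool_size;
    gpu->alloc_calls = 0;
}

/* Returns the pool, allocating it on the first call only.
 * Returns NULL if the backend allocation fails. */
static void *lazy_gpu_get_pool(lazy_gpu_t *gpu)
{
    if (gpu->pool == NULL) {
        gpu->pool = backend_alloc(gpu, gpu->pool_size);
    }
    return gpu->pool;
}

/* Releases the pool if it was ever created; free(NULL) is a no-op. */
static void lazy_gpu_fini(lazy_gpu_t *gpu)
{
    free(gpu->pool);
    gpu->pool = NULL;
}
```

With this shape, a process that initializes the device but never runs a GPU task never calls the backend allocator, which is the property that matters for the oversubscribed CI case.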

therault · Jan 22 '24 20:01