AMDGPU.jl

Use refcounting for memory management

Open jpsamaroo opened this issue 3 years ago • 1 comments

The current approach of escaping kernel inputs during kernel execution, and having finalizers directly free HSA memory allocations, is problematic when considering the potential benefits of https://github.com/JuliaLang/julia/pull/44056.

We could instead emulate the behavior of CUDA, and do refcounting of HSA allocations in the finalizer and for the duration of kernel executions. This would make HSA object finalizers very fast (possibly just being a single atomic add), and would stop us from escaping objects to protect allocations. It would also let us localize memory allocation failures to a limited set of tasks, which can let us provide better error handling behavior globally.
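As a rough illustration of the idea, here is a minimal sketch of a refcounted allocation in Julia. It is not AMDGPU.jl's or CUDA.jl's actual implementation: `RefCountedBuffer`, `alloc_buffer`, `retain!`, `release!`, and `with_kernel` are hypothetical names, and `Libc.malloc`/`Libc.free` stand in for HSA allocation and deallocation.

```julia
# Hypothetical sketch: host memory stands in for an HSA allocation.
mutable struct RefCountedBuffer
    ptr::Ptr{Cvoid}                # stand-in for the device pointer
    count::Threads.Atomic{Int}     # number of live owners
end

# A fresh allocation starts with one owner (the wrapping array object).
alloc_buffer(nbytes::Integer) =
    RefCountedBuffer(Libc.malloc(nbytes), Threads.Atomic{Int}(1))

refcount(buf::RefCountedBuffer) = buf.count[]

# Taking a reference is a single atomic add.
function retain!(buf::RefCountedBuffer)
    Threads.atomic_add!(buf.count, 1)
    return buf
end

# Dropping a reference is a single atomic subtract; whoever drops the
# last reference frees the memory. Returns true if the memory was freed.
function release!(buf::RefCountedBuffer)
    # atomic_sub! returns the value held *before* the subtraction.
    if Threads.atomic_sub!(buf.count, 1) == 1
        Libc.free(buf.ptr)
        buf.ptr = C_NULL
        return true
    end
    return false
end

# A (synchronous) stand-in for a kernel launch: the buffer is retained for
# the duration of the "kernel" instead of being escaped to keep it alive.
function with_kernel(f, buf::RefCountedBuffer)
    retain!(buf)
    try
        return f(buf.ptr)
    finally
        release!(buf)
    end
end
```

In this sketch, an array's finalizer would only need to call `release!`, so the allocation is freed by whichever of the finalizer or the last in-flight kernel finishes last. A real implementation would release the kernel's reference on asynchronous completion rather than in a `finally` block.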

jpsamaroo avatar Mar 09 '22 14:03 jpsamaroo

Also, as requested by @luraess, we should allow unsafe_free to be called manually when the user knows that the allocation is dead, and handle this gracefully.
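One way to handle a manual free gracefully is to make the free operation idempotent, so that a later finalizer call becomes a no-op. The sketch below assumes this approach; `ManagedBuffer` and `free_once!` are hypothetical names, not the package's API, and `Libc.malloc`/`Libc.free` again stand in for HSA calls.

```julia
# Hypothetical sketch: a buffer that may be freed manually or by its finalizer.
mutable struct ManagedBuffer
    ptr::Ptr{Cvoid}                 # stand-in for the device pointer
    freed::Threads.Atomic{Bool}     # set once the memory has been released
    function ManagedBuffer(nbytes::Integer)
        buf = new(Libc.malloc(nbytes), Threads.Atomic{Bool}(false))
        # The finalizer uses the same idempotent path as a manual free.
        finalizer(free_once!, buf)
        return buf
    end
end

# Frees the memory at most once. Returns true only for the call that
# actually released it; later calls (manual or finalizer) do nothing.
function free_once!(buf::ManagedBuffer)
    # atomic_xchg! returns the previous value, so only the first caller sees false.
    if !Threads.atomic_xchg!(buf.freed, true)
        Libc.free(buf.ptr)
        buf.ptr = C_NULL
        return true
    end
    return false
end
```

With this shape, a user who knows the allocation is dead can call `free_once!` early, and the finalizer that runs later finds the flag already set and returns without touching the pointer.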

jpsamaroo avatar Mar 14 '22 14:03 jpsamaroo

We now have stream-ordered allocations and, on top of them, use the refcounting mechanism from GPUArrays. unsafe_free! is supported as well.

pxl-th avatar Sep 09 '23 09:09 pxl-th