AMDGPU.jl
Use refcounting for memory management
Our current approach, escaping kernel inputs for the duration of kernel execution and having finalizers free HSA memory allocations directly, is problematic given the potential benefits of https://github.com/JuliaLang/julia/pull/44056.
We could instead emulate CUDA's behavior and refcount HSA allocations, both in the finalizer and for the duration of each kernel execution. This would make HSA object finalizers very fast (possibly just a single atomic add) and would remove the need to escape objects to keep their allocations alive. It would also let us confine memory allocation failures to a limited set of tasks, which would allow better error handling globally.
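To illustrate the refcounting idea, here is a minimal, language-agnostic sketch in Python. It is not AMDGPU.jl or CUDA.jl code: `RefCountedBuffer`, `launch_kernel`, `finalizer`, and the `free_fn` callback (standing in for the call that releases HSA memory) are all hypothetical names. Python has no atomic integer, so a lock stands in for the atomic counter. The point is that a finalizer only decrements the count, and a kernel launch holds its own reference, so memory cannot be freed while a kernel is still using it.

```python
import threading


class RefCountedBuffer:
    """Hypothetical stand-in for an HSA allocation guarded by a refcount."""

    def __init__(self, size, free_fn):
        self.size = size
        self._free_fn = free_fn          # hypothetical: releases the device memory
        self._count = 1                  # the owning array holds one reference
        self._lock = threading.Lock()    # stands in for an atomic counter
        self.freed = False

    def retain(self):
        with self._lock:
            if self._count == 0:
                raise RuntimeError("buffer already freed")
            self._count += 1

    def release(self):
        with self._lock:
            self._count -= 1
            last = self._count == 0
        if last:                         # only the last reference frees memory
            self.freed = True
            self._free_fn(self)


def launch_kernel(buf, body):
    """Hold an extra reference for the duration of the (simulated) kernel."""
    buf.retain()
    try:
        body(buf)
    finally:
        buf.release()


def finalizer(buf):
    """GC finalizer: just a decrement, no direct free."""
    buf.release()


released = []
buf = RefCountedBuffer(1024, released.append)


def kernel(b):
    finalizer(b)            # the GC finalizes the array while the kernel runs
    assert released == []   # memory is still held by the kernel launch


launch_kernel(buf, kernel)
print(released == [buf])    # True: freed only once the kernel finished
```

Because the finalizer never frees memory itself, there is no need to escape the array just to keep its allocation alive during the launch.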
Also, as requested by @luraess, we should allow users to call unsafe_free manually when they know an allocation is dead, and handle such calls gracefully.
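One way to make a manual free graceful on top of refcounting is sketched below in Python. This is an assumption about the desired semantics, not the actual AMDGPU.jl or GPUArrays behavior: `Buffer`, `unsafe_free`, `finalize`, and `free_fn` are hypothetical names. Both the manual free and the GC finalizer drop the owner's reference at most once, so repeated calls are no-ops, and any kernel still holding a reference keeps the memory alive until it finishes.

```python
import threading


class Buffer:
    """Hypothetical refcounted device allocation with an idempotent manual free."""

    def __init__(self, free_fn):
        self._free_fn = free_fn          # hypothetical: releases the device memory
        self._lock = threading.Lock()
        self._count = 1                  # one reference owned by the array object
        self._owner_released = False     # has the owner's reference been dropped?
        self.freed = False

    def retain(self):
        with self._lock:
            if self._count == 0:
                raise RuntimeError("use after free")
            self._count += 1

    def release(self):
        with self._lock:
            self._count -= 1
            last = self._count == 0
        if last:
            self.freed = True
            self._free_fn(self)

    def _drop_owner(self):
        # Drop the owner's reference at most once, whoever gets here first.
        with self._lock:
            if self._owner_released:
                return
            self._owner_released = True
        self.release()

    def unsafe_free(self):
        """User asserts the allocation is dead; repeated calls are no-ops."""
        self._drop_owner()

    def finalize(self):
        """GC finalizer; does nothing if unsafe_free already ran."""
        self._drop_owner()


frees = []
buf = Buffer(frees.append)
buf.retain()          # a kernel is still using the buffer
buf.unsafe_free()     # user frees early
buf.unsafe_free()     # a second manual free is harmless
print(buf.freed)      # False: the kernel still holds a reference
buf.release()         # kernel finishes, memory is released here
buf.finalize()        # later GC finalizer: no-op
print(len(frees))     # 1
```

Routing both paths through one "drop the owner reference once" step is what keeps a manual free from turning into a double free when the finalizer eventually runs.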
We now have stream-ordered allocations, and on top of them we use the refcounting mechanism from GPUArrays, as well as unsafe_free!.