axom
Array: improve support for unified memory
Summary
- Adds GPU-side `ArrayOps` operations for unified and pinned memory.

As with the operations for device memory, unified and pinned memory operations try to use Umpire operations when the type is trivially-copyable, and instantiate a GPU kernel for fill operations when the type is trivially-constructible. This potentially avoids unwanted memory motion between the CPU and GPU.
In the case where the type isn't trivially-copyable or trivially-constructible, the corresponding operation needs to be performed on the device in order to correctly call the required constructor. However, since unified and pinned memory are accessible on the device, we can just do that operation in-place instead of allocating a temporary buffer.
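The dispatch described above can be illustrated with a small host-only sketch. This is not Axom's `ArrayOps` code: `copy_into_uninitialized`, `copy_impl`, and `destroy_range` are hypothetical names, `std::memcpy` stands in for the Umpire copy, and a plain loop stands in for the GPU kernel. It only shows the idea of choosing a bulk copy for trivially-copyable types and constructing elements in place in the destination otherwise, with no temporary buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical helpers for illustration only; not Axom's ArrayOps API.

// Trivially-copyable path: a single bulk copy. std::memcpy is a stand-in
// for a library-level copy (the PR describes using Umpire operations here).
template <typename T>
bool copy_impl(T* dst, const T* src, std::size_t n, std::true_type)
{
  if(n > 0)
  {
    std::memcpy(dst, src, n * sizeof(T));
  }
  return true;  // bulk path taken
}

// General path: copy-construct each element directly in the destination.
// The loop is a stand-in for a GPU kernel; because the destination is
// assumed to be directly accessible, no temporary buffer is needed.
template <typename T>
bool copy_impl(T* dst, const T* src, std::size_t n, std::false_type)
{
  for(std::size_t i = 0; i < n; ++i)
  {
    new(static_cast<void*>(dst + i)) T(src[i]);
  }
  return false;  // element-wise path taken
}

// Selects a path at compile time from the type's traits.
// dst must point to uninitialized storage for n objects of type T.
template <typename T>
bool copy_into_uninitialized(T* dst, const T* src, std::size_t n)
{
  return copy_impl(dst, src, n, std::is_trivially_copyable<T> {});
}

// Runs destructors for objects created by the element-wise path.
template <typename T>
void destroy_range(T* p, std::size_t n)
{
  for(std::size_t i = 0; i < n; ++i)
  {
    p[i].~T();
  }
}
```

In this sketch, calling `copy_into_uninitialized` on an `int` buffer takes the bulk path, while calling it on raw storage for `std::string` constructs each string in place. The caller is responsible for destroying those objects (for example with `destroy_range`) before releasing the storage.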
In the summary, can you briefly say whether this change lets us do something we couldn't do before? Or does it only streamline or simplify something we can already do?
This change mainly concerns performance for unified memory on NVIDIA GPUs: we generally want to avoid touching unified memory on the host when possible, so that a subsequent device kernel does not incur expensive page faults.