Array: improve support for unified memory

Open publixsubfan opened this issue 1 year ago • 2 comments

Summary

Adds GPU-side ArrayOps operations for unified and pinned memory.

As with the operations for device memory, unified and pinned memory operations try to use Umpire operations when the type is trivially copyable, and instantiate a GPU kernel for fill operations when the type is trivially constructible. This can avoid unwanted data movement between the CPU and GPU.

In the case where the type isn't trivially-copyable or trivially-constructible, the corresponding operation needs to be performed on the device in order to correctly call the required constructor. However, since unified and pinned memory are accessible on the device, we can just do that operation in-place instead of allocating a temporary buffer.
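As a rough host-only illustration of the in-place approach, the sketch below constructs each element directly at its final address with placement new. The names `Counted` and `fill_in_place` are hypothetical, and the plain loop stands in for a GPU kernel in which each thread would construct one element. It is not the code from this change.

```cpp
#include <cstddef>
#include <new>

// Hypothetical element type with a non-trivial copy constructor.
struct Counted
{
  static int copies;  // number of copy-constructor calls so far
  int value;
  explicit Counted(int v) : value(v) { }
  Counted(const Counted& other) : value(other.value) { ++copies; }
};
int Counted::copies = 0;

// Hypothetical helper: construct n copies of value directly in storage.
// storage must be uninitialized and suitably aligned for T.
template <typename T>
void fill_in_place(T* storage, std::size_t n, const T& value)
{
  for(std::size_t i = 0; i < n; ++i)
  {
    // Placement new runs the copy constructor at the final address,
    // so no staging buffer or second copy is needed.
    ::new(static_cast<void*>(storage + i)) T(value);
  }
}
```

Because the constructor runs exactly once per element at the destination, the result never has to be moved from a temporary buffer afterwards.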

Feb 26 '24 22:02 publixsubfan

In the summary, can you briefly say if this change lets us do something we couldn't do before? Or maybe whether it only streamlines, simplifies, whatever..., something we can already do?

Mar 26 '24 21:03 gunney1

> In the summary, can you briefly say if this change lets us do something we couldn't do before? Or maybe whether it only streamlines, simplifies, whatever..., something we can already do?

This change mainly concerns performance for unified memory on NVIDIA -- we generally want to avoid touching unified memory on the host when possible, to avoid expensive page faults in a subsequent device kernel.

Apr 12 '24 05:04 publixsubfan
