[FEA] Support specifying a host mr for `rmm::device_uvector`
Is your feature request related to a problem? Please describe.
Currently, calling rmm::device_uvector::element always copies to pageable memory before producing a host scalar value. Using pageable memory can introduce additional performance overhead that users may wish to avoid. This overhead could be avoided if the data were copied into pinned memory instead of pageable memory during the CUDA memcpy operation.
Describe the solution you'd like
device_uvector should allow specifying a host mr on construction in addition to a device mr. The host mr would be used to allocate memory when D2H copies are made by the internals of the device_uvector code.
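To make the request concrete, here is a minimal sketch of the shape I have in mind. It uses only the standard library so it can be compiled anywhere: `host_resource`, `counting_resource`, and `uvector_sketch` are hypothetical stand-ins, not RMM types, and the `std::memcpy` calls stand in for the D2H copy. A real pinned resource would allocate with something like `cudaMallocHost` instead of `malloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical host memory resource interface (NOT an RMM type).
struct host_resource {
  virtual ~host_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Stand-in for a pinned host resource. It uses malloc here and counts
// allocations so that use of the staging buffer is observable.
struct counting_resource final : host_resource {
  int allocations = 0;
  void* allocate(std::size_t bytes) override {
    ++allocations;
    return std::malloc(bytes);
  }
  void deallocate(void* ptr, std::size_t) override { std::free(ptr); }
};

// Hypothetical container that takes a host resource on construction and
// uses it for the staging buffer whenever an element is read back.
template <typename T>
class uvector_sketch {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");

 public:
  uvector_sketch(std::size_t size, host_resource& host_mr)
    : data_(size), host_mr_(&host_mr) {}

  void set_element(std::size_t i, T const& value) {
    assert(i < data_.size());
    data_[i] = value;
  }

  T element(std::size_t i) const {
    assert(i < data_.size());
    // Staging memory comes from the user-supplied host resource.
    void* staging = host_mr_->allocate(sizeof(T));
    assert(staging != nullptr);
    std::memcpy(staging, &data_[i], sizeof(T));  // stands in for the D2H copy
    T result;
    std::memcpy(&result, staging, sizeof(T));
    host_mr_->deallocate(staging, sizeof(T));
    return result;
  }

 private:
  std::vector<T> data_;     // stands in for device memory
  host_resource* host_mr_;  // non-owning; must outlive the container
};
```

With this shape, a caller who wants pinned staging memory passes a pinned resource at construction, and callers who do not care could keep a pageable default.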
Is there an issue for the equivalent feature for rmm::device_scalar?
Thanks! I agree this is the optimal solution, since all projects that depend on RMM will benefit collectively. Fixing the issue one by one would be overly tedious for those projects.
For reference, I wrote this up in detail here: https://github.com/rapidsai/cudf/issues/18967
> Is there an issue for the equivalent feature for rmm::device_scalar?
Nope, please feel free to create one.
I’m also happy to take on this issue together with #1959. We’re actively discussing design aspects of device_scalar there.
But a function like mr::get_current_host_resource_ref() seems necessary to me.
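Roughly what I mean, as a standard-library-only sketch: everything in the `sketch` namespace below is hypothetical and only illustrates a process-wide default that containers could fall back to when no host resource is passed explicitly. It returns a plain reference rather than a resource ref type, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch {

// Hypothetical host memory resource interface (NOT an RMM type).
struct host_resource {
  virtual ~host_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Pageable default backed by malloc/free.
struct malloc_resource final : host_resource {
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) override { std::free(ptr); }
};

// Storage for the current default; initialised on first use.
inline host_resource*& current_slot() {
  static malloc_resource default_mr;
  static host_resource* current = &default_mr;
  return current;
}

// Returns the resource that containers would use by default.
inline host_resource& get_current_host_resource() { return *current_slot(); }

// Installs a new default and returns the previous one so it can be restored.
inline host_resource* set_current_host_resource(host_resource* mr) {
  assert(mr != nullptr);
  host_resource* previous = current_slot();
  current_slot() = mr;
  return previous;
}

}  // namespace sketch
```

An application could then install a pinned resource once at startup, and every container constructed without an explicit host resource would pick it up.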