[FEA] Ability to control the memory resource for temporary allocations
Summary
I would like to be able to control the memory resource used for temporary allocations in libcudf.
Proposal
The current behavior of libcudf is that all APIs accept an rmm::device_async_resource_ref mr parameter. This is used only for output allocations returned to the user; temporary allocations come from the current device resource.
We would like to add a way to control temporary allocations and not just output allocations. However, we don't want to add a second memory resource parameter to every API or break source compatibility for existing callers.
I propose that we create an object like cudf::memory_resources, and accept cudf::memory_resources resources in place of rmm::device_async_resource_ref mr in our APIs. The constructor below would retain the existing behavior if an rmm::device_async_resource_ref was passed in, meaning that this should only break ABI and not API (and we are okay with ABI breakage).
#include <cudf/utilities/memory_resource.hpp>
#include <rmm/resource_ref.hpp>

namespace cudf {

class memory_resources {
 public:
  // Constructor: sets output_mr; temporary_mr defaults to current device MR.
  // Intentionally not explicit, so existing callers passing a ref still compile.
  memory_resources(rmm::device_async_resource_ref output_mr_)
    : temporary_mr{cudf::get_current_device_resource_ref()},
      output_mr{output_mr_}
  {
  }

  // Constructor: sets both output_mr and temporary_mr explicitly
  memory_resources(rmm::device_async_resource_ref output_mr_,
                   rmm::device_async_resource_ref temporary_mr_)
    : temporary_mr{temporary_mr_}, output_mr{output_mr_}
  {
  }

  [[nodiscard]] rmm::device_async_resource_ref get_temporary_mr() const noexcept
  {
    return temporary_mr;
  }

  [[nodiscard]] rmm::device_async_resource_ref get_output_mr() const noexcept
  {
    return output_mr;
  }

 private:
  rmm::device_async_resource_ref temporary_mr;
  rmm::device_async_resource_ref output_mr;
};

}  // namespace cudf
Then, we would refactor libcudf to replace internal uses of cudf::get_current_device_resource_ref() with resources.get_temporary_mr().
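To make the refactor concrete, here is a rough, self-contained sketch of what an algorithm body might look like afterwards. It does not use the real RMM or libcudf types: counting_resource, resource_ref, default_resource() and sort_like_algorithm are hypothetical stand-ins so the example compiles with only the standard library, and the memory_resources class is a simplified copy of the one proposed above.

```cpp
#include <cstddef>

// Hypothetical stand-in for a memory resource: counts bytes requested from it.
struct counting_resource {
  std::size_t bytes{0};
  void allocate(std::size_t n) { bytes += n; }
};

// Hypothetical stand-in for rmm::device_async_resource_ref (a non-owning handle).
struct resource_ref {
  counting_resource* res;
  resource_ref(counting_resource& r) : res{&r} {}
  void allocate(std::size_t n) const { res->allocate(n); }
};

// Stand-in for the process-wide default that
// cudf::get_current_device_resource_ref() would return.
inline counting_resource& default_resource()
{
  static counting_resource r;
  return r;
}

// Simplified version of the proposed cudf::memory_resources.
class memory_resources {
 public:
  memory_resources(resource_ref output_mr)
    : temporary_mr_{default_resource()}, output_mr_{output_mr}
  {
  }
  memory_resources(resource_ref output_mr, resource_ref temporary_mr)
    : temporary_mr_{temporary_mr}, output_mr_{output_mr}
  {
  }
  resource_ref get_temporary_mr() const noexcept { return temporary_mr_; }
  resource_ref get_output_mr() const noexcept { return output_mr_; }

 private:
  resource_ref temporary_mr_;
  resource_ref output_mr_;
};

// Hypothetical algorithm that needs n bytes of scratch space and returns n bytes.
inline void sort_like_algorithm(std::size_t n, memory_resources resources)
{
  // Before the refactor, the scratch buffer would have come from the default resource.
  resources.get_temporary_mr().allocate(n);  // scratch buffer
  resources.get_output_mr().allocate(n);     // buffer returned to the caller
}
```

A caller that passes only a ref, as in sort_like_algorithm(16, resource_ref{out}), keeps today's behavior: output goes to out and scratch space goes to the default. A caller that passes memory_resources{out, tmp} routes the scratch allocation to tmp instead.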
We would need to validate that we are using the passed-in temporary memory resource and not using the device default MR. To do that, we could enable an environment variable in our tests like LIBCUDF_ERROR_ON_CURRENT_DEVICE_RESOURCE_REF that causes cudf::get_current_device_resource_ref() to fail when invoked (with a corresponding change in behavior for the default temporary MR to avoid calling that public API).
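A minimal sketch of how that guard could work, under some assumptions that are not settled in this issue: the variable is treated as enabled only when set to "1", the failure is reported as std::logic_error, and the unguarded internal accessor is called detail::get_current_device_resource_ref_unchecked (a made-up name). resource_ref is again a stand-in for the real ref type so the example is standard-library only.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for rmm::device_async_resource_ref.
struct resource_ref {
  int id;
};

// Reads the environment on every call so tests can toggle it;
// a real implementation might prefer to cache the result.
inline bool error_on_current_device_resource_ref()
{
  char const* value = std::getenv("LIBCUDF_ERROR_ON_CURRENT_DEVICE_RESOURCE_REF");
  return value != nullptr && std::string{value} == "1";
}

namespace detail {
// Unguarded accessor (hypothetical name). The default temporary MR would use
// this so that constructing memory_resources from a single ref still works
// while the guard is active.
inline resource_ref get_current_device_resource_ref_unchecked() { return resource_ref{0}; }
}  // namespace detail

// Public accessor: fails loudly when the guard is enabled, which flags any
// internal code path that still bypasses the passed-in temporary MR.
inline resource_ref get_current_device_resource_ref()
{
  if (error_on_current_device_resource_ref()) {
    throw std::logic_error(
      "get_current_device_resource_ref() called while "
      "LIBCUDF_ERROR_ON_CURRENT_DEVICE_RESOURCE_REF is set");
  }
  return detail::get_current_device_resource_ref_unchecked();
}
```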
Additional context
APIs that don't return device memory
Some APIs do not accept a memory resource because they do not return device memory. However, they may use temporary allocations that should be controlled in the same way, requiring the addition of an mr parameter to some APIs. That can be defaulted in the same way as other APIs, and won't require reordering because mr always follows stream in libcudf function signatures.
Temporary MR usage in Thrust APIs
We'll need to replace all uses of rmm::exec_policy(stream) with rmm::exec_policy(stream, temporary_mr). For validation, we'll also need to set RMM's default MR to something that throws an error when allocating, because we need to avoid rmm::mr::get_current_device_resource_ref() as well as cudf::get_current_device_resource_ref().
https://github.com/rapidsai/rmm/blob/c69e2b6e03a36c36fc852f319c3311673a0547bc/cpp/include/rmm/exec_policy.hpp#L48-L49
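The "default MR that throws" idea can be illustrated with a host-side analogy using std::pmr from the C++17 standard library. This is not RMM code: throwing_resource, good_algorithm and bad_algorithm are hypothetical names, and std::pmr::set_default_resource stands in for whatever mechanism the tests would use to install the guard as RMM's current device resource.

```cpp
#include <cstddef>
#include <memory_resource>
#include <stdexcept>
#include <vector>

// Resource that rejects every allocation. Installed as the default, it turns
// any accidental fallback to the default resource into an immediate error.
class throwing_resource final : public std::pmr::memory_resource {
  void* do_allocate(std::size_t, std::size_t) override
  {
    throw std::logic_error("allocation from the default resource is not allowed");
  }
  void do_deallocate(void*, std::size_t, std::size_t) override {}
  bool do_is_equal(std::pmr::memory_resource const& other) const noexcept override
  {
    return this == &other;
  }
};

// Correctly forwards the caller's resource to its scratch container,
// analogous to rmm::exec_policy(stream, temporary_mr).
inline std::size_t good_algorithm(std::pmr::memory_resource* temporary_mr)
{
  std::pmr::vector<int> scratch(128, 0, temporary_mr);
  return scratch.size();
}

// Forgets to forward the resource, analogous to rmm::exec_policy(stream):
// the container silently falls back to std::pmr::get_default_resource().
inline std::size_t bad_algorithm(std::pmr::memory_resource*)
{
  std::pmr::vector<int> scratch(128, 0);
  return scratch.size();
}
```

With a throwing_resource installed via std::pmr::set_default_resource, good_algorithm runs normally while bad_algorithm throws, which is the signal the test suite would rely on.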
Implicit conversion
We will also want to handle the case where the user passes a resource directly. Normally this would implicitly convert to a ref, but if we are relying on implicit conversion of a ref to the memory_resources class, this now involves two user-defined conversions (resource to ref, then ref to memory_resources). C++ allows at most one user-defined conversion in an implicit conversion sequence, so this will not happen automatically. To avoid API breaks, we may want to add constructors from resources (or really, anything that implicitly converts to a device_async_resource_ref).
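The conversion problem and one possible fix can be demonstrated with stand-in types. In this sketch pool_resource, resource_ref, resources_ref_only and resources_with_template are all hypothetical; the constrained constructor template is just one option for accepting "anything that implicitly converts to a ref", not a settled design.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical concrete resource owned by the user.
struct pool_resource {};

// Stand-in for rmm::device_async_resource_ref: implicitly constructible from
// an lvalue resource, but not from a temporary.
class resource_ref {
 public:
  resource_ref(pool_resource& r) : ptr_{&r} {}
  pool_resource* get() const noexcept { return ptr_; }

 private:
  pool_resource* ptr_;
};

// Variant 1: only a constructor from the ref type.
class resources_ref_only {
 public:
  resources_ref_only(resource_ref output_mr) : output_mr_{output_mr} {}
  resource_ref get_output_mr() const noexcept { return output_mr_; }

 private:
  resource_ref output_mr_;
};

// Variant 2: adds a constrained template constructor that performs the
// resource -> ref step explicitly, leaving only one implicit conversion.
class resources_with_template {
 public:
  resources_with_template(resource_ref output_mr) : output_mr_{output_mr} {}

  template <typename T,
            std::enable_if_t<std::is_convertible_v<T, resource_ref> &&
                               !std::is_same_v<std::decay_t<T>, resource_ref> &&
                               !std::is_same_v<std::decay_t<T>, resources_with_template>,
                             int> = 0>
  resources_with_template(T&& resource)
    : resources_with_template(resource_ref{std::forward<T>(resource)})
  {
  }

  resource_ref get_output_mr() const noexcept { return output_mr_; }

 private:
  resource_ref output_mr_;
};

// A ref converts in one step, so both variants accept it.
static_assert(std::is_convertible_v<resource_ref, resources_ref_only>);
static_assert(std::is_convertible_v<resource_ref, resources_with_template>);
// A resource needs two user-defined conversions with variant 1: rejected.
static_assert(!std::is_convertible_v<pool_resource&, resources_ref_only>);
// Variant 2 accepts the resource directly...
static_assert(std::is_convertible_v<pool_resource&, resources_with_template>);
// ...but still rejects temporaries, which could not be safely referenced.
static_assert(!std::is_convertible_v<pool_resource, resources_with_template>);
```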
I wonder if we want a builder API, so we don't need a constructor for every combination of resources, or default params for new members, as we add resources to the class.
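For discussion, a builder-style interface might look roughly like the following. Everything here is an assumption: the method names with_output and with_temporary are invented, resource_ref and default_resource() are stand-ins for the RMM ref type and the current device resource, and returning a modified copy (rather than mutating in place) is just one of several reasonable choices.

```cpp
// Hypothetical stand-in for rmm::device_async_resource_ref.
struct resource_ref {
  int id;
};

// Stand-in for the current device resource (id 0 marks "the default").
inline resource_ref default_resource() { return resource_ref{0}; }

class memory_resources {
 public:
  // Every member starts out as the default, so new members can be added
  // later without new constructors or extra defaulted parameters.
  memory_resources() = default;

  // Each setter returns a modified copy, so calls can be chained in any order.
  memory_resources with_output(resource_ref mr) const
  {
    memory_resources copy = *this;
    copy.output_mr_ = mr;
    return copy;
  }

  memory_resources with_temporary(resource_ref mr) const
  {
    memory_resources copy = *this;
    copy.temporary_mr_ = mr;
    return copy;
  }

  resource_ref get_temporary_mr() const noexcept { return temporary_mr_; }
  resource_ref get_output_mr() const noexcept { return output_mr_; }

 private:
  resource_ref temporary_mr_ = default_resource();
  resource_ref output_mr_    = default_resource();
};
```

A call site would then read memory_resources{}.with_output(out).with_temporary(tmp), and callers that only care about one resource set just that one.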