rmm [FEA] Support stream semantics in rmm::mr::pool_memory_resource with rmm::mr::pinned_memory_resource as an upstream

Is your feature request related to a problem? Please describe.

{
  // mr here is rmm::mr::pool_memory_resource<rmm::mr::pinned_memory_resource>
  rmm::device_uvector<int> tmps(1024, stream, mr);
  thrust::for_each(rmm::exec_policy_nosync(stream), tmps.begin(), tmps.end(),
                   [] __device__(auto val) { ... });
}  // tmps is destroyed here, possibly before the asynchronous thrust::for_each has finished

Here, users may expect tmps to follow stream semantics, but its memory is deallocated immediately once the variable goes out of scope, which can happen before the thrust::for_each call has actually finished.

This behavior can be surprising, and the fix requires an explicit stream synchronization, which may add unnecessary performance overhead.
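To illustrate the difference between immediate and stream-ordered deallocation, here is a minimal host-only sketch. It is a toy model, not RMM's implementation: ToyStream, ToyPool, deallocate_now, deallocate_async, and reuse_hazard are all hypothetical names introduced for this example, and the "stream" is just a FIFO of deferred tasks that runs when synchronize() is called.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: a FIFO of deferred tasks.
struct ToyStream {
  std::deque<std::function<void()>> tasks;

  void enqueue(std::function<void()> task) { tasks.push_back(std::move(task)); }

  // Runs everything queued so far, in order (plays the role of a stream sync).
  void synchronize() {
    while (!tasks.empty()) {
      auto task = std::move(tasks.front());
      tasks.pop_front();
      task();
    }
  }
};

// Hypothetical pool of fixed-size blocks, identified by index.
struct ToyPool {
  std::vector<int> free_blocks;

  explicit ToyPool(int n) {
    for (int i = n - 1; i >= 0; --i) { free_blocks.push_back(i); }
  }

  int allocate() {
    assert(!free_blocks.empty());
    int block = free_blocks.back();
    free_blocks.pop_back();
    return block;
  }

  // Immediate release: the block can be handed out again right away.
  void deallocate_now(int block) { free_blocks.push_back(block); }

  // Stream-ordered release: the block becomes reusable only after all work
  // queued earlier on the stream has run.
  void deallocate_async(int block, ToyStream& stream) {
    stream.enqueue([this, block] { free_blocks.push_back(block); });
  }
};

// Returns true if a second allocation received the same block while the
// queued "kernel" that uses it had not yet run.
bool reuse_hazard(bool stream_ordered) {
  ToyStream stream;
  ToyPool pool(2);

  int tmp = pool.allocate();
  bool kernel_done = false;
  stream.enqueue([&kernel_done] { kernel_done = true; });  // "kernel" using tmp

  if (stream_ordered) {
    pool.deallocate_async(tmp, stream);
  } else {
    pool.deallocate_now(tmp);
  }

  int other = pool.allocate();  // allocation made before the kernel has run
  bool hazard = (other == tmp) && !kernel_done;

  stream.synchronize();
  return hazard;
}
```

In this model, reuse_hazard(false) reports the hazard because the block is handed out again while the queued work is still pending, whereas reuse_hazard(true) does not, since the release is ordered after that work on the same stream.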

Describe the solution you'd like

rmm::mr::pool_memory_resource should follow stream semantics even when the upstream memory allocator is the (host) pinned_memory_resource.

Describe alternatives you've considered

Explicitly calling cudaStreamSynchronize() on the stream before tmps goes out of scope.

Sep 25 '25 18:09 seunghwak

rmm will be moving to use cudaMallocFromPoolAsync for different memory types, which would resolve this problem. I'm not sure if we have support for everything that we need to do this in CUDA 12, but we certainly do in CUDA 13.

Sep 26 '25 17:09 vyasr

https://github.com/rapidsai/rmm/issues/2054#issuecomment-3339881396 describes a possible solution; it seems we might already have what we need to do this in CUDA 12.

UPDATE: Not a viable solution for CUDA 12 in general.

Sep 26 '25 18:09 vyasr
