
Add experimental cuda_async_pinned_memory_resource

Open bdice opened this issue 1 month ago • 1 comment

Description

Contributes to #2054.

Adds a new cuda_async_pinned_memory_resource that provides stream-ordered pinned (page-locked) host memory allocation using CUDA 13.0's cudaMemGetDefaultMemPool API with cudaMemAllocationTypePinned.

This parallels the cuda_async_managed_memory_resource added in #2056.

Key Features

  • Uses the default pinned memory pool for stream-ordered allocation/deallocation
  • Accessible from both host and device
  • Requires CUDA 13.0+ (the same minimum as the managed resource, for API consistency)

Implementation

  • C++ Header: cpp/include/rmm/mr/cuda_async_pinned_memory_resource.hpp
  • Runtime Capability Check: Added runtime_async_pinned_alloc struct to runtime_capabilities.hpp
  • C++ Tests: cpp/tests/mr/cuda_async_pinned_mr_tests.cpp with tests for allocation, host accessibility, and pool equality
  • Python Bindings: Added to experimental module with proper type stubs
  • Python Tests: python/rmm/rmm/tests/test_cuda_async_pinned_memory_resource.py
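
For reviewers, the CUDA 13.0+ requirement behind the runtime capability check can be sketched as a plain version comparison. This is only an illustration under stated assumptions, not the code in runtime_capabilities.hpp: the names min_async_pinned_cuda_version and versions_support_async_pinned are hypothetical, and requiring both the runtime and the driver to be at least 13.0 is an assumption. The one fact relied on is that CUDA reports versions as 1000 * major + 10 * minor, so 13.0 is 13000.

```cpp
// Hypothetical sketch, not the RMM implementation.
// CUDA encodes versions as 1000 * major + 10 * minor, so CUDA 13.0 is 13000.
constexpr int min_async_pinned_cuda_version = 13000;

// Returns true when both reported versions meet the minimum.
// The arguments stand in for the values that cudaRuntimeGetVersion and
// cudaDriverGetVersion would report; checking both is an assumption.
constexpr bool versions_support_async_pinned(int runtime_version, int driver_version)
{
  return runtime_version >= min_async_pinned_cuda_version &&
         driver_version >= min_async_pinned_cuda_version;
}

// Compile-time usage examples.
static_assert(versions_support_async_pinned(13000, 13000), "CUDA 13.0 qualifies");
static_assert(!versions_support_async_pinned(13000, 12080), "a 12.8 driver does not qualify");
```

Taking the versions as arguments keeps the comparison testable on machines without a GPU; a real check would query them from the CUDA runtime at startup.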

Follow-up Tasks

  • Determine whether to document how to set the release threshold or other pool properties
  • Consider adding more comprehensive benchmarks comparing against synchronous pinned_host_memory_resource

Checklist

  • [x] I am familiar with the Contributing Guidelines
  • [x] New or existing tests cover these changes
  • [x] The documentation is up to date with these changes

bdice, Nov 25 '25 23:11

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

copy-pr-bot[bot], Nov 25 '25 23:11