Add experimental cuda_async_pinned_memory_resource

Open • bdice opened this issue 1 month ago • 1 comment

Description

Contributes to #2054.

Adds a new cuda_async_pinned_memory_resource that provides stream-ordered allocation of pinned (page-locked) host memory. It obtains the default pinned memory pool through CUDA 13.0's cudaMemGetDefaultMemPool API, called with cudaMemAllocationTypePinned.

This parallels the cuda_async_managed_memory_resource added in #2056.

Key Features

Uses the default pinned memory pool for stream-ordered allocation and deallocation
Allocated memory is accessible from both host and device
Requires CUDA 13.0+ (the same requirement as the managed version, for API consistency)

Implementation

C++ Header: cpp/include/rmm/mr/cuda_async_pinned_memory_resource.hpp
Runtime Capability Check: Added runtime_async_pinned_alloc struct to runtime_capabilities.hpp
C++ Tests: cpp/tests/mr/cuda_async_pinned_mr_tests.cpp with tests for allocation, host accessibility, and pool equality
Python Bindings: Added to experimental module with proper type stubs
Python Tests: python/rmm/rmm/tests/test_cuda_async_pinned_memory_resource.py
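
For readers unfamiliar with the CUDA 13.0+ requirement, the sketch below shows the kind of version comparison a runtime capability check could perform. It is an illustration only: the body of runtime_async_pinned_alloc is not shown in this description, and the names supports_async_pinned and min_async_pinned_version are hypothetical, not part of RMM. It relies on CUDA's convention of encoding versions as 1000 * major + 10 * minor.

```cpp
// CUDA encodes versions as 1000 * major + 10 * minor, so CUDA 13.0 is 13000.
// Hypothetical constant, named here for illustration only.
constexpr int min_async_pinned_version = 13000;

// Hypothetical helper, not the PR's runtime_async_pinned_alloc struct.
// In real code the argument would be the value reported by
// cudaRuntimeGetVersion or cudaDriverGetVersion.
constexpr bool supports_async_pinned(int cuda_version)
{
  return cuda_version >= min_async_pinned_version;
}

// Compile-time spot checks of the comparison.
static_assert(supports_async_pinned(13000), "CUDA 13.0 qualifies");
static_assert(!supports_async_pinned(12080), "CUDA 12.8 does not qualify");
```

Taking the version as a parameter keeps the comparison testable on machines without a GPU. Whether the actual check consults the runtime version, the driver version, or both is not stated here.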

Follow-up Tasks

Determine whether to provide docs on how to set the release threshold or other pool properties
Consider adding more comprehensive benchmarks comparing against the synchronous pinned_host_memory_resource

Checklist

[x] I am familiar with the Contributing Guidelines
[x] New or existing tests cover these changes
[x] The documentation is up to date with these changes

Nov 25 '25 23:11 bdice

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Nov 25 '25 23:11 copy-pr-bot[bot]