rmm
Add experimental cuda_async_pinned_memory_resource
Description
Contributes to #2054.
Adds a new cuda_async_pinned_memory_resource that provides stream-ordered pinned (page-locked) host memory allocation using CUDA 13.0's cudaMemGetDefaultMemPool API with cudaMemAllocationTypePinned.
This parallels the cuda_async_managed_memory_resource added in #2056.
Key Features
- Uses the default pinned memory pool for stream-ordered allocation/deallocation
- Accessible from both host and device
- Requires CUDA 13.0+ (the same minimum as cuda_async_managed_memory_resource, for API consistency)
Implementation
- C++ Header: cpp/include/rmm/mr/cuda_async_pinned_memory_resource.hpp
- Runtime Capability Check: Added runtime_async_pinned_alloc struct to runtime_capabilities.hpp
- C++ Tests: cpp/tests/mr/cuda_async_pinned_mr_tests.cpp with tests for allocation, host accessibility, and pool equality
- Python Bindings: Added to experimental module with proper type stubs
- Python Tests: python/rmm/rmm/tests/test_cuda_async_pinned_memory_resource.py
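
For reviewers unfamiliar with how a CUDA 13.0+ gate can be expressed, here is a minimal sketch of the kind of version test a runtime capability check might perform. It is not the code in runtime_capabilities.hpp: the names cuda_version, decode_cuda_version, and supports_async_pinned_alloc are hypothetical, and requiring both the runtime and the driver to be at least 13.0 is an assumption made for illustration. The only fact relied on is that CUDA encodes versions as 1000 * major + 10 * minor, which is the form returned by cudaRuntimeGetVersion and cudaDriverGetVersion.

```cpp
#include <cassert>

// CUDA encodes versions as 1000 * major + 10 * minor, so 13.0 is 13000.
constexpr int min_pinned_pool_version = 13000;

// Hypothetical helper type, used only in this sketch.
struct cuda_version {
  int major;
  int minor;
};

// Split an encoded CUDA version (e.g. 12080) into major and minor parts.
constexpr cuda_version decode_cuda_version(int encoded)
{
  return cuda_version{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical stand-in for the capability check. Assumption: both the
// runtime and the driver must be at least CUDA 13.0. In real code the two
// arguments would come from cudaRuntimeGetVersion and cudaDriverGetVersion.
constexpr bool supports_async_pinned_alloc(int runtime_version, int driver_version)
{
  return runtime_version >= min_pinned_pool_version &&
         driver_version >= min_pinned_pool_version;
}

// Small runtime self-check showing how the helpers behave.
inline void check_examples()
{
  assert(decode_cuda_version(13000).major == 13);
  assert(decode_cuda_version(12080).minor == 8);
  assert(supports_async_pinned_alloc(13000, 13000));
  assert(!supports_async_pinned_alloc(13000, 12080));
}
```

Evaluating the gate once and caching the result avoids repeated version queries; whether the actual struct does so is not stated in this description.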
Follow-up Tasks
- Determine whether to provide docs on how to set release threshold or other pool properties
- Consider adding more comprehensive benchmarks comparing against the synchronous pinned_host_memory_resource
Checklist
- [x] I am familiar with the Contributing Guidelines
- [x] New or existing tests cover these changes
- [x] The documentation is up to date with these changes