rmm

[FEA] Add "fallback" memory resource

Status: Open. GregoryKimball opened this issue 3 months ago • 2 comments

Is your feature request related to a problem? Please describe.

We would like to test a "fallback" memory resource: each allocation is first attempted on a primary MR and, only if that allocation fails, is retried on a second MR.
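To pin down the intended semantics, here is a minimal pure-Python model of such a fallback resource. It is a sketch under stated assumptions, not rmm code: `ToyResource`, `FallbackResource`, and `OutOfMemory` are hypothetical names, the toy resources only count bytes against a fixed capacity, and exhaustion of the primary is assumed to surface as a `MemoryError` (in C++ the analogous signal would be `rmm::out_of_memory`). One detail the model makes explicit: the adaptor must remember which resource served each allocation so that deallocation is routed back to that same resource.

```python
class OutOfMemory(MemoryError):
    """Raised by the toy resource when a request exceeds its remaining capacity."""


class ToyResource:
    """Hypothetical stand-in for a memory resource with a fixed byte budget."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.used = 0
        self._count = 0

    def allocate(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise OutOfMemory(f"{self.name}: cannot allocate {nbytes} bytes")
        self.used += nbytes
        self._count += 1
        # An opaque handle standing in for a device pointer.
        return (self.name, self._count)

    def deallocate(self, ptr, nbytes):
        self.used -= nbytes


class FallbackResource:
    """Hypothetical fallback adaptor: try `primary` first, then `fallback`."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback
        # Which resource served each live allocation, so frees go back to it.
        self._owner = {}

    def allocate(self, nbytes):
        try:
            ptr = self.primary.allocate(nbytes)
            owner = self.primary
        except MemoryError:
            # Only out-of-memory triggers the fallback; other errors propagate.
            # If the fallback is also exhausted, its MemoryError reaches the caller.
            ptr = self.fallback.allocate(nbytes)
            owner = self.fallback
        self._owner[ptr] = owner
        return ptr

    def deallocate(self, ptr, nbytes):
        self._owner.pop(ptr).deallocate(ptr, nbytes)


# Usage: a small "device" budget backed by a larger "managed" budget.
device = ToyResource("async", capacity=100)
managed = ToyResource("managed", capacity=1000)
mr = FallbackResource(device, managed)

a = mr.allocate(80)  # fits in the primary
b = mr.allocate(50)  # 80 + 50 > 100, so this spills to the fallback
print(a[0], b[0])    # async managed
mr.deallocate(b, 50)
print(managed.used)  # 0
```

Catching only out-of-memory errors, rather than every exception, keeps genuine failures on the primary from being silently masked by the fallback.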

Describe the solution you'd like

In 25.10, cudf-polars uses this memory resource by default:

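```python
import rmm

# Assumed sizing for illustration: the currently free device memory, rounded
# down to a multiple of 256 bytes (pool sizes must be 256-byte aligned).
free_memory = rmm.mr.available_device_memory()[0] // 256 * 256
new_mr = rmm.mr.PrefetchResourceAdaptor(
    rmm.mr.PoolMemoryResource(
        rmm.mr.ManagedMemoryResource(),
        initial_pool_size=free_memory,
    )
)
```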

We would like to experiment with an MR like this, where FallbackResourceAdaptor is the proposed (not yet existing) adaptor:

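```python
import rmm

# First choice: stream-ordered device memory from the CUDA async allocator.
async_mr = rmm.mr.CudaAsyncMemoryResource()
# Second choice: the current cudf-polars default (prefetched, pooled managed memory).
managed_mr = rmm.mr.PrefetchResourceAdaptor(
    rmm.mr.PoolMemoryResource(
        rmm.mr.ManagedMemoryResource(),
        initial_pool_size=free_memory,
    )
)
# Proposed API: FallbackResourceAdaptor does not exist in rmm yet.
new_mr = rmm.mr.FallbackResourceAdaptor(async_mr, managed_mr)
```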

Describe alternatives you've considered

I don't know whether something like this can be composed from existing resources today. A CallbackResourceAdaptor might be able to implement it with a try/catch around the allocation on the first MR. There may also be a way to mock this up without switching the default MR back and forth.

Additional context

We would like to test this idea in cudf-polars and Velox-cuDF so that managed memory is used only when it is actually needed. We might need to set a maximum pool size on the first (async) MR so that the second (managed) MR still has enough device memory left to prefetch into.

GregoryKimball, Oct 03 '25 19:10

We have this in rapidsmpf: https://github.com/rapidsai/rapidsmpf/blob/109584501d51c3a3d86b9e80220b486aec677a72/cpp/src/rmm_resource_adaptor.cpp#L139

wence-, Oct 03 '25 20:10

Progress began on this in #1665 but that PR stalled. It could be revived.

bdice, Oct 10 '25 20:10