[BUG] Unable to allocate the max pool size from pool MR when an initial size is set

Open nirandaperera opened this issue 5 months ago • 1 comments

Describe the bug

When an initial pool size is set on the pool memory resource, we cannot allocate max_pool_size bytes of memory. I believe it's a consequence of the pool's expansion strategy.

Steps/Code to reproduce bug

TEST(PoolTest, Foo)
{
  // Initial pool size 256 bytes, maximum pool size 1024 bytes.
  EXPECT_NO_THROW([]() {
    pool_mr mr(rmm::mr::get_current_device_resource_ref(), 256, 1024);
    mr.allocate(1024);
  }());  // fails

  // Initial pool size 0 bytes, maximum pool size 1024 bytes.
  EXPECT_NO_THROW([]() {
    pool_mr mr(rmm::mr::get_current_device_resource_ref(), 0, 1024);
    mr.allocate(1024);
  }());  // passes
}

Expected behavior

I would expect both test cases to pass: there are no other allocations, so we should be able to allocate the max pool size even when there was an initial allocation.

Environment details (please complete the following information):

  • current rmm main

nirandaperera, Jun 11 '25 22:06

This is caused by fragmentation because growing the pool creates a new upstream allocation. It would be nice for the pool to use the CUDA virtual memory APIs to make this more powerful, but for now I think you should always set your initial size larger than the largest size you expect to allocate.

Or use async_memory_resource.
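To make the fragmentation argument concrete, here is a toy model of a pool whose growth always produces a separate upstream block. It is only a sketch: `ToyPool` and `can_allocate` are hypothetical names, nothing here is RMM code, and the growth policy (grow by exactly the requested size, capped by the remaining budget) is a simplifying assumption. It does reproduce the pass/fail pattern from the test in the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical, NOT RMM code). Each growth step adds a separate
// upstream block, and blocks from different growth steps are never merged.
struct ToyPool {
  std::size_t max_size;                  // cap on the total bytes taken from upstream
  std::size_t total = 0;                 // bytes taken from upstream so far
  std::vector<std::size_t> free_blocks;  // free bytes left in each upstream block

  ToyPool(std::size_t initial, std::size_t maximum) : max_size(maximum)
  {
    if (initial > 0) { grow(initial); }
  }

  // Take one more block of `bytes` from the (imaginary) upstream resource.
  void grow(std::size_t bytes)
  {
    free_blocks.push_back(bytes);
    total += bytes;
  }

  // A request succeeds only if it fits entirely inside a single block.
  bool allocate(std::size_t bytes)
  {
    for (auto& block : free_blocks) {
      if (block >= bytes) {
        block -= bytes;
        return true;
      }
    }
    // No existing block is big enough. Assumed policy: grow by exactly the
    // requested size, but never beyond what the maximum still allows.
    std::size_t const remaining = max_size - total;
    if (bytes > remaining) { return false; }
    grow(bytes);
    free_blocks.back() -= bytes;
    return true;
  }
};

// Build a fresh pool and try a single allocation, as in the issue's test.
bool can_allocate(std::size_t initial, std::size_t maximum, std::size_t request)
{
  ToyPool pool(initial, maximum);
  return pool.allocate(request);
}

// The two scenarios from the issue, plus the suggested workaround.
void check_issue_scenarios()
{
  assert(!can_allocate(256, 1024, 1024));  // initial 256: only 768 bytes of budget left
  assert(can_allocate(0, 1024, 1024));     // initial 0: the first growth can be 1024
  assert(can_allocate(1024, 1024, 1024));  // initial size >= largest request
}
```

In this model the 256-byte initial block uses up part of the 1024-byte budget, so no later block can be larger than 768 bytes, and a 1024-byte request cannot be served from two separate blocks. Setting the initial size to at least the largest expected allocation avoids this, which matches the advice above.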

harrism, Jun 11 '25 23:06

Thanks for the context @harrism. I am not sure if we have a good action to take here. I am inclined to close this as expected behavior, or turn it into a docs issue that explains the limitations of the pool MR (see also #1694).

bdice, Jun 24 '25 21:06