libcudacxx icon indicating copy to clipboard operation
libcudacxx copied to clipboard

Implement `any_resource_ref`

Open miscco opened this issue 1 year ago • 2 comments

This implements the NVIDIA internal proposal for a any_resource_ref abstraction that facilitates the handling of memory allocations on host and device side.

To facilitate adoption the proposal is backported to C++14 / C++17 through a concept emulation designed by @ericniebler. Unfortunately, C++11 is too far a stretch due to missing support for constexpr variables.

Currently only plain any_resource_ref resource is implemented, as async_resource_ref is more or less a straightforward extension.

Things still to do:

  • [x] Implement async_resource_ref
  • [x] Implement equality_comparable concept
  • [ ] Write some documentation

miscco avatar Aug 30 '22 12:08 miscco

fyi @miscco we'll likely want to steal some things from https://github.com/NVIDIA/libcudacxx/pull/158, like stream_view (though probably rename to stream_ref) and maybe some of the exception stuff. It also has quite a few tests that could be helpful for showing what kind of tests we want here.

jrhemstad avatar Aug 30 '22 22:08 jrhemstad

NOTE to reviewers:

I have pushed the concept emulation into its own branch and targeted this PR to that branch. Consequentlly, we can review only the memory_resource part of the PR. Before merging we would need to change the target branch back to main

miscco avatar Sep 08 '22 09:09 miscco

@griwes I pushed a small change to the interface of {async_}resource_ref after your review.

We now also accept allocate(size) and allocate_async(size, stream_ref) and default to alignof(max_align_t) for the alignment in that case

miscco avatar Sep 30 '22 08:09 miscco

default to alignof(max_align_t) for the alignment in that case

I know we have discussed this in the past, but the default alignment warrants one last look.

cudaMalloc's default guaranteed alignment is 256B. Whereas alignof(max_align_t) == 16.

alignof(max_align_t) would be more consistent with std::pmr::memory_resource whereas 256B would be more consistent with CUDA.

People often rely on the over alignment of cudaMalloc allocations for a variety of reasons, mostly performance.

Granted, anyone can always explicitly request a 256B alignment, but people may find it annoying to have to be explicit about it every time when using cudaMalloc/cudaMallocManaged/etc give it to you implicitly.

jrhemstad avatar Sep 30 '22 14:09 jrhemstad

Defaulting to 256 would be disastrous for both CPU side resources (the malloc one would overallocate by 255 bytes every time!) and for pool resources for the GPU (because now you can't subdivide a single block of GPU memory into chunks smaller than 256). 16 is the only sane default if you ask me.

Whatever default we choose, it's going to be annoyingly required to spell something else out for some use cases. What we could do to reduce this a tiny little bit, at least for resource_ref, is put it in a template parameter of the type and figure out aliases with/without it defaulted, but IDK if that's really worth it.

griwes avatar Sep 30 '22 16:09 griwes

yeah every default we choose will be the wrong one.

That said, from my perspective there are 2 pretty fine solutions for users:

  1. Provide the desired alignment.
  2. Ignore the passed alignment in their user provided allocation function. If they want 256 anyhow, they can do so just fine

miscco avatar Oct 04 '22 07:10 miscco

@wmaxey Is this the right way to test host only functionality or do I need to put them into the heterogeneous subfolder?

miscco avatar Oct 04 '22 07:10 miscco