rmm [FEA] RMM should not pad allocations

trafficstars

Is your feature request related to a problem? Please describe. Currently, RMM pads every allocation that goes through the rmm::device_memory_resource interface to a multiple of 8 bytes in size.

https://github.com/rapidsai/rmm/blob/8527317aa63efe37c9815b1419c893e666cbea56/include/rmm/mr/device/device_memory_resource.hpp#L102-L105

This was originally added in #165, which doesn't provide explanation of why. I believe it was to allow accessing structures via aliases in cuIO in a way that would otherwise be UB.

Describe the solution you'd like We should not pad allocations unnecessarily. RMM allocation should not have surprising behavior like this.

Sep 08 '21 00:09 harrism

On a related note, I just noticed that Arrow requires 8B padded allocations for all allocations and recommends 64B.

Of course, the base RMM interface shouldn't be enforcing this and it should be cuDF's responsibility to enforce this. A clean way to do it would be an adaptor that just rounds up allocation requests to a specified amount (e.g., 64B).

Oct 06 '21 17:10 jrhemstad

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Nov 18 '21 18:11 github-actions[bot]

We discussed this in a meeting today. The plan is to go ahead and remove rmm's padding altogether for now. The consensus is that 8-byte padding is not actually required by the Arrow spec anywhere except in the case of Arrow IPC, and in that situation the requirement applies to the entire IPC message rather than individual buffers. This means that cudf does not require 8-byte padded allocations. The consensus is that none of rmm's other consumers require padding either. If such a need arises, the plan is to implement a simple padding adapter in rmm that can be used by consumers to wrap any memory resources they need to produce padded allocations, but such an implementation is not currently planned since we have not found a need for it.

Some more detailed context is below since there has been some confusion in previous discussions about exactly what changes were being proposed here w.r.t padding and alignment and the implications for Arrow-compliant allocations.

Detailed Background

There are two distinct concepts that should be considered, alignment and padding. Alignment refers to the starting point of a memory allocation. For example, all (4-byte) ints must start on a memory address divisible by 4. Padding refers to extra bytes added to an allocation, usually to ensure that it ends at a certain alignment as required by the allocation of the current object or to allow the use of vectorized instructions that expect to be able to safely read contiguous chunks of allocated memory. For example, a struct struct A { char c; int x; } will generally contain three padding bytes after c in order to ensure that the int x starts on a 4-byte aligned address.

Alignment

CUDA memory allocations are guaranteed to be 256-byte aligned:

Any address of a variable residing in global memory or returned by one of the memory allocation routines from the driver or runtime API is always aligned to at least 256 bytes.

Any rmm allocation of device memory also makes the same promise in order to match CUDA behavior. There is no plan to change this behavior. Changing this is also not part of the above discussion on this issue.

Padding

rmm devs have previously operated under the understanding that the Arrow spec (which is crucial to one of rmm's primary consumers, cudf) requires that memory allocations be padded to a length that is a multiple of 8 bytes. As a result, padding has been baked into many of rmm's memory allocators using the align_up function (despite the name, this function is used to compute paddings). However, a careful reading of the Arrow documentation indicates that the padding is more of a suggestion for performance than a requirement for correctness. The primary stated benefit is to take advantage of SIMD instructions on CPUs. Since we are not concerned about this, we do not think retaining padding is important.

IPC

One additional consideration is whether padded allocations are necessary for Arrow IPC. However, reading this part of the Arrow spec makes clear that the only requirement is on 8-byte padding of an IPC message as a whole, not of individual allocations like columns. Therefore, we again do not need to be worried about individual allocations being 8-byte padded.

Apr 20 '23 22:04 vyasr

Thanks for the excellent summary, @vyasr ! One missing detail is the real reason 8-byte padding was added to all cuDF allocations in the early days. As I mentioned in the issue description,

I believe it was to allow accessing structures via aliases in cuIO in a way that would otherwise be UB.

I am hopeful that by now such risky code has been removed from cuIO. Otherwise removing the padding will increase the risk of UB effects.

Apr 21 '23 02:04 harrism

rmm rmm copied to clipboard

[FEA] RMM should not pad allocations

Detailed Background

Alignment

Padding

IPC

rmm
rmm copied to clipboard