cute_alloc.h alignment
The stack and frame allocators return unaligned memory. Even though the header says it doesn't have any special alignment support, it should still have some basic alignment support. Currently, using the header as-is can crash on certain platforms due to unaligned memory. In C++, accessing an object through a misaligned pointer is undefined behavior. Writing wrappers around the stack allocator to handle basic alignment breaks freeing.
A possible fix would be to align each allocation to at least alignof(std::max_align_t), mirroring the behavior of malloc.
Simple program to show unaligned memory:
// compiled with g++ alloc_test.cpp -m64 -std=c++17 -o alloc_test.out
#include <cstdlib>
#include <cinttypes>
#include <cstdio>
#define CUTE_ALLOC_IMPLEMENTATION
#include "cute_alloc.h"
int main(int argc, char const* argv[]) {
    size_t size = 1024;
    void* stack_chunk = malloc(size);
    void* frame_chunk = malloc(size);
    ca_stack_t* stack = ca_stack_create(stack_chunk, size);
    ca_frame_t* frame = ca_frame_create(frame_chunk, size);
    {
        // push stack ptr by sizeof(char) to mess up alignment for next allocation
        char* c = (char*)ca_stack_alloc(stack, sizeof(char));
        (void)c; // only allocated to offset the next allocation
        int* i = (int*)ca_stack_alloc(stack, sizeof(int));
        bool aligned = (((uintptr_t)i) % alignof(int)) == 0;
        printf("Memory from stack is %s\n", aligned ? "aligned" : "not aligned");
    }
    {
        // push frame ptr by sizeof(char) to mess up alignment for next allocation
        char* c = (char*)ca_frame_alloc(frame, sizeof(char));
        (void)c; // only allocated to offset the next allocation
        int* i = (int*)ca_frame_alloc(frame, sizeof(int));
        bool aligned = (((uintptr_t)i) % alignof(int)) == 0;
        printf("Memory from frame is %s\n", aligned ? "aligned" : "not aligned");
    }
    // release the backing buffers obtained from malloc
    free(stack_chunk);
    free(frame_chunk);
    return 0;
}
Output from program:
Memory from stack is not aligned
Memory from frame is not aligned
What kind of platform does this crash on, and why? Is it because pointer access is not memory aligned? Should each pointer returned from allocators be aligned?
This is something I was thinking about recently and would be happy to fix in the near future. Thanks for posting!
I have found two good sources of information regarding unaligned memory access: a description of unaligned memory, and a very good talk about unaligned memory with actual benchmarks (alignment starts at around 24:50).
According to the talk, unaligned memory access is much slower on most platforms, because the CPU has to do two aligned reads and shifts for each unaligned access. The exception is i7, where it may be slightly faster on very limited occasions. On ARM (mobile devices) it crashes. But it can crash on any architecture depending on optimization levels. For instance, if you allow your compiler to emit auto-vectorized code, unaligned memory may crash on SIMD instructions. See this bug explanation detailing this scenario. For SSE2, the compiler emits code that advances to a 16-byte boundary before running aligned SIMD instructions, but it assumes that each element already sits at its own natural alignment. If that assumption is false, stepping element by element never lands on a 16-byte boundary, and the aligned SIMD instruction faults. In short, unaligned memory is at best a performance bug and at worst an actual crash.
For the question of whether each pointer from an allocator should be aligned: in my opinion it depends on whether the allocation interface allows for custom alignments or not. This is something that standard C/C++ only recently started to support: we got std::aligned_alloc and aligned operator new in C++17.
In certain situations it might make sense to allocate unaligned memory (for instance for memory byte streams, although since the stack allocator in cute_alloc is actually a tagged allocator, it might not be suitable for memory byte streams). In some cases you might want overaligned memory, for instance for SIMD. If you choose to extend the allocation interface to also have a *_aligned_alloc, you can implement the regular alloc calls by simply doing
return *_aligned_alloc(allocator, size, alignof(std::max_align_t)); // alignof, not sizeof: the two are often equal but not guaranteed to be
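To make that delegation concrete, here is a minimal sketch using a made-up bump allocator. The names bump_t, bump_aligned_alloc and bump_alloc are hypothetical and are not part of cute_alloc.h; the sketch only assumes a caller-provided buffer and power-of-two alignments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator for illustration only; not the cute_alloc.h API.
struct bump_t {
    char*       base;     // start of the caller-provided buffer
    std::size_t capacity; // total bytes in the buffer
    std::size_t offset;   // bytes handed out so far, including padding
};

// Returns `size` bytes aligned to `alignment` (must be a power of two),
// or nullptr if the buffer cannot satisfy the request.
inline void* bump_aligned_alloc(bump_t* b, std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uintptr_t current = (std::uintptr_t)(b->base + b->offset);
    // Bytes needed to reach the next multiple of `alignment` (0 if already aligned).
    std::size_t padding = (std::size_t)((alignment - current) & (alignment - 1));
    std::size_t remaining = b->capacity - b->offset;
    if (padding > remaining || size > remaining - padding) return nullptr;
    void* result = b->base + b->offset + padding;
    b->offset += padding + size;
    return result;
}

// The regular alloc call just forwards with the malloc-like default alignment.
inline void* bump_alloc(bump_t* b, std::size_t size) {
    return bump_aligned_alloc(b, size, alignof(std::max_align_t));
}
```

With this shape, callers that need SIMD-friendly memory can pass a larger alignment such as 64 to the aligned call, while everyone else gets malloc-like alignment for free.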
Also, as a side note: to get the alignment padding you can use a bitwise AND instead of a mod, since alignments are always powers of two. To get the alignment padding from a ptr you can use this snippet:
// for a size_t alignment (unsigned)
size_t padding = (alignment - ((uintptr_t)ptr)) & (alignment - 1); // alignment is power of two
char* aligned = (char*)ptr + padding;
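As a sanity check on the snippet above, here is the same expression wrapped in a small helper next to the equivalent modulo form. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes to add to `address` to reach the next multiple
// of `alignment`. Relies on unsigned wraparound, so `alignment` must be a
// power of two.
inline std::size_t alignment_padding(std::uintptr_t address, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (std::size_t)((alignment - address) & (alignment - 1));
}

// Reference version using modulo; works for any non-zero alignment.
inline std::size_t alignment_padding_mod(std::uintptr_t address, std::size_t alignment) {
    return (std::size_t)((alignment - address % alignment) % alignment);
}
```

For example, an address of 0x1001 with an alignment of 4 needs 3 bytes of padding, and an address of 0x1000 with an alignment of 16 needs none. Both functions agree for every power-of-two alignment.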
One thing I was wondering is if in general it is preferable to have compile time alignment settings, or run-time alignment settings. I was thinking run-time since a single application can want to have different allocator instances of different alignments for different purposes. Thoughts?
I think that depends on how simple you want the allocator interface to be. malloc/free is as simple as it gets, which basically is compile time alignment although not adjustable. But you can have both (see memalign or std::aligned_alloc).
Having multiple allocator instances for different purposes usually happens in the form of pool allocators that dish out memory regions of fixed size (so alignment is also fixed in those). I haven't really encountered an allocator where just the alignment is fixed.
I don't think it makes sense to have general allocators bound to specific alignments though; to me it seems to conflate the concepts of arrays and allocators somehow. Just my opinion though, I would have to try it out and see how useful it is or isn't. I think the two most sensible options in general are adjustable compile-time alignment (based on std::max_align_t by default) and dynamic per-allocation alignment in the form of *_aligned_alloc, either alone or combined.
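A rough sketch of what the adjustable compile-time option could look like in a single-header library. The macro name MY_ALLOC_DEFAULT_ALIGNMENT and the helper default_aligned_size are hypothetical and do not exist in cute_alloc.h.

```cpp
#include <cstddef>

// Hypothetical override point: define this before including the header to
// change the default. Falls back to malloc-like alignment otherwise.
#ifndef MY_ALLOC_DEFAULT_ALIGNMENT
#define MY_ALLOC_DEFAULT_ALIGNMENT alignof(std::max_align_t)
#endif

static_assert(((MY_ALLOC_DEFAULT_ALIGNMENT) & ((MY_ALLOC_DEFAULT_ALIGNMENT) - 1)) == 0,
              "default alignment must be a power of two");

// Rounds a requested size up to a multiple of the default alignment. If the
// buffer start and any per-allocation headers are also multiples of it, every
// returned pointer stays aligned without per-call padding math.
constexpr std::size_t default_aligned_size(std::size_t size) {
    return (size + (MY_ALLOC_DEFAULT_ALIGNMENT) - 1)
         & ~((std::size_t)(MY_ALLOC_DEFAULT_ALIGNMENT) - 1);
}
```

The per-allocation *_aligned_alloc variant can sit next to this, so the macro only sets the default and callers with special needs still pass their own alignment.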
https://github.com/RandyGaul/cute_headers/issues/331