roc
DRAFT: Exploratorily adding checks to roc_alloc and roc_realloc
This is a draft PR to see how a platform's host code might look and feel when we add stricter checks to the memory allocation functions (roc_alloc and roc_realloc).
https://godbolt.org/z/5q5z6rMrE
This is how far I'm able to push optimizing this code on various common compilers by the way.
This has a happy path of 7 x86-64 assembly instructions on GCC and Clang (and 11 on MSVC).
That is 6 (resp. 10) more than the YOLO implementation of simply calling malloc without any checks.
To my mind, this overhead is negligible, especially since malloc itself will at best run many more instructions in userland, and at worst make a syscall.
(To stay portable across compilers, the code uses the Hedley single-header library.)
Yeah, this should be basically no cost for 2 main reasons:
- roc_alloc is already far away in the executable and is not likely to be in icache with the currently executing functions. So adding a few instructions should not really affect icache performance when calling roc_alloc.
- As long as branches are ordered correctly to generate malloc; jmp to bad case if null; return on good case, the jump for the bad case should be 100% predictable until we reach the point that the app is crashing anyway.
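To make the branch layout described above concrete, here is a minimal sketch in C. It is not the code from the draft PR: the names roc_alloc_checked, roc_realloc_checked and alloc_failed are hypothetical, the signatures are assumed to mirror the usual host functions, and the two checks (zero size, null result) are only inferred from this discussion. The UNLIKELY and COLD_NORETURN macros stand in for what a portability header such as Hedley would normally provide.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for portability macros (assumption: a header such as Hedley
   would normally supply equivalents of these). */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define COLD_NORETURN __attribute__((cold, noreturn))
#else
#define UNLIKELY(x) (x)
#define COLD_NORETURN
#endif

/* Hypothetical failure handler: kept out of line so the happy path stays short. */
COLD_NORETURN static void alloc_failed(const char *what, size_t size) {
    fprintf(stderr, "%s failed for %zu bytes\n", what, size);
    abort();
}

/* Hypothetical checked allocator. The alignment argument is ignored here,
   on the assumption that malloc's default alignment is sufficient. */
void *roc_alloc_checked(size_t size, unsigned int alignment) {
    (void)alignment;
    if (UNLIKELY(size == 0)) {
        alloc_failed("roc_alloc (zero size)", size);
    }
    void *ptr = malloc(size);
    if (UNLIKELY(ptr == NULL)) {
        alloc_failed("roc_alloc", size); /* bad case: jump away, never returns */
    }
    return ptr; /* good case: fall through and return */
}

/* Hypothetical checked reallocation with the same two checks. */
void *roc_realloc_checked(void *ptr, size_t new_size, size_t old_size,
                          unsigned int alignment) {
    (void)old_size;
    (void)alignment;
    if (UNLIKELY(new_size == 0)) {
        alloc_failed("roc_realloc (zero size)", new_size);
    }
    void *new_ptr = realloc(ptr, new_size);
    if (UNLIKELY(new_ptr == NULL)) {
        alloc_failed("roc_realloc", new_size);
    }
    return new_ptr;
}
```

Marking the failure handler as cold and non-returning is what lets the compiler keep the failing branch off the fall-through path, so the predictor only ever sees the not-taken case until the app is about to crash.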
The only case it really hurts is when we are calling roc_alloc a ton. But if you are calling roc_alloc a ton, you are already basically guaranteed to have bad performance. I think this is a good addition for most platforms. I think the one exception would be stack or bump allocated platforms. They might have a fast enough allocation and use case that it is worth avoiding that minor cost.
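For contrast, a bump allocator can look roughly like the sketch below. Everything in it is hypothetical (the arena size, the names bump_alloc and arena_used, and the choice to return NULL when the arena is exhausted). The point is only that the whole allocation is a few arithmetic instructions, so a handful of extra check instructions is a much larger share of the cost than it is next to malloc.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 4096u

/* Hypothetical fixed arena; aligned to 16 so offsets aligned up to 16 are valid. */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hypothetical bump allocation. Assumes alignment is a power of two, at most 16. */
void *bump_alloc(size_t size, unsigned int alignment) {
    /* Round the current offset up to the requested alignment. */
    size_t start = (arena_used + (alignment - 1)) & ~((size_t)alignment - 1);
    /* Bounds check written to avoid overflow in start + size. */
    if (start > ARENA_SIZE || size > ARENA_SIZE - start) {
        return NULL;
    }
    arena_used = start + size;
    return &arena[start];
}
```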
As an aside, we should probably update the compiler to just not call roc_alloc when dealing with zero sized types.
This is actually useful in the very specific use case of wanting automatically host-managed resources like file descriptors that clean themselves up.
Basically it can make a Box of an empty tag union on open to represent a file descriptor's lifetime, and store the allocated address (which is just big enough for the refcount with no additional bytes) in a hashmap to be able to look up the fd later.
Then when it automatically gets deallocated, roc_dealloc can do a hash lookup for that address to see if it needs to additionally close a fd. To avoid paying that hash lookup for every single deallocation, it can do a quick check to see if the number of bytes it's deallocating is size_of::<usize>() (so, just the refcount with 0 additional bytes on the heap), and only bother doing the hash lookup if so!
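A rough sketch of that deallocation path, in C, might look like this. It makes several assumptions that are not in the discussion: the host is assumed to know the size being freed (passed here as a parameter to a hypothetical host_dealloc), a small linear table stands in for the hashmap, and the fd is only recorded in last_closed_fd rather than actually closed so the example stays self-contained.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 64

/* Hypothetical registry: allocation address -> file descriptor.
   A linear table is used here in place of the hashmap. */
static struct { void *addr; int fd; } tracked[MAX_TRACKED];
static size_t tracked_len = 0;

/* Demonstration only: records the fd that would have been closed. */
static int last_closed_fd = -1;

/* Remember that the allocation at addr represents fd. Returns 0 on success. */
int track_fd(void *addr, int fd) {
    if (tracked_len == MAX_TRACKED) {
        return -1;
    }
    tracked[tracked_len].addr = addr;
    tracked[tracked_len].fd = fd;
    tracked_len++;
    return 0;
}

/* Hypothetical deallocation hook; size is assumed to be known to the host. */
void host_dealloc(void *ptr, size_t size) {
    /* Quick filter: only refcount-sized allocations can represent an fd. */
    if (size == sizeof(size_t)) {
        for (size_t i = 0; i < tracked_len; i++) {
            if (tracked[i].addr == ptr) {
                last_closed_fd = tracked[i].fd; /* a real host would close the fd here */
                tracked_len--;
                tracked[i] = tracked[tracked_len]; /* swap-remove the entry */
                break;
            }
        }
    }
    free(ptr);
}
```

With this shape, every deallocation of a different size skips the lookup entirely and pays only for one integer comparison.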
Oh yeah, I forgot about that use case. I guess that means that we would never request an allocation of size zero? Roc always asks for at least a refcount? Is that correct? If so, we can just remove the zero size check altogether.
Oh yeah, I believe that's true!