Provide a better allocator than the current block allocator.
(Originally from Pavel on https://github.com/UAVCAN/nunavut/issues/98#issuecomment-557679652)
I had a random idea occur to me earlier this week that needs to be shared. It's based on the concept of the small-object allocator proposed by Alexandrescu, his later presentation on the vexing allocator, and the standard buddy allocator algorithm.
The problem with constant-complexity block allocators is that the size of the allocated memory is fixed. In sufficiently complex scenarios (such as Libuavcan), where the allocation size varies significantly, this may lead to suboptimal memory utilization. Ideally, we should be able to allocate arbitrarily sized blocks to reduce internal fragmentation, while keeping the worst-case external fragmentation bounded, in constant time.
Imagine that we have a free store which serves allocation requests in the sbrk style (or in the moving-garbage-collector style) by advancing the pointer to free memory. Once allocated, the memory is never returned.
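A minimal sketch of such a free store, assuming a plain byte buffer as the arena (illustrative only; the names are mine, not libcyphal code):

```cpp
#include <cstddef>
#include <cstdint>

// Sbrk-style free store: allocation merely advances a pointer into the arena;
// memory handed out this way is never returned to the free store.
static std::uint8_t arena[8192];        // Example arena; the size is arbitrary here.
static std::size_t  arena_offset = 0;

static void* arenaGrab(const std::size_t amount) noexcept
{
    if ((sizeof(arena) - arena_offset) < amount)
    {
        return nullptr;                 // Arena exhausted; the pointer never moves back.
    }
    void* const out = &arena[arena_offset];
    arena_offset += amount;
    return out;
}
```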
Imagine that we have a set of standard, fixed-size, constant-complexity block allocators (like in Libuavcan). Let the smallest one serve blocks of size X. Let the next-larger one serve blocks of size 2X. The maximum multiplier of X is bounded by the size of the available allocation arena. For memory alignment purposes, X should be a sufficiently large power of 2; the lower limit on X is the size of a pointer, because free blocks have to be kept in a linked list.
Here is a sample arrangement following the above model: given an allocation arena of 8 KiB (typical for Libuavcan applications) and a maximum alignment requirement of 128 bits (although modern SIMD instructions usually require more), the block sizes would be 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, and 8192 bytes.
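As a quick sanity check of this arrangement, the number of distinct block sizes is log2(arena size / X) + 1, which can be verified at compile time (a hypothetical helper, not part of any library):

```cpp
#include <cstddef>

// Number of block sizes between the smallest block X and the whole arena, inclusive.
constexpr std::size_t numBlockSizes(std::size_t arena_size, const std::size_t smallest_block) noexcept
{
    std::size_t count = 1U;
    while (arena_size > smallest_block)
    {
        arena_size /= 2U;
        ++count;
    }
    return count;
}

static_assert(numBlockSizes(8192U, 16U) == 10U, "16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192");
```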
Initially, no memory is allocated and the allocation pointer is at the beginning of the arena. An allocation request of size S is routed to the allocator whose block size B is the smallest that is greater than or equal to S. Since there is no memory available yet, the B-sized allocator detects that it cannot serve the request and queries the next-larger block allocator (i.e., the 2B-sized one) with the objective of getting a block from it, splitting it into two B-sized parts, returning one part, and adding the second part to its own free list. The 2B-sized allocator is unable to serve the request either, so it is propagated to the 4B-sized one and so on until the top-level allocator is reached. The chain of allocation requests is then terminated at the sbrk-style arena allocator, which merely advances the pointer by B × 2^(number of allocators between B and the arena).
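The allocation path described above could be sketched roughly as follows. This is only an illustration of the idea under stated assumptions (the class name, the recursive structure, and the 16-byte/8-KiB arrangement are mine; this is not the actual libcyphal or o1heap code, and alignment handling is omitted for brevity). The sbrk-style free store from the earlier sketch is inlined as a member so the example is self-contained:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

class CascadedBlockAllocator
{
public:
    CascadedBlockAllocator(void* const arena, const std::size_t arena_size) noexcept
        : arena_(static_cast<std::uint8_t*>(arena)), arena_size_(arena_size), arena_offset_(0)
    {
        free_lists_.fill(nullptr);
    }

    // Routes the request to the smallest bin whose block size B >= request_size.
    void* allocate(const std::size_t request_size) noexcept
    {
        return allocateBin(binIndexFor(request_size));
    }

    void deallocate(void* pointer, std::size_t request_size) noexcept;  // Free path; see below.

private:
    static constexpr std::size_t SmallestBlock = 16U;   // X = 16 bytes (128-bit alignment).
    static constexpr std::size_t NumBins       = 10U;   // 16 B ... 8 KiB, as in the example above.

    struct FreeBlock { FreeBlock* next; };               // Free blocks are linked through their own storage.

    static std::size_t binIndexFor(const std::size_t request_size) noexcept
    {
        std::size_t index = 0;
        std::size_t size  = SmallestBlock;
        while (size < request_size) { size *= 2U; ++index; }
        return index;
    }

    void* allocateBin(const std::size_t index) noexcept
    {
        if (index >= NumBins) { return nullptr; }        // Larger than the arena: cannot be served.
        if (free_lists_[index] != nullptr)               // Fast path: pop a block off this bin's free list.
        {
            FreeBlock* const block = free_lists_[index];
            free_lists_[index]     = block->next;
            return block;
        }
        const std::size_t block_size = SmallestBlock << index;
        if (index + 1U >= NumBins)
        {
            return arenaGrab(block_size);                // The chain terminates at the sbrk-style arena.
        }
        void* const larger = allocateBin(index + 1U);    // Ask the 2B-sized bin for a block...
        if (larger == nullptr) { return nullptr; }
        auto* const second_half =                        // ...split it in two: keep one half, return the other.
            reinterpret_cast<FreeBlock*>(static_cast<std::uint8_t*>(larger) + block_size);
        second_half->next  = free_lists_[index];
        free_lists_[index] = second_half;
        return larger;
    }

    // Inlined sbrk-style arena: advance the pointer, never take memory back.
    void* arenaGrab(const std::size_t amount) noexcept
    {
        if ((arena_size_ - arena_offset_) < amount) { return nullptr; }
        void* const out = arena_ + arena_offset_;
        arena_offset_  += amount;
        return out;
    }

    std::uint8_t* const             arena_;
    const std::size_t               arena_size_;
    std::size_t                     arena_offset_;
    std::array<FreeBlock*, NumBins> free_lists_;
};
```

Both the allocation path and the free path (below) touch at most a fixed number of bins, so for a given arrangement the worst-case execution time is bounded by a constant.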
When the S-sized memory block is freed, it is added immediately to the free list of the B-sized allocator without any further processing. This differs from the standard buddy allocation logic, which coalesces free blocks; coalescing is not done here because it is incompatible with the constant-complexity allocation/deallocation requirement. One practical outcome is that the availability of large memory blocks depends on the worst-case utilization of smaller memory blocks.
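Continuing the hypothetical sketch above, the free path is just a constant-time push onto the owning bin's free list (the block size is assumed to be known at deallocation time, e.g. because STL-style allocators pass it to deallocate()):

```cpp
// No coalescing of adjacent free blocks: the block simply rejoins its own bin.
void CascadedBlockAllocator::deallocate(void* const pointer, const std::size_t request_size) noexcept
{
    if (pointer == nullptr) { return; }
    const std::size_t index = binIndexFor(request_size);
    auto* const block  = static_cast<FreeBlock*>(pointer);
    block->next        = free_lists_[index];
    free_lists_[index] = block;
}
```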
For non-embedded platforms, the top-level sbrk-style arena allocator could be replaced with the standard malloc().
Considering that we are currently promoting the block allocator in libuavcan for use with STL containers, it might make sense to extend it with this constant-complexity buddy allocation algorithm.
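For instance, such an allocator could be exposed to STL containers through the standard Allocator interface; the adapter below wraps the hypothetical CascadedBlockAllocator from the sketches above (again, an illustration, not the actual libcyphal API):

```cpp
#include <cstddef>
#include <new>

template <typename T>
class CascadedStlAllocator
{
public:
    using value_type = T;

    explicit CascadedStlAllocator(CascadedBlockAllocator& backend) noexcept : backend_(&backend) {}

    template <typename U>
    CascadedStlAllocator(const CascadedStlAllocator<U>& other) noexcept : backend_(other.backend_) {}

    T* allocate(const std::size_t n)
    {
        void* const p = backend_->allocate(n * sizeof(T));
        if (p == nullptr) { throw std::bad_alloc(); }   // Or handle OOM differently if exceptions are disabled.
        return static_cast<T*>(p);
    }

    void deallocate(T* const p, const std::size_t n) noexcept
    {
        backend_->deallocate(p, n * sizeof(T));         // The size is passed in, so the bin is found in O(1).
    }

    CascadedBlockAllocator* backend_;
};

template <typename T, typename U>
bool operator==(const CascadedStlAllocator<T>& a, const CascadedStlAllocator<U>& b) noexcept
{
    return a.backend_ == b.backend_;
}

template <typename T, typename U>
bool operator!=(const CascadedStlAllocator<T>& a, const CascadedStlAllocator<U>& b) noexcept
{
    return !(a == b);
}
```

A container would then be declared as, e.g., std::vector<int, CascadedStlAllocator<int>> and handed an instance bound to the shared backend.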
Here: https://github.com/pavel-kirienko/o1heap
While making this, I learned some new things about dynamic memory allocation algorithms, and I was surprised to discover that the documentation of some popular RTOSes describes dynamic memory allocation as fundamentally non-time-deterministic, which is incorrect.