
Faster freeing allocator

Open Craig-Macomber opened this issue 1 year ago • 0 comments

Currently lol_alloc does not contain an allocator that actually frees memory with better than O(n) allocation and O(n) freeing, where n is the length of the free list.
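For context, here is a minimal model (not lol_alloc's actual code) of why a plain free list gives O(n) operations: first-fit allocation scans the free blocks linearly, and freeing scans again to reinsert in address order so neighbours could be coalesced. The free list is modeled as a sorted `Vec` of `(offset, size)` pairs for brevity; the real allocator links blocks through the freed memory itself.

```rust
/// First-fit allocation: worst case inspects every free block.
fn allocate(free_list: &mut Vec<(usize, usize)>, size: usize) -> Option<usize> {
    for i in 0..free_list.len() {
        let (offset, block_size) = free_list[i];
        if block_size >= size {
            if block_size == size {
                free_list.remove(i); // exact fit: drop the block
            } else {
                free_list[i] = (offset + size, block_size - size); // split
            }
            return Some(offset);
        }
    }
    None
}

/// Freeing: walks the list to find the insertion point (and, in a fuller
/// version, to merge with adjacent blocks), which is again O(n).
fn deallocate(free_list: &mut Vec<(usize, usize)>, offset: usize, size: usize) {
    let pos = free_list
        .iter()
        .position(|&(o, _)| o > offset)
        .unwrap_or(free_list.len());
    free_list.insert(pos, (offset, size));
}
```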

For now, users with these requirements should use Rust's built-in allocator. However, there is room for a somewhat optimized allocator in between AssumeSingleThreaded<FreeListAllocator> at 654 bytes and the built-in Rust allocator at 5034 bytes.
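For reference, switching between these options is just a matter of which type backs the global allocator. The constructor calls below follow the pattern in lol_alloc's README and are assumed here, so check the README for the current API; dropping the declaration entirely falls back to Rust's built-in allocator, which is what the size comparison above measures against.

```rust
use lol_alloc::{AssumeSingleThreaded, FreeListAllocator};

// SAFETY: only sound because this wasm module is single threaded.
#[global_allocator]
static ALLOCATOR: AssumeSingleThreaded<FreeListAllocator> =
    unsafe { AssumeSingleThreaded::new(FreeListAllocator::new()) };
```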

This issue tracks the desire for, and possible work to implement, such an allocator. If you have an interest in such an addition, please respond here (e.g. comment below or react to this post).

If lol_alloc were to provide a faster allocator, it would require the following work:

  1. Improved testing approach. Currently this repo doesn't have a good way to fuzz test across all the implementations, and without that, maintaining so many allocators may become problematic (a rough harness is sketched after this list).
  2. Performance testing: without a comparison between the simple free-list allocator, the optimized one, and the built-in Rust one, it's not worth maintaining an optimized version.
  3. The actual implementation of the optimized allocator.
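Regarding point 1, here is a rough sketch of what a differential fuzz harness could look like. Nothing in it exists in the repo today: it just drives any `GlobalAlloc` with a random sequence of allocations and frees and checks basic invariants, and assumes the allocator under test can be exercised in a test environment (e.g. under a wasm test runner for the wasm-only allocators).

```rust
use core::alloc::{GlobalAlloc, Layout};

fn fuzz_allocator<A: GlobalAlloc>(alloc: &A, mut seed: u64, steps: usize) {
    let mut live: Vec<(*mut u8, Layout)> = Vec::new();
    let mut rand = || {
        // xorshift64, just to keep the sketch dependency-free
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        seed
    };
    for _ in 0..steps {
        if live.is_empty() || rand() % 2 == 0 {
            let size = (rand() % 256 + 1) as usize;
            let align = 1 << (rand() % 4); // 1, 2, 4, or 8
            let layout = Layout::from_size_align(size, align).unwrap();
            let ptr = unsafe { alloc.alloc(layout) };
            assert!(!ptr.is_null(), "allocation failed");
            assert_eq!(ptr as usize % align, 0, "misaligned allocation");
            // The new block must not overlap any live block.
            for &(p, l) in &live {
                let (a0, a1) = (ptr as usize, ptr as usize + size);
                let (b0, b1) = (p as usize, p as usize + l.size());
                assert!(a1 <= b0 || b1 <= a0, "overlapping allocations");
            }
            live.push((ptr, layout));
        } else {
            let i = (rand() as usize) % live.len();
            let (ptr, layout) = live.swap_remove(i);
            unsafe { alloc.dealloc(ptr, layout) };
        }
    }
}
```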

For the implementation, there are many different designs that could address this:

  1. Buddy memory allocation is a nice approach; however, it looks like it is cleanest when it owns the whole heap. This could work for lol_alloc by banning other users of memory_grow and claiming the heap lazily (see the first sketch after this list).
  2. An optimized version of FreeListAllocator using a skip list. If each link stores the largest size between the current node (inclusive) and the linked node (exclusive), similar to an indexable skip list, this could allow O(log n) allocation and freeing (a structural sketch is given further below). It would increase the minimum supported allocation size from 2 words to 2*layers words (where layers is the number of layers in the skip list). Picking the number of layers and the sparsity factor statically seems like the simplest option (with FreeListAllocator being the 1-layer case). A variety of configurations (possibly via const generics) could be provided. 4 layers and a sparsity factor per layer of 1/64 should handle a worst-case fragmented 4 GB heap (the maximum for 32-bit WASM) pretty well.
  3. Etc. There are many other possible designs.
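To illustrate option 1: the appeal of buddy allocation is that, with power-of-two block sizes, finding the block to merge with on free is a single XOR rather than a list walk. A tiny sketch (names and layout are illustrative, not from any existing implementation), with offsets taken relative to the start of the heap the allocator owns:

```rust
/// Offset of the buddy of the block at `offset` with the given power-of-two size.
fn buddy_offset(offset: usize, size: usize) -> usize {
    debug_assert!(size.is_power_of_two() && offset % size == 0);
    offset ^ size
}

// Example: a 64-word block at offset 128 splits into two 32-word buddies at
// offsets 128 and 160; each is the other's buddy: 128 ^ 32 == 160, 160 ^ 32 == 128.
```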

Currently the skip-list-based approach seems like it best fits into the existing code-base. It requires a source of randomness, which will add some code size, but I suspect a small PRNG like a Lehmer random number generator should be fine and only add a few bytes of code.
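To make option 2 concrete, here is a safe-Rust sketch of the augmented search plus a classic MINSTD Lehmer generator used to pick node heights with a 1/64 per-layer sparsity factor. Nodes are held in a `Vec` and linked by index purely for illustration; a real allocator would embed the links in the free blocks themselves, and none of these names or constants come from lol_alloc.

```rust
const LAYERS: usize = 4;
const SPARSITY_BITS: u32 = 6; // 1/64 chance of promoting a node to the next layer

struct Link {
    next: Option<usize>, // next node that participates in this layer
    max: usize,          // largest block size in [this node, next), this node inclusive
}

struct Node {
    size: usize,      // size of this free block, in words
    links: Vec<Link>, // entry 0 is layer 0; a node has 1..=LAYERS entries
}

/// First fit in O(log n): returns the index of the first block with
/// `size >= want`. `head` is a sentinel that participates in every layer.
fn find_fit(nodes: &[Node], head: usize, want: usize) -> Option<usize> {
    let mut current = head;
    let mut layer = LAYERS - 1;
    loop {
        let link = &nodes[current].links[layer];
        if link.max >= want {
            // Some block in [current, link.next) fits. At layer 0 that span
            // is just `current` itself, so the search is done.
            if layer == 0 {
                return Some(current);
            }
            layer -= 1; // refine within the span at the next layer down
        } else {
            // Nothing in this span fits: skip the whole span at once.
            current = link.next?; // None means no block on the heap is big enough
        }
    }
}

/// Lehmer ("MINSTD") generator: tiny, and good enough for picking heights.
struct Lehmer {
    state: u32, // nonzero, less than 2^31 - 1
}

impl Lehmer {
    fn next(&mut self) -> u32 {
        self.state = ((self.state as u64 * 48271) % 0x7fff_ffff) as u32;
        self.state
    }
}

/// Height of a newly freed block's node: layer i is reached with probability 64^-i.
fn node_height(rng: &mut Lehmer) -> usize {
    let mut height = 1;
    let mut bits = rng.next();
    while height < LAYERS && (bits & ((1 << SPARSITY_BITS) - 1)) == 0 {
        height += 1;
        bits >>= SPARSITY_BITS;
    }
    height
}
```

Freeing would use the same descent to find the insertion point, updating the stored `max` values along the path, which is what keeps both operations O(log n).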

I'm not currently planning to implement this, but I haven't worked with skip lists before and it seems like a good learning opportunity, so I might do it at some point.

If you are interested in working on some part of this, please discuss it in comments on this issue before starting the implementation. I do not want to spend a lot of time maintaining this repo, which means a high quality and maintainability bar as well as limited willingness to review PRs, so any contributions will have to be carefully aligned with my vision before I'd want to review them.

Craig-Macomber, May 28 '23 21:05