ROC_SHMEM

Substitute pow2bin allocator with a dlmalloc based allocator

Open abouteiller opened this issue 8 months ago • 2 comments

  • Import dlmalloc (latest version, MIT licensed).
  • Create a wrapper class, DLMalloc, that exposes only the relevant functionality. To avoid using non-static/templated members of the parent class, we use the mspace variant of dlmalloc.
  • Use DLMalloc in a new ShmemAllocatorStrategy.
  • Replace pow2bin in single_heap.

Possible drawbacks/missing features:

  • The MORECORE functionality is not implemented (it would create a dependency between non-static class members and static functions), so we cannot resize the symmetric heap (initial allocation only).
  • The mspace allocator stores some metadata in the symmetric heap, which means that metadata in device memory is manipulated from the host when allocating.
  • I have set the alignment to 2 MB; this may be overkill in some cases, and using the memalign functionality of dlmalloc might be a better approach.
  • Performance is untested.
  • Unit testing?
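
To put a number on the alignment concern, here is a minimal sketch of the padding arithmetic. It assumes that "2 MB" means 2 MiB (2^21 bytes) and that each allocation is rounded up to the alignment boundary; whether the strategy in this PR pads sizes this way is an assumption, and the helper names `align_up`, `padding_waste`, and `kTwoMiB` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value of the "2 MB" alignment mentioned above: 2 MiB = 2^21 bytes.
const std::size_t kTwoMiB = static_cast<std::size_t>(2) << 20;

// Round `size` up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline std::size_t align_up(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Bytes lost to padding when `size` is rounded up to `alignment`.
inline std::size_t padding_waste(std::size_t size, std::size_t alignment) {
  return align_up(size, alignment) - size;
}
```

Under these assumptions a 64-byte request occupies a full 2 MiB slot, while the same request with a 256-byte alignment occupies 256 bytes. This is the kind of gap that a per-request alignment (as with dlmalloc's memalign entry points) could narrow.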

abouteiller, Apr 01 '25 17:04

@abouteiller to follow up on the comment on the JIRA ticket: is there a test (or at least visual confirmation from reading the code) that dlmalloc is able to combine two freed memory allocations stemming from separate rocshmem_alloc/free() operations if they end up being consecutive in memory? This was ultimately why we wanted to replace the previous allocator.

edgargabriel, Apr 02 '25 16:04

Yes, this is visible in mspace_free (line 5706 onward): freed chunks are consolidated with the preceding and succeeding free space. It also looks for a best-fit chunk among the gaps when allocating (with two different strategies for small and large requests).
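
The behavior described above can be illustrated with a small toy model. This is not dlmalloc's code (dlmalloc keeps its bookkeeping inside the heap and uses separate bins for small and large chunks); it is only a sketch of the two ideas, best-fit selection and merging a freed block with its free neighbors. The names `ToyHeap`, `alloc`, `release`, and `kNoSpace` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Sentinel returned when no free block is large enough.
const std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Toy offset-based heap that tracks blocks in ordered maps (offset -> size).
class ToyHeap {
 public:
  explicit ToyHeap(std::size_t capacity) {
    assert(capacity > 0);
    free_[0] = capacity;
  }

  // Best-fit: pick the smallest free block that can hold `bytes`.
  std::size_t alloc(std::size_t bytes) {
    if (bytes == 0) return kNoSpace;
    auto best = free_.end();
    for (auto it = free_.begin(); it != free_.end(); ++it) {
      if (it->second >= bytes &&
          (best == free_.end() || it->second < best->second)) {
        best = it;
      }
    }
    if (best == free_.end()) return kNoSpace;
    const std::size_t offset = best->first;
    const std::size_t remaining = best->second - bytes;
    free_.erase(best);
    if (remaining > 0) free_[offset + bytes] = remaining;  // keep the tail free
    used_[offset] = bytes;
    return offset;
  }

  // Free a block and merge it with adjacent free blocks on both sides.
  void release(std::size_t offset) {
    auto u = used_.find(offset);
    if (u == used_.end()) return;  // unknown offset: ignore
    std::size_t size = u->second;
    used_.erase(u);

    auto next = free_.lower_bound(offset);
    // Merge with the succeeding free block if it starts where this one ends.
    if (next != free_.end() && offset + size == next->first) {
      size += next->second;
      next = free_.erase(next);
    }
    // Merge with the preceding free block if it ends where this one starts.
    if (next != free_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == offset) {
        prev->second += size;
        return;
      }
    }
    free_[offset] = size;
  }

  std::size_t free_block_count() const { return free_.size(); }

 private:
  std::map<std::size_t, std::size_t> free_;  // free blocks: offset -> size
  std::map<std::size_t, std::size_t> used_;  // live blocks: offset -> size
};
```

In this model, allocating three 100-byte blocks from a 300-byte heap and then releasing the first two leaves a single 200-byte free block, so a subsequent 200-byte request succeeds. That is the scenario raised in the question, and a unit test for the real allocator could follow the same pattern using rocSHMEM's own allocation and free calls.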

abouteiller, Apr 02 '25 17:04

This now passes all tests and is ready for review.

abouteiller, Apr 30 '25 01:04