parsec
Allocation in the correct memory space
Original report by George Bosilca (Bitbucket: bosilca, GitHub: bosilca).
Currently, the mechanism we use to allocate memory for internal use is tied to the NUMA node of the thread performing the allocation. This remains true even when we know that the memory is destined for a particular VP (such as the task structure for a future task that should be executed on a particular core or set of cores).
We need to implement specialized allocators that provide memory from a particular NUMA node upon request. Mac OS X has a [non-portable] mechanism, malloc_zone, that lets malloc work on a prepared zone of memory instead of calling sbrk to get it from the kernel. This way we can handle memory areas pinned to a particular memory node, and use a replacement for malloc to allocate new tasks and arenas from the NUMA node we want.
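To make the idea concrete, here is a rough sketch of what such a zone-based allocator could look like. Everything in it (`numa_zone_t`, `numa_zone_init`, `numa_zone_alloc`, `numa_zone_fini`, `ZONE_ALIGN`) is a hypothetical name used for illustration, not existing PaRSEC code. To stay portable, the sketch backs the zone with plain `malloc`; a real implementation would instead obtain the region already bound to the requested node (for example with `numa_alloc_onnode` from libnuma on Linux).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ZONE_ALIGN 16  /* alignment of every block handed out (power of two) */

/* Hypothetical per-NUMA-node zone: a prepared region that allocations are carved from. */
typedef struct numa_zone {
    int      node;  /* NUMA node this zone is meant to live on */
    uint8_t *base;  /* start of the prepared region */
    size_t   size;  /* total bytes in the region */
    size_t   used;  /* bytes handed out so far */
} numa_zone_t;

/* Prepare a zone of `size` bytes for `node`.
 * ASSUMPTION: malloc is only a stand-in here; it does not bind the memory
 * to `node`. A real version would allocate node-bound memory instead. */
static int numa_zone_init(numa_zone_t *z, int node, size_t size)
{
    z->base = malloc(size);
    if (NULL == z->base) return -1;
    z->node = node;
    z->size = size;
    z->used = 0;
    return 0;
}

/* Bump allocation inside the zone: never goes back to the kernel.
 * Returns NULL when the zone cannot satisfy the request. */
static void *numa_zone_alloc(numa_zone_t *z, size_t bytes)
{
    /* Round the current offset up to the next multiple of ZONE_ALIGN. */
    size_t start = (z->used + ZONE_ALIGN - 1) & ~(size_t)(ZONE_ALIGN - 1);
    if (start > z->size || bytes > z->size - start) return NULL;
    z->used = start + bytes;
    return z->base + start;
}

/* Release the whole zone at once; individual blocks are never freed. */
static void numa_zone_fini(numa_zone_t *z)
{
    free(z->base);
    z->base = NULL;
    z->size = 0;
    z->used = 0;
}
```

With one such zone per NUMA node, a task structure destined for a given VP would be allocated from the zone of that VP's node rather than from the calling thread's default heap. A bump allocator is the simplest possible policy; reusing freed tasks would need a free list or a similar scheme on top.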
I found some software packages that can help in this area. After a quick assessment I retained two for further investigation: Hoard and TCMalloc. There is also something called rHeap, which sounds quite interesting as well, but I did not delve deep enough to grasp the entire picture.