
proxylib Free Is Noticeably Slower Than Direct UMF Pool Under Multithreaded Workloads

Open lplewa opened this issue 11 months ago • 3 comments

In proxylib, every free operation must check whether the pointer being freed belongs to the "leak pool." The leak pool is a workaround for recursive allocations, where the malloc function (overridden by proxylib) triggers another call to malloc (often through libraries like hwloc).

This check is performed under a lock, causing threads to synchronize on every free. This results in significant overhead under multithreaded loads. Although #1072 increases the size of the pool to reduce the time spent under this lock, the goal should be to remove the lock entirely.

Two approaches are under consideration:

- Use Atomic Operations Instead of a Mutex: The leak pool consists of multiple smaller pools linked together. When they are all full, a new pool is created. Instead of relying on a lock, we can manage this pool list with atomic compare-and-swap operations.

- Use a Single Large Pool: Rather than maintaining multiple pools, create a large anonymous mmap (with PROT_NONE). If more space is needed, simply change the protection flags for new pages. Because the reserved address range never changes, checking whether a pointer belongs to the pool becomes a plain range comparison that needs no lock. On Windows, VirtualAlloc can be used similarly to reserve and commit pages on demand.
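A minimal sketch of the first approach, assuming pools are only ever added and never removed (which sidesteps ABA and memory reclamation problems). All names here (`sub_pool_t`, `pool_list_push`, `pool_list_alloc`, `pool_list_owns`, `POOL_BYTES`) are hypothetical and not part of UMF. The sketch gets pool memory from `malloc` for brevity; inside proxylib this would have to come from a source that does not re-enter the overridden `malloc`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_BYTES 4096 /* hypothetical default sub-pool size */

typedef struct sub_pool {
    struct sub_pool *next;   /* written before publication, then read-only */
    size_t size;             /* capacity of data[] in bytes */
    _Atomic size_t used;     /* bump offset, advanced with CAS */
    _Alignas(16) char data[];
} sub_pool_t;

static _Atomic(sub_pool_t *) pool_head = NULL;

/* Create a pool and publish it at the list head with compare-and-swap. */
static sub_pool_t *pool_list_push(size_t size) {
    sub_pool_t *p = malloc(sizeof *p + size); /* assumption: see lead-in */
    if (p == NULL) return NULL;
    p->size = size;
    atomic_init(&p->used, 0);
    p->next = atomic_load_explicit(&pool_head, memory_order_relaxed);
    /* On failure the CAS reloads the current head into p->next; retry. */
    while (!atomic_compare_exchange_weak_explicit(
        &pool_head, &p->next, p, memory_order_release, memory_order_relaxed)) {
    }
    return p;
}

/* Lock-free bump allocation from one pool; NULL if it does not fit. */
static void *sub_pool_alloc(sub_pool_t *p, size_t size) {
    size_t off = atomic_load(&p->used);
    while (size <= p->size && off <= p->size - size) {
        if (atomic_compare_exchange_weak(&p->used, &off, off + size))
            return p->data + off;
    }
    return NULL;
}

static void *pool_list_alloc(size_t size) {
    size = (size + 15) & ~(size_t)15; /* keep 16-byte alignment */
    for (;;) {
        sub_pool_t *p = atomic_load_explicit(&pool_head, memory_order_acquire);
        for (; p != NULL; p = p->next) {
            void *ptr = sub_pool_alloc(p, size);
            if (ptr != NULL) return ptr;
        }
        /* Every pool is full: add one, then retry through the normal path. */
        if (pool_list_push(size > POOL_BYTES ? size : POOL_BYTES) == NULL)
            return NULL;
    }
}

/* The check done on every free: a read-only walk, no mutex involved. */
static bool pool_list_owns(const void *ptr) {
    uintptr_t a = (uintptr_t)ptr;
    sub_pool_t *p = atomic_load_explicit(&pool_head, memory_order_acquire);
    for (; p != NULL; p = p->next) {
        uintptr_t b = (uintptr_t)p->data;
        if (a >= b && a - b < p->size) return true;
    }
    return false;
}
```

The free path only reads the list, so threads no longer serialize on it. The cost of the check still grows with the number of sub-pools, which is one argument for the single-pool variant.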
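The single-large-pool idea could look roughly like the following POSIX sketch. It reserves address space with `PROT_NONE`, commits pages on demand with `mprotect`, and answers the ownership question with a range comparison. The type and function names (`leak_pool_t`, `leak_pool_init`, `leak_pool_alloc`, `leak_pool_owns`) are hypothetical and not taken from UMF. A Windows version would presumably use `VirtualAlloc` with `MEM_RESERVE` and `MEM_COMMIT` instead and is not shown.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct leak_pool {
    char *base;               /* start of the reserved range, fixed after init */
    size_t reserved;          /* bytes reserved, a multiple of the page size */
    size_t page;              /* system page size */
    _Atomic size_t used;      /* bump offset */
    _Atomic size_t committed; /* bytes below this offset are read/write */
} leak_pool_t;

/* Reserve address space only; no page is accessible yet. */
static bool leak_pool_init(leak_pool_t *p, size_t reserve_bytes) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return false;
    p->page = (size_t)page;
    p->reserved = (reserve_bytes + p->page - 1) / p->page * p->page;
    void *m = mmap(NULL, p->reserved, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return false;
    p->base = m;
    atomic_init(&p->used, 0);
    atomic_init(&p->committed, 0);
    return true;
}

static void *leak_pool_alloc(leak_pool_t *p, size_t size) {
    size = (size + 15) & ~(size_t)15; /* keep 16-byte alignment */
    if (size > p->reserved) return NULL;
    /* A failed request still advances `used`; acceptable for a sketch. */
    size_t off = atomic_fetch_add(&p->used, size);
    if (off > p->reserved - size) return NULL; /* reservation exhausted */

    size_t end = (off + size + p->page - 1) / p->page * p->page;
    size_t committed = atomic_load(&p->committed);
    if (end > committed) {
        /* Concurrent callers may commit overlapping pages; that is harmless
           because setting the same protection twice changes nothing. */
        if (mprotect(p->base + committed, end - committed,
                     PROT_READ | PROT_WRITE) != 0)
            return NULL;
        /* Raise the high-water mark unless another thread went further. */
        while (committed < end &&
               !atomic_compare_exchange_weak(&p->committed, &committed, end)) {
        }
    }
    return p->base + off;
}

/* The check done on every free: two comparisons on immutable fields. */
static bool leak_pool_owns(const leak_pool_t *p, const void *ptr) {
    uintptr_t a = (uintptr_t)ptr;
    uintptr_t b = (uintptr_t)p->base;
    return a >= b && a - b < p->reserved;
}
```

Since `base` and `reserved` never change after initialization, `leak_pool_owns` needs neither a lock nor an atomic load. The trade-off is that the reservation size has to be chosen up front.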

Below is a flame graph illustrating performance after #1072:

[Image: flame graph]

lplewa avatar Feb 04 '25 15:02 lplewa

BTW, I believe tbbmalloc_proxy already solves the same issue. We should look at how they deal with it.

vinser52 avatar Feb 04 '25 16:02 vinser52

This is exactly why critnib was made to have wait-free reads.

pbalcer avatar Feb 04 '25 16:02 pbalcer

I also think that it is a solved problem, and it should be easy to fix. We just need to select the best option, as there are multiple fixes to choose from.

lplewa avatar Feb 04 '25 16:02 lplewa