Bound allocations for constrained task_arena
subj.
As I understand the semantics of hwloc_set_membind(), it controls which node a memory page is allocated on when that page is first touched. So if a thread is already bound to a NUMA node, the default "allocate from the local node on first touch" behavior already applies. If we want to say "this thread must always get memory from a particular node, no matter where the thread runs", then this is the right tool. But first it would be better to understand which kinds of applications can get a performance gain from this feature.
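To make the first-touch point concrete, here is a minimal sketch in plain C++ with no hwloc or TBB calls. The function name first_touch_sum is made up for illustration. It assumes the OS uses a first-touch placement policy (the usual Linux default) and that the workers are pinned to NUMA nodes by some other means, e.g. a constrained task_arena; the pinning itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical helper: every worker initializes ("first-touches") the slice
// it will later read. Under a first-touch policy the pages of that slice are
// placed on the node where the worker runs at that moment.
double first_touch_sum(std::size_t n, unsigned workers) {
    assert(workers > 0);
    // new double[n] leaves the elements uninitialized, so the allocating
    // thread does not write to the array itself.
    std::unique_ptr<double[]> data(new double[n]);
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = n * w / workers;
            const std::size_t end = n * (w + 1) / workers;
            // First touch: done by the thread that will use the data.
            for (std::size_t i = begin; i < end; ++i) data[i] = 1.0;
            // Later work on the same slice, done by the same thread.
            double s = 0.0;
            for (std::size_t i = begin; i < end; ++i) s += data[i];
            partial[w] = s;
        });
    }
    for (auto& t : pool) t.join();
    double total = 0.0;
    for (double s : partial) total += s;
    return total;
}
```

If the array were instead initialized by the allocating thread, all pages would likely end up on that thread's node, and that is the case where an explicit memory binding policy could start to matter.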
Another problem with allocation policies is that they are too far removed from malloc()/operator new. Chances are high that malloc() returns memory from a shared arena that has no connection to the local node. If we really believe that allocating from the local node is a good thing, we can think about a special NUMA node-aware malloc replacement.
Migrating memory to the node local to the current thread can also be useful, as Aleksei mentioned, but that requires a pointer and a length, so it can hardly be done via arena settings.