argobots
argobots copied to clipboard
unnamed threads don't appear to be freed, at least not in a timely manner.
I'm using v1.1. To illustrate, I made this patch to the hello_world example. If I run with a large number of ULTs, the memory grows very fast. Since it is executing hello_world, I would expect it to be a much smaller memory footprint. Note, I also changed the stacksize, not sure if that has anything to do with the problem but it matches an issue we see with memory growth in DAOS
index 822b10f..8b988be 100644
--- a/examples/hello_world/hello_world.c
+++ b/examples/hello_world/hello_world.c
@@ -81,19 +81,23 @@ int main(int argc, char **argv)
for (i = 0; i < num_xstreams; i++) {
ABT_xstream_get_main_pools(xstreams[i], 1, &pools[i]);
}
+ABT_thread_attr attr;
+ABT_thread_attr_create(&attr);
+
+ABT_thread_attr_set_stacksize(attr, 128*1024);
/* Create ULTs. */
for (i = 0; i < num_threads; i++) {
+ int rc1;
int pool_id = i % num_xstreams;
+
thread_args[i].tid = i;
ABT_thread_create(pools[pool_id], hello_world, &thread_args[i],
- ABT_THREAD_ATTR_NULL, &threads[i]);
+ attr, NULL);
}
/* Join and free ULTs. */
- for (i = 0; i < num_threads; i++) {
- ABT_thread_free(&threads[i]);
- }
+ ABT_thread_attr_free(&attr);
/* Join and free secondary execution streams. */
for (i = 1; i < num_xstreams; i++) {```
This may just be a documentation issue of ABT_thread_exit is required to free the ULT.
Hmm, if I add an ABT_thread_yield, in the thread creation loop, it is much better. So I suppose it's not a leak but may indicate a peak memory usage issue.
@jolivier23 just curious, does the memory consumption change if you set the ABT_MEM_MAX_NUM_STACKS environment variable?
FWIW the default value of this parameter is 65536, but in Margo we set it to a much lower default value (8) to prevent Argobots from accumulating too many unused stacks for potential reuse. Although now that I look back at it I'm wondering if we need to consider if that value is having any other side effects.
At any rate, I'm curious if that's a factor in the behavior here.
@carns For the example above, it makes no difference what I set that to. We can try in DAOS and see if it makes any difference.
I'm trying to understand the behavior of this particular example. If I add ABT_thread_yield, the problem essentially goes away in that test case. My assumption is that the primary ULT is creating the threads to execute on the other xstreams and is the party responsible for freeing them but never gets a chance to do so until the ABT_thread_yield or some other yield point.
If this is the case, perhaps we should be at least checking a condition on create and potentially cleaning up old threads before creating new ones?
It seems like this may be a red herring for the DAOS problem but seems like an issue nonetheless.
@jolivier23 Found the code to free memory for unnamed threads automatically. No call to ABT_thread_exit() required. https://github.com/pmodels/argobots/blob/6d216a9f546f0a0024a4fe0e8548aeaf51520d4a/src/include/abti_thread.h#L143
@carns in looking at the code, it looks like the max is 1024 now
#define ABTD_MEM_MAX_NUM_STACKS 1024
...
const uint32_t default_mem_max_stacks =
ABTU_min_uint32(ABTD_MEM_MAX_TOTAL_STACK_SIZE /
p_global->thread_stacksize,
ABTD_MEM_MAX_NUM_STACKS);
65536 could probably explain how much memory is used during rebuild in daos but 1024 does not.
@carns in looking at the code, it looks like the max is 1024 now
#define ABTD_MEM_MAX_NUM_STACKS 1024 ... const uint32_t default_mem_max_stacks = ABTU_min_uint32(ABTD_MEM_MAX_TOTAL_STACK_SIZE / p_global->thread_stacksize, ABTD_MEM_MAX_NUM_STACKS);
65536 could probably explain how much memory is used during rebuild in daos but 1024 does not.
Ah, right. I forgot it was tuned down. Re: your other comment I'm not sure who's responsible for the free (though I do know it eventually happens; we've had similar patterns with no runaway memory usage), so I'm not sure what's needed to encourage it to happen faster...