parsec icon indicating copy to clipboard operation
parsec copied to clipboard

Concurrent Taskpool registration results in conflicting writes to global profiling properties dictionary

Open omor1 opened this issue 2 years ago • 0 comments

Describe the bug

As part of registering a taskpool with the runtime for execution (parsec_context_add_taskpool), when profiling is enabled information about the taskpool is added to the global profiling properties dictionary, in parsec_profiling_add_taskpool_properties. Access to the global structure is not protected by locks or other mechanisms and are not per-taskpool.

In particular, parsec_profiling_tree_reload_buckets will write the name of the taskpool/task class/property to dict->tree->first_nodes[pos]->str without checking to ensure that str contains sufficient space for the supplied name. These writes can conflict with both each-other and with calls to the new_bucket function pointer (create_{ns|tc|pr}_bucket), which contains code like:

char *str = (char*)calloc(strlen(node->str)+1, sizeof(char));
sprintf(str, "%s", node->str);

When node->str is written (by a different thread) after strlen is evaluated, but before sprintf (which at least my compiler optimizes to strcpy in this case), then this sequencing can result in a buffer overflow of str.

The most frequent way of encountering concurrent taskpool registration are recursive taskpools. I have particularly encountered this when running HiCMA with a very large dense band—many GEMMs on the diagonal can be executed in parallel and each will launch a recursive taskpool.

To Reproduce

Steps to reproduce the behavior:

  1. Compile PaRSEC with Address Sanitizer enabled, -fsanitize=address
  2. Run an application launching (many) concurrent taskpools, e.g. DPLASMA with large tile sizes (to force the algorithm to execute recursively)
  3. Observe buffer overflow errors in create_{ns|tc|pr}_bucket.

Note that 68ada9e23dc5242e72c6575fa7f258f5f92dbbc7 changed some details in parsec_context_add_taskpool; in particular, there's a new critical region before parsec_profiling_add_taskpool_properties is called that makes it more likely for execution to be serialized and this behavior not to be observed; nonetheless, it can possible happen.

Expected behavior

PaRSEC should not cause buffer overflows. A temporary solution that would at least ensure that the str buffer in create_{ns|tc|pr}_bucket is not overflowed looks something like:

size_t len = strlen(node->str);
char *str = malloc(len+1);
memcpy(str, node->str, len);
str[len] = '\0';

This prevents the TOCTOU race on the length of node->str, but does nothing to prevent different taskpools from mixing their information in the global profiling properties dictionary; to resolve this, the simplest solution is to add a lock around modifying the dictionary.

More broadly, the entire design of global profiling properties is flawed in the face of multiple taskpools executing concurrently. I don't have a direct proposal of how to fix this.

Additional context

Sample stack trace from Address Sanitizer:

    #0 0x2fec87 in strcpy
    #1 0x15554d38f11e in create_tc_bucket parsec/dictionary.c:186:5
    #2 0x15554d38e6e0 in parsec_profiling_tree_add_missing_buckets parsec/dictionary.c:333:13
    #3 0x15554d38dd09 in parsec_profiling_add_taskpool_properties parsec/dictionary.c:867:78
    #4 0x15554d39841f in parsec_context_add_taskpool parsec/scheduling.c:632:5
    #5 0x38587c in parsec_recursivecall parsec/recursive.h:73:5
    #6 0x39855b in hook_of_HiCMA_dpotrf_L_2flow_potrf_dtrsm_RECURSIVE hicma_parsec/HiCMA_dpotrf_L_2flow_wrapper.c:69:12
    #8 0x15554d396958 in __parsec_execute parsec/scheduling.c:173:14
    #9 0x15554d397742 in __parsec_task_progress parsec/scheduling.c:436:18
    #10 0x15554d397dd7 in __parsec_context_wait parsec/scheduling.c:565:18
    #11 0x15554d38232a in __parsec_thread_init parsec/parsec.c:298:30
    #12 0x15554cfdf179 in start_thread (/lib64/libpthread.so.0+0x8179)
    #13 0x15554c8eedf2 in clone (/lib64/libc.so.6+0xfcdf2)

omor1 avatar Jan 17 '23 21:01 omor1