parsec icon indicating copy to clipboard operation
parsec copied to clipboard

`profile=off` tasks are traced anyway, sometimes causing segfault

Open DSMishler opened this issue 1 year ago • 3 comments

Describe the bug

Run profile2h5 on the most recent version of dplasma and PaRSEC (as of April 25, 2023) and you will notice that it segfaults. With tracing on and with debug mode on, the issue becomes more clear. PaRSEC is tracing tasks that it was asked not to and sometimes this causes a segfault in parsec/parsec/profiling.c:1002 (in debug mode).

To Reproduce

compile parsec 518437965a5d788ed9d97ec7d5d9cae63001b328 and dplasma 518437965a5d788ed9d97ec7d5d9cae63001b328 with debug and tracing enabled. ./testing_dgemm -N 4800 -nruns 2

Fixes

According to @therault, there are at least 3 ways to go about it

1- always generate a task type, so task_profiler can profile any task it wants. This is somehow contradictory with the 'profile = off' programmer request. 2- generate a 'NON_PROFILED' task type, and assign this task type to any task that is marked as profile=off. task_profiler can decide to ignore those or not, and we will not get the extended infos for those tasks. 3- fix task_profiler to not trace a task that is not registered.

With the current proposed fix:

To do the fix 3-, everywhere in pins_task_profiler_module.c, where we call PARSEC_TASK_PROF_TRACE_FLAGS or PARSEC_PROFILING_TRACE on a key that is in task->taskpool->profiling_array, if the value of the key in task->taskpool->profiling_array is <= 0, do not call the profiling_trace function.

DSMishler avatar Apr 26 '23 18:04 DSMishler

task_profiler profiling tasks with profile=off seems incorrect to me, so that should be fixed. Having a default profiling type id seems like a good idea though, to catch anything that is somehow profiled, but that we don't have any other information about.

omor1 avatar Apr 27 '23 22:04 omor1

diff --git a/parsec/mca/pins/pins.c b/parsec/mca/pins/pins.c
index 6e1e7a6c6..f75c6fdc6 100644
--- a/parsec/mca/pins/pins.c
+++ b/parsec/mca/pins/pins.c
@@ -12,6 +12,8 @@
 #include "parsec/constants.h"
 #include "parsec/utils/debug.h"
 #include "parsec/execution_stream.h"
+#include "parsec/parsec_internal.h"
+#include "parsec/parsec_binary_profile.h"
 
 /**
  * Mask for PINS events that are enabled by default.
@@ -31,6 +33,10 @@ void parsec_pins_instrument(struct parsec_execution_stream_s* es,
                             parsec_task_t* task)
 {
     assert( method_flag < PARSEC_PINS_FLAG_COUNT );
+    if((NULL != task) && (-1 == task->taskpool->profiling_array[START_KEY(task->task_class->task_class_id)])) {
+        return;
+    }
+
 
     parsec_pins_next_callback_t* cb_event = &es->pins_events_cb[method_flag];
     while( NULL != cb_event->cb_func ) {
diff --git a/tests/profiling/async.jdf b/tests/profiling/async.jdf
index 487c2ea21..808074ba7 100644
--- a/tests/profiling/async.jdf
+++ b/tests/profiling/async.jdf
@@ -41,7 +41,7 @@ verbose          [ type = "int" ]
 taskqueue        [ type = "parsec_task_t **" ]
 NB               [ type = "int" ]
 
-STARTUP(s)
+STARTUP(s)  [profile=off]
 
 s = 0 .. 0
 
diff --git a/tests/profiling/check-async.py b/tests/profiling/check-async.py
index 2a1b2204a..109fdf856 100644
--- a/tests/profiling/check-async.py
+++ b/tests/profiling/check-async.py
@@ -31,8 +31,8 @@ except KeyError:
 error = 0
 
 try:
-    if len(t.events[t.events.type == STARTUP]) != 1:
-        print("Error: there should be exactly one STARTUP task.",
+    if len(t.events[t.events.type == STARTUP]) != 0:
+        print("Error: there should be no STARTUP task as they are marked for exclusion from the profiling (property \"profile=off\").\n",
               file=sys.stderr)
         error += 1
     else:

bosilca avatar Apr 28 '23 03:04 bosilca

Applying this patch in my local PaRSEC fork. There's still a segfault. Working with Thomas to figure out why. Will submit a PR when it is resolved.

DSMishler avatar Apr 28 '23 14:04 DSMishler