
OpenMP thread synchronization bottleneck

Open mathdugre opened this issue 10 months ago • 1 comments

Environment details

Docker image: mathdugre/freesurfer:debug-info. The Dockerfile is also available at https://github.com/mathdugre/mri-bottleneck/blob/main/container/freesurfer.Dockerfile

Multi-threading was set using:

  • -threads argument
  • OMP_NUM_THREADS env var
  • ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS env var

All three were set to the same value.

Issue

While profiling the recon-all pipeline, we found that with multi-threading enabled, most of the time was spent waiting for OpenMP threads to synchronize. The two figures below show the CPU time spent in each function when using 1 and 32 threads, respectively. Note that the y-axis scales differ as well.

[Figure: hotspots-1threads-freesurfer-reconall-simple]

[Figure: hotspots-32threads-freesurfer-reconall-simple]

We found similar results with lower thread counts. Furthermore, parallel efficiency decreases significantly as the number of threads increases.

[Figure: makespan-freesurfer]
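
For reference, assuming the usual textbook definition (the symbols below are not taken from our measurements), with T(p) the makespan on p threads, speedup and parallel efficiency are:

$$
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p} = \frac{T(1)}{p \, T(p)}
$$

Under this definition, E(p) = 1 corresponds to ideal scaling, and time spent waiting at synchronization points inflates T(p) and pushes E(p) toward 0.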

Potential Solution

We think this issue might arise from the OpenMP scheduling policy: the code mostly uses static scheduling. Using dynamic scheduling might reduce the impact of thread synchronization. However, we could not test this hypothesis, because naively replacing the OpenMP scheduling type caused compilation to fail.
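
To make the hypothesis concrete, here is a minimal, self-contained sketch of the difference between the scheduling policies. It is not FreeSurfer code: `uneven_work`, `sum_static`, `sum_dynamic`, `sum_runtime`, and the chunk size of 4 are hypothetical names and values chosen for illustration. The loop has a per-iteration cost that grows with `i`, which is the kind of imbalance where static scheduling can leave some threads idle at the implicit barrier at the end of the loop.

```c
#include <assert.h>

/* Hypothetical stand-in for a loop body whose cost grows with i.
 * Returns 0 + 1 + ... + i. */
static double uneven_work(int i)
{
    double acc = 0.0;
    for (int k = 0; k <= i; ++k)
        acc += (double)k;
    return acc;
}

/* Static scheduling: iterations are split into fixed blocks up front,
 * so the thread that gets the expensive high-i block finishes last
 * while the others wait at the end-of-loop barrier. */
double sum_static(int n)
{
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += uneven_work(i);
    return total;
}

/* Dynamic scheduling: threads grab chunks of 4 iterations as they
 * become free, which can even out the load at the cost of some
 * scheduling overhead. The chunk size is an arbitrary choice here. */
double sum_dynamic(int n)
{
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += uneven_work(i);
    return total;
}

/* Runtime scheduling: the policy is read from the OMP_SCHEDULE
 * environment variable when the program runs, so different policies
 * can be compared without recompiling. */
double sum_runtime(int n)
{
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += uneven_work(i);
    return total;
}
```

With GCC or Clang, this needs `-fopenmp` for the pragmas to take effect; without it they are ignored and the loops run serially. All three functions return the same result, only the distribution of iterations across threads differs. Whether a `schedule(runtime)` clause would compile cleanly in the FreeSurfer loops is untested.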

mathdugre • Apr 23 '24 14:04