freesurfer
OpenMP thread synchronization bottleneck
Environment details
Docker image: mathdugre/freesurfer:debug-info
The Dockerfile is also available at https://github.com/mathdugre/mri-bottleneck/blob/main/container/freesurfer.Dockerfile
Multi-threading was set using:
- the -threads argument
- the OMP_NUM_THREADS env var
- the ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS env var

All three were set to the same value.
Issue
Profiling the recon-all pipeline, we found that with multi-threading enabled, the majority of the time was spent waiting for OpenMP threads to synchronize. The two figures below show the CPU time spent in each function when using 1 and 32 threads, respectively.
Note that the y-axis scales of the two figures differ as well.
We found similar results when using lower numbers of threads. Furthermore, the parallel efficiency decreases significantly as the number of threads increases.
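For readers who want to reproduce the efficiency numbers, the sketch below assumes the usual definition: speedup T1 / Tp divided by the thread count p, computed from wall-clock run times. The function names are hypothetical and the timings in the usage note are made-up placeholders, not measurements from this profiling.

```cpp
#include <cassert>

// Speedup of a run with p threads relative to the single-threaded run.
// Both arguments are wall-clock times in seconds (assumed, not measured here).
double speedup(double t1_seconds, double tp_seconds) {
    assert(t1_seconds > 0.0 && tp_seconds > 0.0);
    return t1_seconds / tp_seconds;
}

// Parallel efficiency: speedup divided by the number of threads.
// 1.0 means perfect scaling; values near 0 mean added threads are mostly idle.
double parallel_efficiency(double t1_seconds, double tp_seconds, int threads) {
    assert(threads > 0);
    return speedup(t1_seconds, tp_seconds) / threads;
}
```

For example, a hypothetical run that takes 100 s on 1 thread and 25 s on 8 threads has a speedup of 4 and an efficiency of 0.5.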
Potential Solution
We suspect that this issue arises from the OpenMP scheduling policy: most parallel loops use the static policy. Switching to the dynamic policy might reduce the time threads spend waiting for each other to synchronize. However, we couldn't test this hypothesis, since naively replacing the OpenMP scheduling type failed to compile.
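To make the hypothesis concrete, here is a minimal sketch of the two schedule clauses applied to a loop with deliberately uneven iterations. This is not FreeSurfer code: uneven_work, sum_static, sum_dynamic, and the chunk size of 16 are all hypothetical and chosen only for illustration. Build with OpenMP enabled (for example -fopenmp with GCC); without it the pragmas are ignored and the loops run serially.

```cpp
#include <cstddef>

// Hypothetical workload whose cost grows with the index, so a contiguous
// static partition hands the last thread far more work than the first.
double uneven_work(std::size_t i) {
    double acc = 0.0;
    for (std::size_t k = 0; k < i; ++k) {
        acc += static_cast<double>(k % 7) * 0.5;
    }
    return acc;
}

// Static schedule: iterations are divided among threads before the loop runs.
double sum_static(long n) {
    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += uneven_work(static_cast<std::size_t>(i));
    }
    return total;
}

// Dynamic schedule: idle threads grab the next chunk of 16 iterations.
// The chunk size is an arbitrary illustrative choice, not a tuned value.
double sum_dynamic(long n) {
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += uneven_work(static_cast<std::size_t>(i));
    }
    return total;
}
```

Both functions return the same sum; only the distribution of iterations differs. With the static schedule, the thread holding the most expensive block finishes last while the others wait at the implicit barrier that ends the loop. With the dynamic schedule, threads that finish early pick up more chunks, which tends to even out the load at the cost of some scheduling overhead. OpenMP also provides schedule(runtime), which reads the policy from the OMP_SCHEDULE environment variable and so allows comparing policies without recompiling; whether that is practical in the FreeSurfer build is untested here.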