sum_hills doesn't give any speedup if used with OpenMP and/or MPI
I have a set of HILLS.* files and want to speed up their integration with the plumed sum_hills tool. However, neither the MPI build of PLUMED nor OpenMP parallelization (with export PLUMED_NUM_THREADS=4 set explicitly) gives me any speedup during integration. Below are the logs from the different integration modes together with some benchmarks. All calculations were run in tmpfs to rule out storage latency; I have also tested on HDD, SATA SSD and NVMe SSD devices and get the same behavior every time.
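For reference, the runs were launched roughly as follows (HILLS_ALL here stands in for my concatenated HILLS.* files, and grid/binning options are omitted):

```bash
# Serial / OpenMP runs: thread count controlled via the environment variable
export PLUMED_NUM_THREADS=4          # set to 1 for the "Simple" run
plumed sum_hills --hills HILLS_ALL --outfile fes.dat

# MPI runs: same command, launched on 4 ranks
mpirun -np 4 plumed sum_hills --hills HILLS_ALL --outfile fes.dat
```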
| Type | Log file | Number of threads | Number of MPI ranks | User time (s) |
|---|---|---|---|---|
| Simple | simple.txt | 1 | - | 30.2 |
| MPI | mpi.txt | 1 | 4 | 124.7 |
| Simple with OpenMP | simple_omp.txt | 4 | - | 29.3 |
| MPI with OpenMP | mpi_omp.txt | 4 | 4 | 123.5 |
Please note that with the MPI version of sum_hills the console output is duplicated, and after the integration finishes there are 4 files in the directory: fes.dat and 3 backups of it.
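The directory afterwards looks roughly like this (backup names assuming PLUMED's usual bck.* renaming scheme):

```bash
ls
# fes.dat  bck.0.fes.dat  bck.1.fes.dat  bck.2.fes.dat
```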
Does the sum_hills tool support any kind of parallelization? How can I benefit from parallel integration of my deposited Gaussians?