
MPI: issues with multiprocessing using LAM compute model

Open · jsinkbaek opened this issue 3 years ago · 2 comments

I encountered significant performance issues when computing a model from file using the LAM implementation of MPI. Running in a Jupyter notebook with LAM installed on Ubuntu 18.04 on a 12-thread processor, the following line resulted in a slow-down rather than a speed-up:

`! mpirun -np 11 python -m mpi4py compute_script.py`

It took about 24 minutes, compared with 16 minutes and 41 seconds for the single-thread call `b.run_compute()`.

I suspect the cause is that each process separately ran through the whole dataset of 7983 data points instead of the processes splitting it among themselves. The command-line output indicates this: `100%|██████████| 7983/7983`

With MPICH installed instead of LAM, this issue did not occur. The command-line output for the same notebook command was instead `100%|████████████████████████████████████████| 726/726 [02:35<00:00, 4.66it/s]`, which is consistent with the 7983 data points being split across the 11 processes (7983 / 11 ≈ 726 per process), and the run completed significantly faster than the single-thread call. (Many numpy warnings were raised for an unrelated reason, but Kyle mentioned that the developers are already aware of this.)
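To illustrate the suspected mechanism, here is a minimal sketch of how N data points might be divided among MPI ranks. `chunk_for_rank` is a hypothetical helper, not phoebe2's actual partitioning code; it only shows that a process which believes it is rank 0 of a communicator of size 1 ends up computing the whole dataset, while 11 cooperating ranks get at most 726 points each.

```python
# Illustrative sketch only: a hypothetical way to split N data points
# across MPI ranks. This is NOT phoebe2's actual partitioning code.


def chunk_for_rank(n_points, rank, size):
    """Return the (start, stop) slice of data points handled by `rank`."""
    per_rank = -(-n_points // size)  # ceiling division
    start = min(rank * per_rank, n_points)
    stop = min(start + per_rank, n_points)
    return start, stop


if __name__ == "__main__":
    n = 7983
    # Broken setup: every process thinks it is rank 0 of 1, so each one
    # computes the full dataset (consistent with the 7983/7983 progress bar).
    print(chunk_for_rank(n, rank=0, size=1))   # → (0, 7983)
    # Working setup with 11 processes: at most 726 points per process
    # (consistent with the 726/726 progress bar).
    print(chunk_for_rank(n, rank=0, size=11))  # → (0, 726)
```

The last rank (rank 10) gets the remainder, points 7260 to 7983, so the 11 slices together cover all 7983 points exactly once.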

With LAM installed, after the processes terminated I got warnings that `MPI_INIT` and `MPI_FINALIZE` were not invoked, meaning MPI was not working properly. This might be related, so I thought I would mention it.
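One possible cause (an assumption, not a confirmed diagnosis of this issue) is that LAM's `mpirun` launches the processes while mpi4py was built against a different MPI library, so each process initializes on its own as a singleton. A quick check is a small script, here called `check_mpi.py` (a hypothetical name, not part of phoebe2), that reports the rank and communicator size each process sees:

```python
# check_mpi.py: hypothetical diagnostic script, not part of phoebe2.
import sys


def world_info():
    """Return (rank, size, mpi_version) as seen by this process."""
    try:
        from mpi4py import MPI  # importing mpi4py.MPI initializes MPI
    except ImportError:
        # Without mpi4py this is simply one serial process.
        return 0, 1, None
    comm = MPI.COMM_WORLD
    # Get_version() returns the (version, subversion) of the MPI standard
    return comm.Get_rank(), comm.Get_size(), MPI.Get_version()


if __name__ == "__main__":
    rank, size, version = world_info()
    print(f"rank {rank} of {size} (MPI standard version {version})")
    if size == 1:
        # Under `mpirun -np 11`, a size of 1 would suggest the launcher and
        # the MPI library used by mpi4py do not share a communicator.
        print("warning: communicator size is 1; work will not be split",
              file=sys.stderr)
```

Running it with `mpirun -np 11 python -m mpi4py check_mpi.py` should print ranks 0 to 10 of 11; if every line reads `rank 0 of 1`, the processes are not sharing a communicator and each would process the full dataset.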

jsinkbaek · Jun 17 '21 21:06

Thanks for the bug report. Can you please provide us with the LAM MPI version?

aprsa · Jun 17 '21 21:06

Is this helpful? `liblam4/bionic,now 7.1.4-3.1build1 amd64`

jsinkbaek · Jun 17 '21 22:06