
EGSnrc parallelization using MPI

Status: Open. Opened by edoerner (21 comments).

The purpose of this contribution is to present a parallel solution for the EGSnrc Monte Carlo code system using the MPI programming model as an alternative to the provided implementation, based on the use of a batch-queueing system.

For details of this implementation please check the README_MPI.md file available in the EGSnrc main folder.

Currently, the BEAMnrc and DOSXYZnrc user codes support this MPI-based parallel solution. For DOSXYZnrc, both a phase-space file and a BEAM shared library have been tested as sources. Both codes can serve as models for adding MPI features to other EGSnrc user codes.
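As a rough illustration of the general idea (not of how this PR actually distributes the work, which is documented in README_MPI.md), a Monte Carlo run parallelizes naturally by giving each MPI process a share of the histories. The Python sketch below uses a hypothetical helper, `split_histories`, to divide `ncase` histories among `nproc` processes as evenly as possible. In a real run each process would also need an independent random-number sequence, which is not shown here.

```python
from typing import List

def split_histories(ncase: int, nproc: int) -> List[int]:
    """Divide ncase histories among nproc processes as evenly as possible."""
    if nproc < 1 or ncase < 0:
        raise ValueError("need nproc >= 1 and ncase >= 0")
    base, extra = divmod(ncase, nproc)
    # The first `extra` ranks take one extra history so the total is exactly ncase.
    return [base + 1 if rank < extra else base for rank in range(nproc)]

if __name__ == "__main__":
    shares = split_histories(10_000_003, 8)
    print(shares, sum(shares))
```

Giving the remainder to the first few ranks keeps the per-process workloads within one history of each other, so no process finishes noticeably later than the rest.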

By default, OpenMPI is expected to be installed on the system. To use another MPI implementation, change the F77 macro inside the 'mpi' target to the corresponding MPI compiler in the following Makefiles:

$HEN_HOUSE/makefiles/standard_makefile (line 169)
$HEN_HOUSE/makefiles/beam_makefile (line 165)

By default, F77=mpif90 in both files.

To compile a user code $(user_code) with MPI support, go to the $EGS_HOME/$(user_code) folder and type:

make mpi

This enables the MPI features in the user code and creates an executable called $(user_code)_mpi (i.e., the normal user code executable name with the '_mpi' suffix appended). Then use mpirun, or an equivalent launcher, to execute it with MPI support:

mpirun -np #NUM_PROCS $(user_code)_mpi -i input_file -p pegs_file
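The naming convention and launch line above can be captured in a small helper. This is a hypothetical Python sketch (`mpi_command` is not part of EGSnrc); it only assembles the argument list and assumes `mpirun` is on your PATH when you actually launch it.

```python
import shlex
from typing import List

def mpi_command(user_code: str, nprocs: int, input_file: str, pegs_file: str,
                launcher: str = "mpirun") -> List[str]:
    """Build the launch command for a user code compiled with 'make mpi'."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    executable = f"{user_code}_mpi"  # 'make mpi' appends the _mpi suffix
    return [launcher, "-np", str(nprocs), executable,
            "-i", input_file, "-p", pegs_file]

if __name__ == "__main__":
    # Example values only; use your own input file and PEGS data set.
    cmd = mpi_command("dosxyznrc", 8, "my_input", "521icru")
    print(shlex.join(cmd))
```

To actually start the run, the list could be passed to `subprocess.run(cmd, check=True)`, provided the `_mpi` executable can be found on your PATH.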

Posted by edoerner on Jan 30 '19

This is fantastic @edoerner! It is similar to what I wished for. Question: would it be possible, along the same lines, to write an MPI "wrapper" program that simply dispatches N parallel jobs, waits for them to complete, and lets the master thread combine everything at the end? That is, the same thing exb does when dispatching jobs, but with MPI dispatch instead of a lock file. That would allow MPI execution across the board without modifying the code of existing applications. I think I will give it a try, based on what you did here!
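For such a wrapper, the "combine everything at the end" step is exact as long as each job reports raw accumulators rather than only a final mean and uncertainty. A minimal Python sketch, assuming (hypothetically) that every job exposes, for one scored quantity, its number of histories, the sum of its per-history scores and the sum of their squares; `JobTally` and `combine` are illustrative names, not an EGSnrc file format:

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass
class JobTally:
    """Per-job accumulators for one scored quantity (hypothetical format)."""
    n_histories: int
    sum_x: float   # sum over histories of the score per history
    sum_x2: float  # sum over histories of the squared score per history

def combine(jobs: Iterable[JobTally]) -> Tuple[float, float]:
    """Merge independent jobs; return (mean per history, standard error of the mean)."""
    job_list = list(jobs)
    n = sum(j.n_histories for j in job_list)
    if n < 2:
        raise ValueError("need at least two histories in total")
    s = sum(j.sum_x for j in job_list)
    s2 = sum(j.sum_x2 for j in job_list)
    mean = s / n
    # History-by-history estimate of the variance of the mean.
    var_mean = max(s2 / n - mean * mean, 0.0) / (n - 1)
    return mean, math.sqrt(var_mean)

if __name__ == "__main__":
    # Two toy jobs: scores [1, 2] and [3, 4].
    print(combine([JobTally(2, 3.0, 5.0), JobTally(2, 7.0, 25.0)]))
```

Pooling the sums this way gives the same mean and uncertainty as a single run with all the histories, so the master only has to wait for every job and add up their accumulators.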

Posted by ftessier on Jan 31 '19

I'm noticing that implicit none was modified in a few of the changes. Shouldn't we keep implicit none in all Fortran code?

It looks like MPI is added to all compilation. Does GCC behave nicely if MPI is not installed?

Posted by crcrewso on Jan 31 '19

@crcrewso Yes! I just realized that. It is only a replacement of implicit none by the macro $IMPLICIT-NONE, which is defined as implicit none by default. I really do not remember why I did that, but of course any Fortran program should use implicit none by default.

Regarding MPI, every call to MPI functionality is 'protected' by a preprocessor conditional that is active only if the user defines the _MPI macro. This macro is defined only when the user code (DOSXYZnrc or a BEAM accelerator) is compiled with the 'mpi' target. All the required definitions (such as using 'mpif90' as the compiler) are set in this target inside the relevant Makefiles, so that the $EGS_CONFIG file, and therefore the rest of the platform, is not affected.
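To make the guard mechanism concrete, here is a toy Python model of what a preprocessor does with a block wrapped in a conditional on `_MPI`. It assumes `#ifdef`/`#else`/`#endif` directives and uses made-up Fortran lines; it is not the preprocessor actually used by the EGSnrc build. Without `_MPI` defined, the MPI calls simply disappear from the compiled source, so the regular build never needs MPI.

```python
from typing import Iterable, List, Set

def preprocess(lines: Iterable[str], defined: Set[str]) -> List[str]:
    """Toy model of #ifdef/#else/#endif handling (nesting allowed, no #if expressions)."""
    out: List[str] = []
    active: List[bool] = []  # one entry per open #ifdef: is its current branch kept?
    for line in lines:
        tokens = line.split()
        directive = tokens[0] if tokens else ""
        if directive == "#ifdef":
            # A branch is kept only if every enclosing branch is kept too.
            active.append(all(active) and tokens[1] in defined)
        elif directive == "#else":
            enclosing_kept = all(active[:-1])
            active[-1] = enclosing_kept and not active[-1]
        elif directive == "#endif":
            active.pop()
        elif all(active):
            out.append(line)
    return out

if __name__ == "__main__":
    # Hypothetical user-code fragment; the MPI call survives only when _MPI is defined.
    source = [
        "      call init_geometry()",
        "#ifdef _MPI",
        "      call mpi_init(ierr)",
        "#endif",
        "      call run_histories()",
    ]
    print(preprocess(source, defined=set()))
    print(preprocess(source, defined={"_MPI"}))
```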

For example, to get MPI functionality in the dosxyznrc user code, I compile it by typing 'make mpi' in its folder inside $EGS_HOME. This creates a dedicated executable, 'dosxyznrc_mpi', which I then execute through mpirun. The idea was to isolate the MPI functionality inside EGSnrc so as not to affect the rest of the platform, especially if MPI is not installed.

@ftessier It would be interesting to look into that, so that the other codes would not need to be modified. It would be nice if you could give it a try.

Posted by edoerner on Jan 31 '19

FYI, I just updated my branch to the latest version of the develop branch in nrc-cnrc. Let me know if you have comments or ideas about this implementation.

Posted by edoerner on Jun 07 '19

Hi edoerner,

I am new to EGSnrc. I am helping post-docs run EGSnrc on an Ubuntu 18.04 server with OpenMPI and a GUI. I followed the instructions to install EGSnrc at https://github.com/nrc-cnrc/EGSnrc/wiki/Install-EGSnrc-on-Linux#1-install-prerequisite-software, using Option 1 via the GUI. All went well: icons were created on the desktop and each of them launches its application just fine. I have a question, though, and I hope you can help me.

  1. When using the GUI to launch an EGSnrc application, how can the user allocate all the cores of the server? I read your comment: "For example, to gain MPI functionality in the dosxyznrc user code I compile it typing 'make mpi' in its folder inside $EGS_HOME. This creates a dedicated executable 'dosxyznrc_mpi' and then I execute it through mpirun. The idea was to isolate the MPI functionality inside EGSnrc in order to not affect the rest of the platform, specially if MPI is not installed." I understand the process, but I am not sure how to do the same with the GUI option.

Thank you for your help.

Best, Eric

Posted by ealemany on Oct 09 '19

Hi @ealemany,

There's no option in the GUI at the moment to support this. It's good that you installed the GUIs as they will still be useful, but for running the jobs using MPI you'll have to launch them from the command line using the commands they suggested.

Also keep in mind that you will have to use 'git checkout' and direct it to grab this pull request, and run the installer after you have done so in order to compile these new MPI features.

Posted by rtownson on Oct 09 '19

Hi @rtownson.

Thank you!

I am not sure I understand your comment: "Also keep in mind that you will have to use 'git checkout' and direct it to grab this pull request, and run the installer after you have done so in order to compile these new MPI features."

I understand the 'git checkout' concept, but I don't understand the rest: "and direct it to grab ... MPI features".

Could you give me a command-line example, if there is such a thing?

Thanks

Posted by ealemany on Oct 09 '19

@ealemany here is an example. Doing this means you are using an experimental branch of EGSnrc, so you might want to re-install when the official 2020 release comes out early next year if this pull request has been included in it. The following fetches this PR into a new branch locally for you called feature-mpi-support, then checks it out. After this is when you should run the installation. It's always safest to do a clean install (delete/rename the EGSnrc directory); I can't guarantee that re-running the installer over the existing installation will work (but it might).

git fetch origin pull/511/head:feature-mpi-support
git checkout feature-mpi-support

Posted by rtownson on Oct 10 '19

Great! Thank you @rtownson for the explanation and the example; very helpful and good to know.

Posted by ealemany on Oct 11 '19

Hello,

Are there specific steps to install EGSnrc for multiple users with the GUI option? I read through the instructions at https://github.com/nrc-cnrc/EGSnrc/wiki/Install-EGSnrc-on-Linux#1-install-prerequisite-software, but I do not see any installation configuration for multiple users.

I ran a test on an Ubuntu server and came across permission issues with the shortcut icons on the desktop. To resolve this, I created a group called "egsnrc", added the users to the "egsnrc" group, and changed the permissions on the top directory (EGSnrc-2019) to 774 so that the users and the egsnrc group can launch the shortcut icons on the desktop.

Is this the correct way to approach a multi-user desktop environment?

Thank you

Posted by ealemany on Oct 11 '19

Hi @ealemany, I'm not sure what the optimal configuration is for a multiuser MPI setup. It's OK to share the whole EGSnrc directory, but usually users will each have their own egs_home directory, which is like the working area (all input and output files go here), and might share the HEN_HOUSE (all the source code is here). That means that each user would have their environment variables set with a different EGS_HOME. If you want to have more discussion about this we could do it on the reddit page, since this is not related to the implementation of this pull request.

Posted by rtownson on Oct 11 '19

Hi @rtownson, see you (talk to you) on the reddit page. Thanks!

Posted by ealemany on Oct 12 '19

Hi,

We installed and configured EGSnrc on the master node of a 12-node cluster with OpenMPI. We ran our first job and it doesn't seem to be working. We followed the instructions in @edoerner's comment above: we ran "make mpi" and then launched a job with "mpirun -np 20 BEAM_FLASH_SCAN_mpi -i wb_right_block.egsinp -p FLASH60MeV".

Is there something wrong in our mpirun command?

I hope this is the right place to post this kind of issue.

Thank you for your help

[Screenshot of the error output, Nov 1, 2019]

Posted by ealemany on Nov 01 '19

Hi @ealemany, I am not an expert in server configuration, but I suppose that you have 12 nodes, each with one or more cores. I think the problem is essentially the MPI configuration of your cluster.
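With several nodes, Open MPI also has to be told which hosts to use and how many processes (slots) each one can run, typically through a hostfile passed with `--hostfile`. A small Python sketch that generates one; the node names and the 4 cores per node are hypothetical placeholders for your own cluster:

```python
from typing import Dict

def hostfile_text(slots_per_node: Dict[str, int]) -> str:
    """Return Open MPI hostfile contents: one 'hostname slots=N' line per node."""
    lines = []
    for host, slots in slots_per_node.items():
        if slots < 1:
            raise ValueError(f"{host}: slots must be >= 1")
        lines.append(f"{host} slots={slots}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical cluster: 12 nodes named node01..node12 with 4 cores each.
    nodes = {f"node{i:02d}": 4 for i in range(1, 13)}
    print(hostfile_text(nodes), end="")
    print("total slots:", sum(nodes.values()))
```

Saved as, for example, hosts.txt, it would be used as `mpirun --hostfile hosts.txt -np 48 BEAM_FLASH_SCAN_mpi -i wb_right_block.egsinp -p FLASH60MeV`. As an assumption about your setup, the executable, input and PEGS files also need to be reachable at the same paths on every node, e.g. through a shared filesystem.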

On some systems I have had the problem that MPI thinks the system has fewer cores than are really available (for example, on an 8-core system I was not able to launch more than 4 MPI processes). In that case, I use the --oversubscribe flag for mpirun/mpiexec. Of course, you should then check that all the cores are actually being used.
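The oversubscription decision can be made explicit. One common reason for the behavior described (an assumption, not confirmed in this thread) is that Open MPI by default counts physical cores rather than hardware threads as slots, so a machine showing 8 logical CPUs may offer only 4 slots. This Python sketch, with a hypothetical `mpirun_args` helper, adds `--oversubscribe` only when the requested process count exceeds the slot count you pass in:

```python
from typing import List

def mpirun_args(nprocs: int, slots: int) -> List[str]:
    """Return mpirun arguments, adding --oversubscribe only if nprocs exceeds slots."""
    if nprocs < 1 or slots < 1:
        raise ValueError("nprocs and slots must be positive")
    args = ["mpirun", "-np", str(nprocs)]
    if nprocs > slots:
        # More processes than slots: Open MPI refuses to start unless this is allowed.
        args.append("--oversubscribe")
    return args

if __name__ == "__main__":
    print(mpirun_args(4, slots=4))
    print(mpirun_args(8, slots=4))
```

Running more processes than physical cores may give little extra speed for a CPU-bound Monte Carlo run, so checking the actual core usage afterwards, as suggested, is still worthwhile.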

Posted by edoerner on Nov 12 '19

Hi @edoerner, thank you for your suggestion. You are right, I might have some configuration issues with my OpenMPI. I will go over the configuration one more time, and if it still gives me the error message, I will use the --oversubscribe flag as you suggested.


Eric F. Alemany Systems Administrator for Research EXO Extended Operations

Stanford Medicine - Technology & Digital Services Stanford, California 94305

Posted by ealemany on Nov 12 '19

This branch needs a good cleanup to synchronize with the current develop tip. @edoerner are you ok with me pushing the update to your feature-mpi-support branch to bring it in sync, and finally merge this into the EGSnrc trunk?

Posted by ftessier on Mar 25 '21

Hi Frederic,

Sure! Push the update to start the process...

Posted by edoerner on Mar 26 '21

Rebased on develop, and fixed a few conflicts accrued over the last 3 years. Adjusted commit messages, removed EOL whitespace. This branch needs to be tested more thoroughly before merging.

Posted by ftessier on Jul 11 '22

@ftessier and @edoerner: I'm just wondering if this has been tested on Mac. I've been trying to get "make mpi" to work, but I'm running into a strange "clang (LLVM option parsing): Unknown command line argument '-x86-pad-for-align=false'" error. Now, I'm using a wonky combination of gcc and clang, so I don't know if this error is just peculiar to my system. @ftessier, have you had a chance to try it on OSX?

Posted by blakewalters on Jul 15 '22

I have not tested it on macOS. This needs more testing, will be merged into develop just after the 2022 release.

Posted by ftessier on Jul 16 '22

Hi everyone! @blakewalters, as Frederic stated, this solution has not been tested on macOS. Unfortunately, I have lost touch with the code since I left academia a couple of years ago, but back then I remember being able to use it on my iMac; currently I do not have access to that OS to give it a try. @ftessier, although I have been out of research for a couple of years, it would be nice to see this contribution see the light of day, hehehe. Do you need any input or help from my side?

I am glad to see that you are still looking on this, I always liked EGSnrc as my primary research tool :)

Posted by edoerner on Jul 20 '22