
Colliding OpenMPI 4 / 5 behaviour and use of "auto" in mcgui / mcrun

Open willend opened this issue 7 months ago • 2 comments

Currently, mcgui and mcrun ship with the special default "auto" for the number of cores to run on (which internally means running mpirun without specifying -np #).

For openmpi < 5 this results in a command line like the following, which automatically spreads across the "maximum" number of cores: mpirun ./PSI_DMC.out --ncount 100000000.0 --dir PSI_DMC_20240108_113430 --format McCode lambda=2.5666 R=0.87 R_curve=0.87 filename=Na2Ca3Al2F14.laz D_PHI=6 SHIFT=0 PACK=0.7 Dw=0.8 BARNS=1

For openmpi >= 5 the mpirun command fails if -np # is not set, and somehow --ncount is interpreted as input to mpirun, not the binary:

mpirun ./PSI_DMC.out --ncount 100000000.0 --dir PSI_DMC_20240108_113555 --format McCode lambda=2.5666 R=0.87 R_curve=0.87 filename=Na2Ca3Al2F14.laz D_PHI=6 SHIFT=0 PACK=0.7 Dw=0.8 BARNS=1
--------------------------------------------------------------------------
An unrecognized option was included on the mpirun command line:

  Option: --ncount

Please use the "mpirun --help" command to obtain a list of all
supported options.
--------------------------------------------------------------------------
INFO: call to mpirun failed with Command 'mpirun ./PSI_DMC.out --ncount 100000000.0 --dir PSI_DMC_20240108_113555 --format McCode lambda=2.5666 R=0.87 R_curve=0.87 filename=Na2Ca3Al2F14.laz D_PHI=6 SHIFT=0 PACK=0.7 Dw=0.8 BARNS=1' returned non-zero exit status 213.

(Interestingly, if -np auto or -np all is written into the mpirun call, the wanted behaviour is achieved with openmpi >= 5, whereas openmpi < 5 fails!)
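A minimal sketch of how the version-dependent behaviour described above could be handled when assembling the mpirun call. The function names `openmpi_major` and `build_mpirun_command` are hypothetical and not part of mcrun. The sketch assumes that `mpirun --version` prints a line of the form "mpirun (Open MPI) X.Y.Z", and it relies only on the observations in this issue (omit -np for openmpi < 5, pass -np auto for openmpi >= 5):

```python
import re


def openmpi_major(version_text):
    """Return the Open MPI major version found in `mpirun --version` output.

    Assumption: the output contains something like "mpirun (Open MPI) 4.1.2".
    Returns None if no such version string is found (e.g. another MPI flavour).
    """
    match = re.search(r"Open MPI\)?\s+(\d+)\.", version_text)
    return int(match.group(1)) if match else None


def build_mpirun_command(binary, binary_args, cores, major):
    """Assemble the mpirun argument list (hypothetical helper, not mcrun code).

    cores: "auto" or an explicit integer core count.
    major: Open MPI major version, or None if unknown.
    """
    cmd = ["mpirun"]
    if cores == "auto":
        # As reported in this issue: openmpi >= 5 wants "-np auto",
        # while openmpi < 5 only works when -np is left out entirely.
        if major is not None and major >= 5:
            cmd += ["-np", "auto"]
    else:
        # An explicit core count is passed the same way for every version.
        cmd += ["-np", str(int(cores))]
    cmd.append(binary)
    cmd += list(binary_args)
    return cmd
```

In practice the version text could be obtained with `subprocess.run(["mpirun", "--version"], capture_output=True, text=True).stdout`. Treating an unknown version like openmpi < 5 is a design choice of this sketch, not something verified here.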

The most robust behaviour is likely achieved if we again decide to set an explicit number of cores (4? 8? 12?).

Workaround: Simply set the default number of cores in your local mccode_config.json to the number of physical cores in your machine.
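To find a number to use for the workaround, something like the sketch below may help. The helper name `default_core_count` and its fallback of 4 are assumptions for illustration. Note that `os.cpu_count()` reports logical CPUs, so on machines with hyperthreading it can be larger than the physical core count:

```python
import os


def default_core_count(fallback=4):
    """Suggest an explicit core count for mpirun (hypothetical helper).

    os.cpu_count() returns the number of logical CPUs, or None if it
    cannot be determined; in that case the fallback value is used.
    """
    count = os.cpu_count()
    return count if count else fallback


if __name__ == "__main__":
    print(default_core_count())
```

If the printed value is double the physical core count of the machine, halving it by hand is probably closer to what the workaround intends.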

willend, Jan 08 '24 10:01