behavior of likwid-mpirun -np option
If I run likwid-mpirun with the -np option, nothing seems to happen. I've traced it down to the elseif block around line 1857. It seems as though ppn isn't getting set to a value the program finds reasonable, and I'm struggling to understand the logic in the code:
if ppn == 0 then
    ppn = 1
end
if ppn > maxppn and np > maxppn then
    ppn = maxppn
elseif np < maxppn then
    ppn = np
elseif maxppn == np then
    ppn = maxppn
end
Is it possible that
    if ppn > maxppn and np > maxppn then
ought to be this?
    if ppn > maxppn and np > maxnp then
I'm also confused about the other elseif statements there. Could you help me understand the intent there?
Thanks!
-Aaron
I forgot to add that I'm experiencing this when using the -g option. -np works without the -g option; however, it does print this warning (because ppn is initially set to 0):
WARN: Processes cannot be equally distributed
WARN: You want 96 processes on 4 hosts with 1 per host.
WARN: Sanitizing number of processes per node to 24
I'm thinking it ought to infer a default value of ppn based on the information provided by the scheduler. This is what I was trying to implement earlier, but I didn't feel I could do so without understanding what that block is supposed to do.
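As a rough, untested sketch of what I mean (assuming SLURM is the scheduler; SLURM_TASKS_PER_NODE reports per-node task counts in forms like "28,27" or "24(x4)", and ppn stands for the script's existing variable):

-- Rough sketch (not tested): derive a default ppn from SLURM's environment.
-- SLURM_TASKS_PER_NODE looks like "28,27" or "24(x4)".
local tasks_per_node = os.getenv("SLURM_TASKS_PER_NODE")
if ppn == 0 and tasks_per_node ~= nil then
    local max_tasks = 0
    -- take the largest per-node count as the default ppn
    for entry in tasks_per_node:gmatch("[^,]+") do
        local count = tonumber(entry:match("^%d+"))
        if count and count > max_tasks then
            max_tasks = count
        end
    end
    if max_tasks > 0 then
        ppn = max_tasks
    end
end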
Hi Aaron,
I'm currently not able to test it because I'm on my way back from a LIKWID tutorial and the internet connection on trains isn't stable. I added some comments to the if statements; I hope this clarifies what is happening here.
-- If the requested processes per node (ppn) exceed the available slots on the hosts (ppn > maxppn) and the total number of processes requires multiple hosts (np > maxppn), sanitize ppn to the number of slots available on the hosts.
if ppn > maxppn and np > maxppn then
    ppn = maxppn
-- If all processes fit on a single host, use only a single host with np processes.
elseif np < maxppn then
    ppn = np
-- If the processes fit exactly on one host, use all slots. It should be possible to fold this case into the previous elseif by changing it to np <= maxppn.
elseif maxppn == np then
    ppn = maxppn
end
I'll check likwid-mpirun again when I'm back at the office. If I remember correctly, not all job schedulers provide information about how many processes should be run on each node, so I had to determine it like this. SLURM has an environment variable for that, so likwid-mpirun should use it.
Thanks @TomTheBear! That's exactly what I needed. I'll keep looking at this too and let you know what I come up with.
@TomTheBear I'm finally getting back to working on likwid-mpirun and making it work for us with SLURM. The issue I'm running into is, I think, how much likwid-mpirun currently needs to understand about each scheduler and MPI implementation combination in order to launch tasks in the desired layout.
For example, if I have an asymmetric allocation (someone requests an odd number of tasks and, say, 28 end up on the first node and 27 on another), then this doesn't work:
likwid-mpirun -g ENERGY -np 55 ./mpi_hello
I get an error about processes being unequally distributed and it also seems to think I'm asking for 1 task per host.
I wonder whether, rather than having likwid-mpirun understand the subtleties of each scheduler and MPI combination well enough to launch a job with a given layout, we couldn't instead interpose likwid-mpirun (or create another script) between the MPI implementation and the MPI task itself, e.g.
srun likwid-mpi -g ENERGY -nperdomain S:14 ./mpi_hello
Then all likwid-mpi would need to understand is how to identify its relative rank in the job (usually by reading some environment variables; see the sketch below) rather than how to launch a desired layout. It becomes the user's responsibility to launch the tasks properly. This is similar, I think, to how some of SGI's placement tools seem to work (https://www.nas.nasa.gov/hecc/support/kb/using-sgi-omplace-for-pinning_287.html) as well as Intel VTune (https://software.intel.com/en-us/node/544016).
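To make the idea concrete, here is a minimal sketch of how likwid-mpi could discover its own rank; the environment variable names are the usual ones set by SLURM, Open MPI, and MPICH/Intel MPI, and the function itself is purely illustrative, not existing likwid code:

-- Minimal sketch: discover this process' global rank from the launcher's
-- environment (variable names differ between SLURM, Open MPI, and MPICH/Intel MPI).
local function get_rank()
    for _, var in ipairs({"SLURM_PROCID", "OMPI_COMM_WORLD_RANK", "PMI_RANK"}) do
        local value = os.getenv(var)
        if value ~= nil then
            return tonumber(value)
        end
    end
    return nil
end

local rank = get_rank()
if rank == nil then
    print("WARN: Could not determine MPI rank from the environment")
end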
Then the question becomes how to capture output and summarize it the way likwid-mpirun does. Perhaps likwid-mpi would require one to specify a results directory (which in the batch script could be made unique), e.g.:
srun likwid-mpi -r <results_dir> -g ENERGY -nperdomain S:14 ./mpi_hello
which after the fact could be summarized with the logic from likwid-mpirun using a separate tool (perhaps called likwid-mpi-report?).
I'm willing to implement this, but I'd like your concurrence before I do anything :)
-Aaron
@TomTheBear just wondering if you've had a chance to think this over?
Hi, I have been thinking about this, yes. I fully understand your approach, and basically I don't have a problem with splitting the script into multiple parts.
What I didn't get is how you want to do the pinning of MPI processes and possibly threads. If you call srun likwid-mpi ..., do you have full control over the pinning, the number of processes started, etc.? The other tools don't seem to handle the distribution of MPI processes to hosts. It took me quite some time to set up the SLURM support in likwid-mpirun, and the implemented support is the only approach I could find covering all features. If we have full control over that, we can do it as you proposed.
If not, I would suggest keeping likwid-mpirun but stripping it down to do only the pinning/node selection and forwarding all further options to an interceptor script likwid-mpi, which handles the likwid-perfctr call and the remaining pinning work. Furthermore, likwid-mpirun can be extended to take a folder in which to report the measurements if a user uses likwid-mpi directly.
This won't be part of the upcoming release.
But basically, that's:
srun likwid-perfctr -o /tmp/output_%h_%r.txt -g X <exec>
And then a script that reads all output files (the final step of likwid-mpirun). %h and %r are substituted with the hostname and MPI rank.
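Just as an illustration (a sketch, not existing code; it assumes the output files follow the /tmp/output_%h_%r.txt pattern from above), the collecting step could be as simple as:

-- Sketch: gather all per-rank output files written by likwid-perfctr
-- and print them one after another; the real summarizing logic would
-- reuse the final step of likwid-mpirun instead of plain printing.
local list = io.popen("ls /tmp/output_*_*.txt 2>/dev/null")
if list ~= nil then
    for path in list:lines() do
        print("== " .. path .. " ==")
        for line in io.lines(path) do
            print(line)
        end
    end
    list:close()
end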