ompi
Slurm resource detection issue
-np 144 jobs in last night's MTT with the latest prrte pointers are failing with:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144 slots that were requested by
the application:
./c_hello
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
Link here to MTT results.
I would advise adding --display alloc to the cmd line to see what PRRTE thinks it was given.
Is somebody going to have a chance to look at this soon? I doubt the problem is with the Slurm allocation parser as that hasn't changed, so it is likely a bug down in the mapper. I can try to take a look here, but it would help if somebody added --prtemca rmaps_base_verbose 5 to one of those runs and sent me the output (or posted it to a PRRTE issue so I can see it).
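For example, something along these lines should capture it (the tee filename is just a placeholder, and the binary is the hello test from the original report):
mpirun -np 144 --display alloc --prtemca rmaps_base_verbose 5 ./c_hello 2>&1 | tee rmaps-debug.log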
@rhc54 I'll try to reproduce on my system
I think this was hit in our MTT; I can also take a shot at reproducing it.
Any update on this?
Guys - I hate to release PRRTE v3.0 with this unresolved, but I have tried everything at my disposal to reproduce it (testing under Slurm, faking RM allocations) with your cmd line, without any failures. Minus some input from your environment, I have no choice but to declare this an unverifiable anomaly and move forward with the release.
So please - can someone just produce the requested debug so we can address this?
I cannot reproduce this on my system either - this happened on an AWS system, ideally it should be reproduced there
@wckzhang ???
@wckzhang is on vacation. @shijin-aws can you take a look at this?
will look at this today.
I can reproduce this issue; it happens randomly. I am inside a salloc -n 144 and running the same test (alltoallv_somezeros) in a for loop, and the error is hit at random iterations. In the log below, iterations 1, 2, and 3 failed; 4 and 5 succeeded.
(env) (env) bash-4.2$ for i in $(seq 1 5); do echo "iteration $i: mpirun -n 144 collective/alltoallv_somezeros"; /home/ec2-user/mtt-scratch/installs/wogy/install/bin/mpirun -n 144 /home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros; done
iteration 1: mpirun -n 144 collective/alltoallv_somezeros
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144
slots that were requested by the application:
/home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
iteration 2: mpirun -n 144 collective/alltoallv_somezeros
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144
slots that were requested by the application:
/home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
iteration 3: mpirun -n 144 collective/alltoallv_somezeros
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144
slots that were requested by the application:
/home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
iteration 4: mpirun -n 144 collective/alltoallv_somezeros
No Errors
iteration 5: mpirun -n 144 collective/alltoallv_somezeros
No Errors
I added --display alloc and the log is:
(env) (env) bash-4.2$ for i in $(seq 1 5); do echo "iteration $i: mpirun -n 144 collective/alltoallv_somezeros"; /home/ec2-user/mtt-scratch/installs/wogy/install/bin/mpirun -n 144 --display alloc /home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros; done
iteration 1: mpirun -n 144 collective/alltoallv_somezeros
====================== ALLOCATED NODES ======================
queue-c5n18xlarge-dy-c5n18xlarge-1: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.2.155
queue-c5n18xlarge-dy-c5n18xlarge-2: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.9.252
queue-c5n18xlarge-dy-c5n18xlarge-3: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.15.79
queue-c5n18xlarge-dy-c5n18xlarge-4: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.5.159
=================================================================
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144
slots that were requested by the application:
/home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
iteration 2: mpirun -n 144 collective/alltoallv_somezeros
====================== ALLOCATED NODES ======================
queue-c5n18xlarge-dy-c5n18xlarge-1: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.2.155
queue-c5n18xlarge-dy-c5n18xlarge-2: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.9.252
queue-c5n18xlarge-dy-c5n18xlarge-3: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.15.79
queue-c5n18xlarge-dy-c5n18xlarge-4: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.5.159
=================================================================
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144
slots that were requested by the application:
/home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
iteration 3: mpirun -n 144 collective/alltoallv_somezeros
====================== ALLOCATED NODES ======================
queue-c5n18xlarge-dy-c5n18xlarge-1: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.2.155
queue-c5n18xlarge-dy-c5n18xlarge-2: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.9.252
queue-c5n18xlarge-dy-c5n18xlarge-3: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.15.79
queue-c5n18xlarge-dy-c5n18xlarge-4: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.5.159
=================================================================
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144
slots that were requested by the application:
/home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
iteration 4: mpirun -n 144 collective/alltoallv_somezeros
====================== ALLOCATED NODES ======================
queue-c5n18xlarge-dy-c5n18xlarge-1: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.2.155
queue-c5n18xlarge-dy-c5n18xlarge-2: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.9.252
queue-c5n18xlarge-dy-c5n18xlarge-3: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.15.79
queue-c5n18xlarge-dy-c5n18xlarge-4: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.5.159
=================================================================
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144
slots that were requested by the application:
/home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
iteration 5: mpirun -n 144 collective/alltoallv_somezeros
====================== ALLOCATED NODES ======================
queue-c5n18xlarge-dy-c5n18xlarge-1: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.2.155
queue-c5n18xlarge-dy-c5n18xlarge-2: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.9.252
queue-c5n18xlarge-dy-c5n18xlarge-3: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.15.79
queue-c5n18xlarge-dy-c5n18xlarge-4: slots=36 max_slots=0 slots_inuse=0 state=UP
Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
aliases: 172.31.5.159
=================================================================
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 144
slots that were requested by the application:
/home/ec2-user/mtt-scratch/installs/wogy/tests/ibm/ibm/collective/alltoallv_somezeros
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the PRRTE term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which PRRTE processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, PRRTE defaults to the number of processor cores
In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
@rhc54 I attached the log from a run with --display alloc --prtemca rmaps_base_verbose 5, as you suggested.
Well, that last log shows me what is going on (two of the nodes are being dropped for some reason) - now I just have to figure out why! Might need some more debug, so I may be back.
Thanks @rhc54. Were you able to root-cause it?
Haven't gotten there yet. What I saw in a quick scan is that two of the nodes were being skipped when we assemble the node list for mapping. I don't have any immediate idea as to why that happened. Once I get the remaining cmd line issues resolved (hopefully today), I plan to come back and look at this one. Probably have to add some verbose/debug code and ask for it to be re-run.
Crud - I'm dense. I don't need you to make more measurements - this is happening in the mapping phase. All I need is for someone to post the xml lstopo output from one of those nodes.
Can someone do that please?
@rhc54 I can do that; may I know what command I should run to get that?
I believe it is simply lstopo --of xml > file
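If the topology needs to come from one of the compute nodes rather than the login node, something like this should do it (assuming lstopo is installed on the compute nodes; the output filename is arbitrary):
srun -N 1 -n 1 lstopo --of xml > node-topo.xml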
Rats - it works perfectly for me with that topology, so it has to be something else. Can you please update the openpmix and prrte submodules to the head of their master branches, rebuild, and then rerun the cmd line with --display alloc --prtemca rmaps_base_verbose 5?
@rhc54 If I build ompi from the GitHub main source (with updated submodules), there is no such issue. But if I use the latest nightly main tarball https://download.open-mpi.org/nightly/open-mpi/main/openmpi-main-202208250241-96fadd9.tar.bz2, I can reproduce this issue.
I build the GitHub ompi main source as follows:
git clone https://github.com/openmpi/ompi ompi-main
cd ompi-main
git submodule update --recursive --init
./configure CFLAGS=-pipe --enable-picky --enable-debug --without-verbs --with-ofi=/opt/amazon/efa/ --enable-mpi1-compatibility --prefix=/home/ec2-user/ompi-main/install --disable-man-pages
make -j install
I build the nightly tarball as follows:
./configure CFLAGS=-pipe --enable-picky --enable-debug --without-verbs --with-ofi=/opt/amazon/efa/ --enable-mpi1-compatibility --prefix=/home/ec2-user/openmpi-main-202208250241-96fadd9/install
I am not sure if there is a difference in the pmix/prrte code between the latest nightly tarball and the GitHub main. A naive diff returns a lot of differences...
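("Naive diff" here means roughly the following, with the directory names being my local clone and the unpacked tarball:)
diff -r ompi-main/3rd-party/openpmix openmpi-main-202208250241-96fadd9/3rd-party/openpmix
diff -r ompi-main/3rd-party/prrte openmpi-main-202208250241-96fadd9/3rd-party/prrte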
I will try copying the pmix/prrte directories into the nightly tarball tree to see if that fixes the issue.
You'll probably need to re-run autogen.pl once you copy them over since pmix/prrte will be coming from a git repo and not a tarball, but it should otherwise be okay. Afraid I don't know how old the pmix/prrte code is in the nightly tarball - could be fairly old.
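Roughly, the sequence would be something like this (directory names are illustrative, taken from the paths posted above; configure flags stay the same as the tarball build):
cd openmpi-main-202208250241-96fadd9
rm -rf 3rd-party/openpmix 3rd-party/prrte
cp -r ~/ompi-main/3rd-party/openpmix ~/ompi-main/3rd-party/prrte 3rd-party/
./autogen.pl
./configure CFLAGS=-pipe --enable-picky --enable-debug --without-verbs --with-ofi=/opt/amazon/efa/ --enable-mpi1-compatibility --prefix=/home/ec2-user/openmpi-main-202208250241-96fadd9/install
make -j install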
After copying and running ./autogen.pl, I hit this error during configure:
============================================================================
== Configure PMIx
============================================================================
checking --with-pmix value... not found
configure: WARNING: Expected file /usr/include/pmix.h not found
configure: error: Cannot continue
configure: ===== done with 3rd-party/prrte configure =====
configure: error: PRRTE configuration failed. Cannot continue.
The latest GitHub main commit is from Aug 24th; I thought the nightly main tarball from Aug 25 should be the same as the main branch. But I do not know how the tarball is generated.
How about this: do a git clone of OMPI, then do the submodule init. Go into 3rd-party/openpmix and 3rd-party/prrte and in each one do git checkout master; git pull. Then just build OMPI as usual.
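In shell terms, that's roughly the following (on top of the clone and configure recipe you posted earlier; note that a git-based build needs ./autogen.pl before configure):
cd ompi-main
git submodule update --init --recursive
(cd 3rd-party/openpmix && git checkout master && git pull)
(cd 3rd-party/prrte && git checkout master && git pull)
./autogen.pl
./configure CFLAGS=-pipe --enable-picky --enable-debug --without-verbs --with-ofi=/opt/amazon/efa/ --enable-mpi1-compatibility --prefix=/home/ec2-user/ompi-main/install --disable-man-pages
make -j install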
Oh, sorry - that's what you already did and it worked fine, yes? If so, then aren't we done? It's the tarball that is having the problem.
Yes, I did not even bother checking out prrte and openpmix to master; I just used whatever is bumped in ompi main.
Okay - the commit history indicates that the PMIx/PRRTE submodule pointers were last updated on Aug 24. I'm guessing your nightly tarball was from before that date? If so, that would explain the difference.
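(If it helps, the quickest way to see when the pointers moved in a given checkout is something like:
git log -n 2 --oneline -- 3rd-party/openpmix 3rd-party/prrte
run from the top of the ompi clone.)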
This nightly tarball https://download.open-mpi.org/nightly/open-mpi/main/openmpi-main-202208250241-96fadd9.tar.bz2 indicates it was generated on Aug 25, but I can still hit the issue with it.