
Why does mpirun -n not work (only 1 process supported) through an SSH connection?

wxfred opened this issue 2 years ago

Details of the problem

I'm running 8 MPI processes on one server with the command mpirun -n 8 mympiexecutable. If I type the command in PuTTY, it works. But if I execute the command through node-ssh, paramiko (a Python SSH tool), or WinSCP's SSH shell, the same error occurs:

Abort(xxxxxxxxxx) on node 0 (rank 0 in comm 0): Fatal error in internal_Send: Invalid rank, error stack:
internal_Send(xxx): MPI_Send(buf=xxxxxxxxxxxx, count=1, MPI_INT, 1, 0, MPI_COMM_WORLD) failed
internal_Send(xx).: Invalid rank has value 1 but must be nonnegative and less than 1

Then I changed my code to use only 1 MPI process and executed mpirun -n 1 mympiexecutable; no error appeared.

In another experiment, my code used 8 MPI processes but I executed mpirun -n 1 mympiexecutable, and the error appeared again:

Invalid rank has value 1 but must be nonnegative and less than 1

It seems the -n argument has no effect when run through those common SSH tools, except PuTTY.

Is there some configuration that needs to be set up first for these common SSH tools?

I need some help, please.

wxfred avatar Aug 10 '22 07:08 wxfred

I'm afraid I do not understand the environment in which you're operating, or what you're trying to do. Can you explain further, and/or provide a recipe for reproducing the issue?

Also, can you supply the information that was requested in the github issue template? See https://github.com/open-mpi/ompi/blob/main/.github/issue_template.md.

jsquyres avatar Aug 10 '22 14:08 jsquyres

Background information

- Open MPI version: 4.0
- Installed from: tarball
- Server operating system/version: Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-122-generic x86_64)
- Computer hardware: Intel® Xeon® Gold 6136 Processor, 12 cores / 24 threads, 3.0 GHz
- Network type: LAN
- Client operating system: Win10

Details of the problem

Steps to reproduce the problem:

1. Build a hello-world MPI executable on the server (server side), from the source file mpi_hello_world.c:

mpicc -o mpi_hello_world mpi_hello_world.c

2. Execute it through PuTTY (client side) with the command mpirun -n 8 ./mpi_hello_world; the output is:

Hello world from processor xxx, rank 0 out of 8 processors
Hello world from processor xxx, rank 1 out of 8 processors
Hello world from processor xxx, rank 2 out of 8 processors
Hello world from processor xxx, rank 4 out of 8 processors
Hello world from processor xxx, rank 5 out of 8 processors
Hello world from processor xxx, rank 6 out of 8 processors
Hello world from processor xxx, rank 3 out of 8 processors
Hello world from processor xxx, rank 7 out of 8 processors

8 processors, correct.

3. Write a Python script (client side) with paramiko installed:

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('YourServerIP', username='YourUsername', port=22, password='YourPassword')
ssh_stdin, ssh_stdout, ssh_stderr = ssh.exec_command('cd YourExecutableDir; mpirun -n 8 ./mpi_hello_world')
print(ssh_stdout.read().decode(), ssh_stderr.read().decode())
ssh.close()

the result is:

Hello world from processor xxx, rank 0 out of 1 processors
Hello world from processor xxx, rank 0 out of 1 processors
Hello world from processor xxx, rank 0 out of 1 processors
Hello world from processor xxx, rank 0 out of 1 processors
Hello world from processor xxx, rank 0 out of 1 processors
Hello world from processor xxx, rank 0 out of 1 processors
Hello world from processor xxx, rank 0 out of 1 processors
Hello world from processor xxx, rank 0 out of 1 processors

Only 1 processor is found; the same happens when I run it via node-ssh or WinSCP's shell.

I'm trying to develop a Node.js app that invokes the server's MPI executable via node-ssh. The problem is that when I execute the command mpirun -n 8 ./mpi_hello_world, only 1 processor is found; but if I run the same command in PuTTY, it works fine.

I don't know what the difference is between those SSH tools. Why does Open MPI behave differently given the same command?

wxfred avatar Aug 11 '22 03:08 wxfred

That typically occurs when you use MPICH's mpirun while your app uses Open MPI's libmpi.so (or the other way around).

Try using the absolute path to mpirun in your script and see how it goes.

For debugging purposes, you can run which mpirun in both the terminal and your script.

ggouaillardet avatar Aug 11 '22 03:08 ggouaillardet
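The difference between the two connection types can be reproduced locally: paramiko's exec_command and node-ssh run the command in a non-login, non-interactive shell, which reads different startup files than PuTTY's interactive login shell, so PATH can differ. A minimal sketch of that difference using only the Python standard library (assumes bash is installed):

```python
import subprocess

# A login shell ("bash -l") reads /etc/profile and ~/.profile (or
# ~/.bash_profile), the way an interactive PuTTY session does. A plain
# "bash -c" does not, which is roughly how paramiko's exec_command
# runs your command on the remote side.
login = subprocess.run(["bash", "-lc", "echo $PATH"],
                       capture_output=True, text=True)
plain = subprocess.run(["bash", "-c", "echo $PATH"],
                       capture_output=True, text=True)

print("login shell PATH:    ", login.stdout.strip())
print("non-interactive PATH:", plain.stdout.strip())
```

If the two lines differ, a bare mpirun can resolve to different MPI installations depending on how you connect.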

> That typically occurs when you use MPICH's mpirun while your app uses Open MPI's libmpi.so (or the other way around).
>
> Try using the absolute path to mpirun in your script and see how it goes.
>
> For debugging purposes, you can run which mpirun in both the terminal and your script.

Thanks! I found that the environment PATHs are different, and both Open MPI and MPICH are installed on my server. The which mpirun command shows that PuTTY uses /usr/share/mpich-4.0/bin/mpirun, while the other tools use /usr/bin/mpiexec.

wxfred avatar Aug 11 '22 06:08 wxfred
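The resolution above comes down to PATH lookup order: which mpirun runs depends on which matching directory appears first in PATH, and that order differed between the PuTTY session and the other SSH tools. A standard-library-only sketch of the lookup (the mpich/openmpi directory names here are made up for illustration):

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create two fake mpirun executables, standing in for the MPICH
    # and Open MPI installations that coexist on the server.
    bindirs = {}
    for impl in ("mpich", "openmpi"):
        d = os.path.join(tmp, impl, "bin")
        os.makedirs(d)
        exe = os.path.join(d, "mpirun")
        with open(exe, "w") as f:
            f.write("#!/bin/sh\necho " + impl + "\n")
        os.chmod(exe, os.stat(exe).st_mode | stat.S_IXUSR)  # mark executable
        bindirs[impl] = d

    # The first matching directory in PATH wins -- exactly why PuTTY and
    # paramiko resolved "mpirun" to different MPI implementations.
    putty_path = os.pathsep.join([bindirs["mpich"], bindirs["openmpi"]])
    paramiko_path = os.pathsep.join([bindirs["openmpi"], bindirs["mpich"]])

    print(shutil.which("mpirun", path=putty_path))     # .../mpich/bin/mpirun
    print(shutil.which("mpirun", path=paramiko_path))  # .../openmpi/bin/mpirun
```

Pinning the absolute path to mpirun in the script, as suggested above, sidesteps the lookup entirely.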