issue with mpirun after updating to Ubuntu 22

Open fermza opened this issue 2 years ago • 21 comments

Dear Stephane,

after updating two PCs from Ubuntu 18 to Ubuntu 22.04, PhyML stopped working on both. No matter the command, it stops with this message:

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[20043,1],7]
  Exit code:    1

I tried to uninstall and reinstall both the phyml and openmpi-bin packages, but it didn't help. I'm stuck since I haven't found a solution yet. Any pointers or guidance you can provide will be greatly appreciated.

I noted that in Ubuntu 22 the PhyML version is different from the one I have on other computers with older Ubuntu versions. Incidentally, the error also pops up when I try "phyml --version":

pc10@pc10:~/Desktop/running$ phyml --version


. Running the analysis on 8 CPUs..
. This is PhyML version 3.3.3:3.3.20211231-1.

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[19216,1],2]
  Exit code:    1
--------------------------------------------------------------------------

fermza avatar Nov 24 '22 21:11 fermza

Update: I removed PhyML from the computer and reinstalled it from source. Running sh ./autogen.sh followed by configure --enable-phyml-mpi went fine. Then, after make, this error pops up:

~/phyml-3.3.20220408 make
make  all-recursive
make[1]: Entering directory '/home/fer/phyml-3.3.20220408'
Making all in src
make[2]: Entering directory '/home/fer/phyml-3.3.20220408/src'


.:  Building [phyml-mpi]. Version 3.3.20220408 :.


mpicc  -I. -I..     -std=c99 -O3 -fomit-frame-pointer -funroll-loops -Wall -Winline -finline -march=native -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c
In file included from utilities.h:2543,
                 from spr.h:12,
                 from main.c:13:
mpi_boot.h:18:10: fatal error: mpi.h: No such file or directory
   18 | #include "mpi.h"
      |          ^~~~~~~
compilation terminated.
make[2]: *** [Makefile:1216: main.o] Error 1
make[2]: Leaving directory '/home/fer/phyml-3.3.20220408/src'
make[1]: *** [Makefile:365: all-recursive] Error 1
make[1]: Leaving directory '/home/fer/phyml-3.3.20220408'
make: *** [Makefile:306: all] Error 2

Not sure if it helps with the diagnostics, but this is the situation.

fermza avatar Nov 25 '22 16:11 fermza

Hi there. On my Linux box (Ubuntu 20.04), I have a file /usr/include/x86_64-linux-gnu/mpich/mpi.h which seems to be missing on your side. I'll try to upgrade my OS this afternoon to see if I can reproduce this issue.
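
To see which MPI headers are installed, here is a minimal sketch. It assumes a typical Debian/Ubuntu layout in which MPI headers sit somewhere under /usr/include or /usr/lib; the function name find_mpi_headers is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list every file named mpi.h under the given
# directories. The default directories are an assumption about a
# typical Ubuntu layout, not something PhyML prescribes.
find_mpi_headers() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/include /usr/lib
    fi
    # -type f skips directories; errors from unreadable or missing
    # directories are discarded.
    find "$@" -type f -name mpi.h 2>/dev/null | sort
}

find_mpi_headers
```

A path ending in mpich/mpi.h or openmpi/include/mpi.h belongs to an MPI implementation. If nothing is printed, the development package for the MPI implementation is probably not installed.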

stephaneguindon avatar Nov 28 '22 11:11 stephaneguindon

Hi Stephane, I have tried to find something about the missing mpi.h file, but I couldn't find any fix yet. Any luck on your end?

fermza avatar Dec 01 '22 12:12 fermza

Yes: sudo apt-get purge mpich followed by sudo apt-get install mpich did the trick for me.

stephaneguindon avatar Dec 04 '22 07:12 stephaneguindon

Hi Stephane, it didn't work on my end (I tried on a couple of computers with the same issue, and it didn't fix it on either). Thanks anyway, I will keep looking for a solution.

fermza avatar Dec 07 '22 17:12 fermza

Well, that's too bad. Please keep me updated on your progress as this issue will likely impact other users...

stephaneguindon avatar Dec 08 '22 14:12 stephaneguindon

Will do! As soon as I get something I'll let you know.

fermza avatar Dec 16 '22 16:12 fermza

Hi @fermza @stephaneguindon, I'm also encountering the same problem. I tried both on Ubuntu 22 and 18, but I keep getting the same error. I was wondering if by any chance you have had a quick fix to address this issue? Thanks!

papelypluma avatar Jan 16 '23 16:01 papelypluma

Hi Delbert, so far we haven't found a solution. Another observation we made is that it's possible to run small datasets (few seqs and columns). Our guess is that if the dataset is small enough, PhyML won't parallelize (thus not using mpirun?), therefore it runs... but I am not sure of this. We are still digging for a permanent solution.

Fernando

fermza avatar Jan 17 '23 12:01 fermza

Thanks for the suggestion @fermza. I will consider that as a workaround for the meantime.

papelypluma avatar Jan 18 '23 06:01 papelypluma

Can you try locate mpi.h in a terminal and post the result, please?

stephaneguindon avatar Jan 18 '23 09:01 stephaneguindon

Hi @stephaneguindon, here it is:

pc10@pc10:/$ sudo find . -print | grep -w 'mpi[.]h'
[sudo] password for pc10: 
./usr/src/linux-headers-5.15.0-58/include/linux/mpi.h
./usr/src/linux-headers-5.15.0-56/include/linux/mpi.h
./usr/lib/x86_64-linux-gnu/fortran/gfortran-mod-15/openmpi/mpi.h
./usr/lib/x86_64-linux-gnu/openmpi/include/mpi.h
./usr/include/x86_64-linux-gnu/mpich/mpi.h

I got similar results running the command on other Ubuntu 22 computers at our lab.

fermza avatar Jan 19 '23 14:01 fermza

What about mpicc -compile_info ? It should return something like gcc -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -I/usr/include/x86_64-linux-gnu/mpich -L/usr/lib/x86_64-linux-gnu -lmpich.

stephaneguindon avatar Jan 23 '23 10:01 stephaneguindon

Hi, with this I found something that might explain what's happening. I ran the command you suggested, but it didn't work:

~ mpicc -compile_info
gcc: error: unrecognized command-line option ‘-compile_info’
gcc: fatal error: no input files
compilation terminated.

Indeed, it doesn't seem to be a valid option:

~ mpicc --help
Usage: gcc [options] file...
Options:
  -pass-exit-codes         Exit with highest error code from a phase.
  --help                   Display this information.
  --target-help            Display target specific command line options.
  --help={common|optimizers|params|target|warnings|[^]{joined|separate|undocumented}}[,...].
                           Display specific types of command line options.
  (Use '-v --help' to display command line options of sub-processes).
  --version                Display compiler version information.
  -dumpspecs               Display all of the built in spec strings.
  -dumpversion             Display the version of the compiler.
  -dumpmachine             Display the compiler's target processor.
  -print-search-dirs       Display the directories in the compiler's search path.
  -print-libgcc-file-name  Display the name of the compiler's companion library.
  -print-file-name=<lib>   Display the full path to library <lib>.
  -print-prog-name=<prog>  Display the full path to compiler component <prog>.
  -print-multiarch         Display the target's normalized GNU triplet, used as
                           a component in the library path.
  -print-multi-directory   Display the root directory for versions of libgcc.
  -print-multi-lib         Display the mapping between command line options and
                           multiple library search directories.
  -print-multi-os-directory Display the relative path to OS libraries.
  -print-sysroot           Display the target libraries directory.
  -print-sysroot-headers-suffix Display the sysroot suffix used to find headers.
  -Wa,<options>            Pass comma-separated <options> on to the assembler.
  -Wp,<options>            Pass comma-separated <options> on to the preprocessor.
  -Wl,<options>            Pass comma-separated <options> on to the linker.
  -Xassembler <arg>        Pass <arg> on to the assembler.
  -Xpreprocessor <arg>     Pass <arg> on to the preprocessor.
  -Xlinker <arg>           Pass <arg> on to the linker.
  -save-temps              Do not delete intermediate files.
  -save-temps=<arg>        Do not delete intermediate files.
  -no-canonical-prefixes   Do not canonicalize paths when building relative
                           prefixes to other gcc components.
  -pipe                    Use pipes rather than intermediate files.
  -time                    Time the execution of each subprocess.
  -specs=<file>            Override built-in specs with the contents of <file>.
  -std=<standard>          Assume that the input sources are for <standard>.
  --sysroot=<directory>    Use <directory> as the root directory for headers
                           and libraries.
  -B <directory>           Add <directory> to the compiler's search paths.
  -v                       Display the programs invoked by the compiler.
  -###                     Like -v but options quoted and commands not executed.
  -E                       Preprocess only; do not compile, assemble or link.
  -S                       Compile only; do not assemble or link.
  -c                       Compile and assemble, but do not link.
  -o <file>                Place the output into <file>.
  -pie                     Create a dynamically linked position independent
                           executable.
  -shared                  Create a shared library.
  -x <language>            Specify the language of the following input files.
                           Permissible languages include: c c++ assembler none
                           'none' means revert to the default behavior of
                           guessing the language based on the file's extension.

Options starting with -g, -f, -m, -O, -W, or --param are automatically
 passed on to the various sub-processes invoked by gcc.  In order to pass
 other options on to these processes the -W<letter> options must be used.

For bug reporting instructions, please see:
<file:///usr/share/doc/gcc-11/README.Bugs>.

In my case, this is the mpicc version:

~ mpicc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So I am now wondering if this version, which I assume is the default installed with Ubuntu 22 upon upgrading, may be the reason for this issue with PhyML. Can you tell me which version of the compiler you have installed? Maybe if I downgrade to an older mpicc version I can use PhyML again. On that note, I checked an older computer (with Ubuntu 20 and a working PhyML), and it has an older mpicc:

fer@fer ~/Desktop $ mpicc --version 
gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I hope this can help. Thanks!

fermza avatar Jan 23 '23 18:01 fermza

I'm using the same version of gcc. My best guess at the moment is that you have both openmpi and mpich installed on your machine and that these two are conflicting. On my side, /bin/mpicc points to /usr/bin/mpicc.mpich. Could you please check that it is also the case for you?
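
A minimal sketch of that check follows. The helper name classify_mpi_wrapper is hypothetical, and the name patterns it matches are assumptions: mpicc.mpich is the MPICH wrapper named above, and opal_wrapper is the wrapper binary that Open MPI installs.

```shell
#!/bin/sh
# Hypothetical helper: guess the MPI implementation from the resolved
# path of a compiler wrapper. The patterns are assumptions based on
# common Ubuntu file names (mpicc.mpich, mpicc.openmpi, opal_wrapper).
classify_mpi_wrapper() {
    case "$(basename "$1")" in
        *mpich*)                echo "mpich" ;;
        opal_wrapper|*openmpi*) echo "openmpi" ;;
        *)                      echo "unknown" ;;
    esac
}

if command -v mpicc >/dev/null 2>&1; then
    # readlink -f follows the whole chain of symbolic links.
    target=$(readlink -f "$(command -v mpicc)")
    echo "mpicc -> $target ($(classify_mpi_wrapper "$target"))"
else
    echo "mpicc not found in PATH"
fi
```

The two wrappers also accept different flags: with Open MPI, mpicc --showme prints the underlying compiler command line, while with MPICH, mpicc -show (or -compile_info) does. On Debian and Ubuntu, update-alternatives --get-selections | grep -i mpi should list which implementation is currently selected, assuming both are managed through the alternatives system.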

stephaneguindon avatar Jan 24 '23 07:01 stephaneguindon

Hi Stephane, it's been a while and we are still unable to run Phyml in Ubuntu 22 (the same issue keeps popping up). I have tried several solutions in several computers. However, a few days ago I was reading something somewhere (sorry, I do not recall where), and I tried running with phyml-mpi in the command rather than simply phyml (not sure why I didn't try it before). Now the error I used to get all the time:

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[19216,1],2]
  Exit code:    1
--------------------------------------------------------------------------

did not pop up. It actually seems to start running, but now something new is happening:

~/Desktop/test phyml-mpi -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check


. Running the analysis on 1 CPU..

. Command line: phyml-mpi -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check 





  ////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.//////////////////////////////////////////

        . Sequence filename:				 perk_2023_h08.phy
        . Data type:					 aa
        . Alphabet size:				 20
        . Sequence format:				 interleaved
        . Number of data sets:				 1
        . Nb of bootstrapped data sets:			 0
        . Compute approximate likelihood ratio test:	 yes (aBayes branch supports)
        . Model name:					 WAG
        . Proportion of invariable sites:		 0.000000
        . RAS model:					 discrete Gamma
        . Number of subst. rate catgs:			 4
        . Gamma distribution parameter:			 estimated
        . 'Middle' of each rate class:			 mean
        . Amino acid equilibrium frequencies:		 model
        . Optimise tree topology:			 yes
        . Starting tree:				 BioNJ
        . Add random input tree:			 no
        . Optimise branch lengths:			 yes
        . Minimum length of an edge:			 1e-08
        . Optimise substitution model parameters:	 yes
        . Run ID:					 none
        . Random seed:					 1702560731
        . Subtree patterns aliasing:			 no
        . Version:					 3.3.3:3.3.20211231-1
        . Byte alignment:				 1
        . AVX enabled:					 no
        . SSE enabled:					 no

  ////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.//////////////////////////////////////////



. 462 patterns found (out of a total of 507 sites). 

. 72 sites without polymorphism (14.20%).


. Computing pairwise distances...

. Building BioNJ tree...

. WARNING: this analysis will use at least 131 MB of memory space...


. Score of initial tree: -34205.08
. -34202.259274 -- -36858.581928
. Edge: 187
. is_mixt_tree: 0
. Err. in file 'optimiz.c' (line 875)
. PhyML finished prematurely.

Besides the error, it also looks like it's using only one core (by default PhyML used to use half of the available threads; this is a Ryzen 7 computer with 8 cores). I am using PhyML version 3.3.3:3.3.20211231-1. Any suggestions on how to fix these issues? I feel I'm close to being able to run PhyML locally again, but what I have found online about this new error didn't help.

fermza avatar Dec 14 '23 13:12 fermza

Hi there. From the command-line you're using here, it looks like you do not need to use the MPI version of PhyML (as you're not running any bootstrap analysis). I'd therefore suggest using the "standard" PhyML executable and post the error message returned, if any.
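
As a rough illustration of that rule, the sketch below checks whether a command line asks for bootstrap replicates. It assumes replicates are requested with -b (or --bootstrap) followed by a positive integer; the function name wants_bootstrap is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the arguments request
# bootstrap replicates, i.e. -b or --bootstrap followed by a positive
# integer. This is an assumption about how replicates are requested.
wants_bootstrap() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -b|--bootstrap)
                case "${2:-}" in
                    ''|*[!0-9]*) ;;  # missing value, or not a plain non-negative integer
                    *) [ "$2" -gt 0 ] && return 0 ;;
                esac
                ;;
        esac
        shift
    done
    return 1
}

if wants_bootstrap -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check; then
    echo "bootstrap requested: the MPI build may be useful"
else
    echo "no bootstrap requested: the standard phyml binary should be enough"
fi
```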

stephaneguindon avatar Dec 14 '23 16:12 stephaneguindon

The problem is that running with phyml command alone I still get the error that originated this whole thread (which I haven't been able to fix). Here's an example running with phyml:

~/Desktop/test phyml -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check 


. Running the analysis on 6 CPUs..

. Command line: /usr/lib/phyml/bin/phyml-mpi -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check 





  ////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.//////////////////////////////////////////

        . Sequence filename:				 perk_2023_h08.phy
        . Data type:					 aa
        . Alphabet size:				 20
        . Sequence format:				 interleaved
        . Number of data sets:				 1
        . Nb of bootstrapped data sets:			 0
        . Compute approximate likelihood ratio test:	 yes (aBayes branch supports)
        . Model name:					 WAG
        . Proportion of invariable sites:		 0.000000
        . RAS model:					 discrete Gamma
        . Number of subst. rate catgs:			 4
        . Gamma distribution parameter:			 estimated
        . 'Middle' of each rate class:			 mean
        . Amino acid equilibrium frequencies:		 model
        . Optimise tree topology:			 yes
        . Starting tree:				 BioNJ
        . Add random input tree:			 no
        . Optimise branch lengths:			 yes
        . Minimum length of an edge:			 1e-08
        . Optimise substitution model parameters:	 yes
        . Run ID:					 none
        . Random seed:					 1702574801
        . Subtree patterns aliasing:			 no
        . Version:					 3.3.3:3.3.20211231-1
        . Byte alignment:				 1
        . AVX enabled:					 no
        . SSE enabled:					 no

  ////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.//////////////////////////////////////////



. 462 patterns found (out of a total of 507 sites). 

. 72 sites without polymorphism (14.20%).


. Computing pairwise distances...

. Building BioNJ tree...

. WARNING: this analysis will use at least 131 MB of memory space...
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------


. Score of initial tree: -34205.08--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[45724,1],5]
  Exit code:    1
--------------------------------------------------------------------------

Note the bottom part, which is the same error shown at the beginning of the thread. Thanks for your reply!

Regards

fermza avatar Dec 14 '23 17:12 fermza

You probably need to talk to your sysadmin here. The command 'phyml' points to 'phyml-mpi', which is wrong. It should point to a binary called 'phyml' (instead of 'phyml-mpi')
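
One way to see what the phyml command actually starts is sketched below. This is a heuristic, not a definitive test: it assumes the launcher is either a symbolic link or a small wrapper script, and the helper name launches_mpi is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given file mentions mpirun or
# phyml-mpi, which would suggest that it starts the MPI build.
launches_mpi() {
    # -a treats binary files as text; -q only sets the exit status.
    grep -aqE 'mpirun|phyml-mpi' "$1" 2>/dev/null
}

if command -v phyml >/dev/null 2>&1; then
    launcher=$(readlink -f "$(command -v phyml)")
    echo "phyml resolves to: $launcher"
    if launches_mpi "$launcher"; then
        echo "it appears to start the MPI build"
    else
        echo "no reference to mpirun or phyml-mpi found"
    fi
else
    echo "phyml not found in PATH"
fi
```

If the launcher turns out to be a script, reading it directly shows which binary it runs and with which mpirun options.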

stephaneguindon avatar Dec 15 '23 09:12 stephaneguindon

Hi @stephaneguindon, I hope you can help me.

I'm using

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy

and I am having the same issue as @fermza.

$phyml --version                  
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified


. Running the analysis on 64 CPUs..
. This is PhyML version 3.3.3:3.3.20211231-1.

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[30668,1],11]
  Exit code:    1
--------------------------------------------------------------------------

I've been following the different steps that you've mentioned above:

sudo apt-get purge mpich
sudo apt-get install mpich

Then I checked whether /bin/mpicc points to /usr/bin/mpicc.mpich and found that it doesn't, so I changed it by running:

$ readlink -f /bin/mpicc
/usr/bin/opal_wrapper
$ readlink -f /usr/bin/mpicc.mpich 
/usr/bin/mpicc.mpich
$ sudo ln -sf /usr/bin/mpicc.mpich /bin/mpicc
$ readlink -f /bin/mpicc
/usr/bin/mpicc.mpich

After that, I still have the same problem. When using phyml with few sequences it works properly, but when trying to use it with a larger file this error appears.

Just in case, my mpicc version is 12.3.0

Thank you very much for your attention,

centrebiodiversitat avatar Dec 21 '23 11:12 centrebiodiversitat