Problem building AMUSE from source

Open Spijkerberg opened this issue 2 years ago • 21 comments

Hello,

I am trying to build AMUSE from source on a sterrewacht computer. I am running into issues with certain packages not being built.

Here is the buildlog of the building process.

Thanks in advance, Menno

Spijkerberg avatar Nov 24 '23 10:11 Spijkerberg

Could you paste the output of module list for the shell in which you're building?

LourensVeen avatar Nov 24 '23 12:11 LourensVeen

Looks like Steven is onto something:

make: Entering directory '/data2/vandereerden/amuse/src/amuse/community/capreole'
make -C build_mpi amuse_interface_mpi  VPATH=../src F90FLAGS1="-g -O2 -DNOMPI -fPIC  " FC="/usr/bin/gfortran -fallow-argument-mismatch" MPIFC="/usr/bin/gfortran -fallow-argument-mismatch"
make[1]: Entering directory '/data2/vandereerden/amuse/src/amuse/community/capreole/build_mpi'
/usr/bin/gfortran -fallow-argument-mismatch -c -g -O2 -DNOMPI -fPIC   -DMPI  -I/data2/vandereerden/amuse/lib/stopcond ../src/amuse_mpi.F90
../src/amuse_mpi.F90:54:0:

   54 |   include 'mpif.h'
      | 
Fatal Error: Cannot open included file ‘mpif.h’
compilation terminated.

That's definitely not right. The question is why this is happening. I should be getting access to these machines, so then I can experiment a little.

LourensVeen avatar Nov 24 '23 12:11 LourensVeen

It seems to use gfortran, whereas I would expect mpifortran to be used here instead. The argument MPIFC="/usr/bin/gfortran -fallow-argument-mismatch" is probably causing this.

rieder avatar Nov 24 '23 13:11 rieder

MPIFC should not be set to gfortran, keep it at mpifortran or undefined. You can set FC to gfortran -fallow-argument-mismatch if this is important, but redefining an MPI compiler to a non-MPI one is probably what's causing the issue.

rieder avatar Nov 24 '23 13:11 rieder

... thinking about it further, I guess this may follow from AMUSE being configured without MPI in the first place. So a first step would be to re-run configure in the AMUSE dir and check config.mk for the definition of MPIFC there.
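For checking config.mk, something like the following could pull out the relevant settings. This is only a minimal sketch: the helper name read_make_vars and the sample text are made up for illustration, it assumes config.mk uses plain NAME=value style assignments, and it does not expand make variables.

```python
import re

def read_make_vars(text, names):
    """Return {name: value} for simple NAME=value lines in makefile text."""
    found = {}
    for line in text.splitlines():
        # Accepts NAME=value, NAME = value, NAME := value and NAME ?= value.
        # Commented-out lines do not match because '#' cannot start a name.
        m = re.match(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:?]?=\s*(.*?)\s*$", line)
        if m and m.group(1) in names:
            found[m.group(1)] = m.group(2)
    return found

# Hypothetical excerpt (not the real file), using values reported in this thread.
sample = """\
MPI_ENABLED=no
FC=/usr/bin/gfortran -fallow-argument-mismatch
MPIFC=/usr/bin/gfortran -fallow-argument-mismatch
"""

settings = read_make_vars(sample, {"MPI_ENABLED", "MPIFC"})
print(settings)
```

To inspect a real installation, the sample string would be replaced by the contents of config.mk in the AMUSE root dir. An MPIFC that points at plain gfortran is the thing to look for.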

rieder avatar Nov 24 '23 13:11 rieder

The output of module list is:

Currently Loaded Modules:

  1. localhosts
  2. GCCcore/12.3.0
  3. zlib/1.2.13-GCCcore-12.3.0
  4. binutils/2.40-GCCcore-12.3.0
  5. GCC/12.3.0
  6. numactl/2.0.16-GCCcore-12.3.0
  7. XZ/5.4.2-GCCcore-12.3.0
  8. libxml2/2.11.4-GCCcore-12.3.0
  9. libpciaccess/0.17-GCCcore-12.3.0
  10. hwloc/2.9.1-GCCcore-12.3.0
  11. OpenSSL/1.1
  12. libevent/2.1.12-GCCcore-12.3.0
  13. UCX/1.14.1-GCCcore-12.3.0
  14. libfabric/1.18.0-GCCcore-12.3.0
  15. PMIx/4.2.4-GCCcore-12.3.0
  16. UCC/1.2.0-GCCcore-12.3.0
  17. OpenMPI/4.1.5-GCC-12.3.0

This is the result of loading the modules 'AMUSE' (AMUSE/2023.5.1) and 'OpenMPI', in that order.

Spijkerberg avatar Nov 24 '23 13:11 Spijkerberg

I'd like to see the config.mk file (in the AMUSE root dir) of the sterrewacht installation of AMUSE, can you find this @Spijkerberg?

rieder avatar Nov 24 '23 13:11 rieder

I found the file in the AMUSE root dir on the sterrewacht machine. I turned it into a txt file to share it.

Spijkerberg avatar Nov 24 '23 14:11 Spijkerberg

Thanks. This shows AMUSE was configured without MPI support, so no wonder stuff that requires MPI breaks... This will require fixing on the module level.

rieder avatar Nov 24 '23 15:11 rieder

Looking at the config.mk that was produced from my own installation from source, I see that MPI_ENABLED=no, while I did load the MPI module before installing. I could try to install AMUSE from source again, but I do not know if that would help.

Spijkerberg avatar Nov 24 '23 15:11 Spijkerberg

You will also need to install mpi4py (pip install mpi4py). If this is not detected, configure will set MPI_ENABLED to no.

rieder avatar Nov 24 '23 15:11 rieder

Looking at the packages enabled in my environment I see that mpi4py is already installed. Here is the output of pip freeze:

-e git+https://github.com/amusecode/amuse.git@72c4a3c32c21e48f3a823af9f742c7de2684138b#egg=amuse_devel
docutils==0.20.1
h5py==3.10.0
iniconfig==2.0.0
mpi4py==3.1.5
numpy==1.26.2
packaging==23.2
pluggy==1.3.0
pytest==7.4.3
setuptools-scm==8.0.4
typing_extensions==4.8.0

Spijkerberg avatar Nov 24 '23 15:11 Spijkerberg

If you re-run configure, does that change the config.mk file?

rieder avatar Nov 24 '23 15:11 rieder

This did change the config.mk file. I can see that MPI_ENABLED=yes is set correctly now. I will try rebuilding AMUSE to see what the result is.

Spijkerberg avatar Nov 24 '23 15:11 Spijkerberg

It seems that some more of the community codes have been built, but there are still some errors when building. I have provided the buildlogs again for you to inspect.

Testing to see if the community codes work results in UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY .

Spijkerberg avatar Nov 24 '23 16:11 Spijkerberg

OPAL is OpenMPI's utility library. This error sounds like some kind of data format mismatch, which suggests there are different versions of MPI in use. Possibly mpi4py got compiled against a different version of MPI than AMUSE? Or the version you have active when running your script doesn't match the one that was loaded when mpi4py was installed and/or when AMUSE was built?
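One way to test that hypothesis is to ask mpi4py which MPI library it is actually linked against. The sketch below is a suggestion rather than an AMUSE tool. The helper name describe_mpi is made up, and the keys returned by mpi4py.get_config() vary between mpi4py versions and builds. MPI initialisation is switched off so that only the version queries run.

```python
def describe_mpi():
    """Return details of the MPI library mpi4py is linked against, or None."""
    try:
        import mpi4py
        # Skip MPI_Init/MPI_Finalize; the version queries below do not need them.
        mpi4py.rc.initialize = False
        mpi4py.rc.finalize = False
        from mpi4py import MPI
    except (ImportError, RuntimeError):
        # mpi4py is missing, or its MPI shared library could not be loaded.
        return None
    return {
        "mpi4py_version": mpi4py.__version__,
        "mpi_standard": MPI.Get_version(),  # (major, minor) tuple
        "library_version": MPI.Get_library_version().strip(),
        "build_config": mpi4py.get_config(),  # settings recorded when mpi4py was built
    }

info = describe_mpi()
if info is None:
    print("mpi4py could not be imported in this environment")
else:
    for key, value in info.items():
        print(f"{key}: {value}")
```

Running this once in the shell where AMUSE was built and once in the shell where the script is run should show whether the two library_version strings differ.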

LourensVeen avatar Nov 27 '23 09:11 LourensVeen

I had a go at building AMUSE on a Sterrenwacht machine. Progress so far:

(amuse-env) <user>@<host>:~/amuse$ python setup.py develop_build
Illegal instruction (core dumped)

Looks like some kind of numpy build issue. Time to start digging...

LourensVeen avatar Dec 06 '23 11:12 LourensVeen

If you do module load AMUSE, you should now get the correct prerequisites. There was an issue with mpi4py installing the wrong MPI...

rieder avatar Dec 06 '23 12:12 rieder

> OPAL is OpenMPI's utility library. This error sounds like some kind of data format mismatch, which suggests there are different versions of MPI in use. Possibly mpi4py got compiled against a different version of MPI than AMUSE? Or the version you have active when running your script doesn't match the one that was loaded when mpi4py was installed and/or when AMUSE was built?

This was exactly the issue: mpi4py was installed incorrectly, built against the wrong (conda) OpenMPI library, which then clashed with the correct one. This also caused the wrong configuration of the AMUSE module.

rieder avatar Dec 06 '23 12:12 rieder

What a mess. Actually, when I try to module load AMUSE I get this:

Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
    /easybuild/easybuild/el8_8/modules/all/AMUSE/2023.10.0.lua: Empty or non-existent file
    Please check the modulefile and especially if there is a line number specified in the above message  
While processing the following module(s):
   Module fullname  Module Filename
   ---------------  ---------------
   AMUSE/2023.10.0  /easybuild/easybuild/el8_8/modules/all/AMUSE/2023.10.0.lua

That Lua script exists, but it has permissions 600, so only its owner can read it and Lmod fails to load it for me...
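For anyone wanting to confirm this kind of permission problem without going through Lmod, a rough sketch follows. The helper name readable_by_others is made up, and it runs on a temporary file rather than the real modulefile. It only looks at the "other" read bit, so it ignores group membership and ACLs.

```python
import os
import stat
import tempfile

def readable_by_others(path):
    """True if the file's mode grants read permission to 'other' users."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)

# Demonstration on a temporary file, not on the actual .lua modulefile.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o600)  # owner read/write only, like the modulefile here
locked = readable_by_others(demo_path)

os.chmod(demo_path, 0o644)  # adds read permission for group and others
opened = readable_by_others(demo_path)

os.remove(demo_path)
print(locked, opened)
```

On a POSIX system the first check should come out false and the second true. A modulefile in the first state cannot be read by other users.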

Seems like I should put a working EasyBuild configuration for AMUSE on my to-do list, after Conda packages and a new build system.

LourensVeen avatar Dec 06 '23 12:12 LourensVeen

I think the script may be getting updated at the moment, which could account for that weirdness. But yes, an “official” easybuild module might be a good idea.

rieder avatar Dec 06 '23 12:12 rieder