netcdf-fortran icon indicating copy to clipboard operation
netcdf-fortran copied to clipboard

Fix fortran

Open skosukhin opened this issue 5 years ago • 14 comments
trafficstars

This PR offers the following:

  1. Support for Fortran compilers that generate module files with names in UPPERCASE.
  2. Support for Fortran compilers that generate module files with extensions other than .mod.
  3. Parallel builds in the ./fortran subdirectory.
  4. No need to compile anything to create dist tarballs: currently, Fortran modules are listed in BUILT_SOURCES, which forces compilation even if we want the tarballs only, i.e. ./configure && make dist.
  5. Fix for NAG: currently, the shared version of the library built with NAG does not contain any symbols from the convenience libraries, which obviously make libnetcdff.so unusable. A simplified axample of why this happens is below.
    $ cat ./banana.f90
    subroutine banana
    end subroutine
    $ cat ./main.f90
    subroutine libmain
    end subroutine
    $ nagfor -PIC -c banana.f90
    $ ar cr libbanana.a banana.o
    $ nagfor -PIC -c main.f90
    $ nagfor -Wl,-shared -o libmain.so -Wl,-Wl,,--whole-archive libbanana.a -Wl,-Wl,,--no-whole-archive main.o
    $ nm ./libmain.so | grep banana || echo "FAIL"
    FAIL
    
    The example above shows what happens when a static library libbanana.a is injected into a shared library libmain.so with NAG. Basically the resulting shared library does not contain symbols from the static one. The same happens with the convenience libraries in Autotools. This happens because NAG (at least version 6.2) reshuffles linking arguments:
    $ nagfor -dryrun -Wl,-shared -o libmain.so -Wl,-Wl,,--whole-archive libbanana.a -Wl,-Wl,,--no-whole-archive main.o
    /usr/bin/gcc -m64 -o libmain.so ... libbanana.a main.o ... -lm -shared -Wl,--whole-archive -Wl,--no-whole-archive
    
    As you can see, there is nothing between the arguments -Wl,--whole-archive and -Wl,--no-whole-archive, which is the problem. There are two ways to fix this. The first one is to patch libtool, so it would prepend -Wl, flag to the names of the convenience libraries:
    $ nagfor -Wl,-shared -o libmain.so -Wl,-Wl,,--whole-archive -Wl,libbanana.a -Wl,-Wl,,--no-whole-archive main.o
    $ nm ./libmain.so | grep banana || echo "FAIL"
    0000000000044d08 T banana_
    
    Another way, which is implemented in this PR, is not to use the convenience libraries.
  6. Remove the execute permissions from files that don't need them and extend the list in .gitignore.

skosukhin avatar Feb 21 '20 15:02 skosukhin

Which fortran compilers generate mod files with uppercase?

I thought we had left that behind...

edwardhartnett avatar Feb 26 '20 22:02 edwardhartnett

There is at least Cray compiler that creates module files in uppercase. As far as I remember, there is a flag that tells the compiler to create module files in lowercase but still. Let me know if you are not interested in supporting that and I will remove the first two bullet points from the list in my first message. Personally, I am much more interested in the fix for NAG.

skosukhin avatar Feb 27 '20 10:02 skosukhin

May I suggest that you take this PR down and lets work through the issues one by one.

What you are calling a NAG compiler bug seems to be a libtool problem, right? That is, what is not working for you in this case is libtool. Are you sure you are using the latest libtool? Have others noticed this bug in libtool?

You are proposing a lot of pretty dramatic changes to the build system, but it's not clear they are needed. If we can get your NAG compiler working with libtool, then that solves your problems without any changes in the build system.

edwardhartnett avatar Mar 20 '20 12:03 edwardhartnett

I am using Libtool 2.4.6. As far as I understand, it's the latest one and it's 5 years old. The last time master branch of Libtool was updated a year ago. Therefore we can hardly expect that the problem will be fixed in the foreseeable future. The patch enabling convenience libraries with NAG looks like:

--- a/m4/libtool.m4
+++ b/m4/libtool.m4
@@ -5204,6 +5204,8 @@ _LT_EOF
          _LT_TAGVAR(whole_archive_flag_spec, $1)=
          tmp_sharedflag='--shared' ;;
         nagfor*)                        # NAGFOR 5.3
+          _LT_TAGVAR(whole_archive_flag_spec, $1)='$wl--whole-archive`for conv in $convenience\"\"; do test  -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive'
+          _LT_TAGVAR(compiler_needs_object, $1)=yes
           tmp_sharedflag='-Wl,-shared' ;;
        xl[[cC]]* | bgxl[[cC]]* | mpixl[[cC]]*) # IBM XL C 8.0 on PPC (deal with xlf below)
          tmp_sharedflag='-qmkshrobj'

If you take a look at libtool.m4, you will see that it's identical to what is required for several other compilers.

Since Libtool is not maintained regularly, it became common practice for many packages to patch Libtool before generating the configure scripts. This might be tricky though (e.g. see MPICH or OpenMPI). If you decide to go for this option, you could add the following script and call it instead of autoreconf -fvi:

#!/bin/sh
autoreconf -fvi || exit $?

# The program 'patch' exits with exitcode=1 if the patch has already been applied.
# Consider this a normal scenario:
patch --forward --no-backup-if-mismatch -p1 -r - -i libtool_patches/libtool.m4.nag_convenience.patch
exitcode=$?; test $exitcode -ne 0 && test $exitcode -ne 1 && exit $exitcode

# Reset libtool.m4 timestamps to avoid confusing make:
touch -r m4/ltversion.m4 m4/libtool.m4

As far as I understand, the convenience libraries were introduced as an attempt to handle parallel builds in fortran directory: 79a8a6f85d92b1d31f487517f16820d65c550fd8. Unfortunately, it didn't help and now there is still no parallel building but an extra problem. The right way (maybe not the only one though) to solve the problem with parallel builds is to declare dependencies between objects and Fortran module files. This is what is done in this PR. I don't like PRs introducing too many changes at once too but solving all issues one by one requires too much overhead in this particular case.

skosukhin avatar Mar 20 '20 13:03 skosukhin

Convenience libraries were not introduced for parallel builds, they have always been used to assemble the libraries in C and Fortran.

I agree WRT listing dependencies to mod files in the makefile, in order that parallel builds will work. I will take a look at that next.

edwardhartnett avatar Mar 20 '20 14:03 edwardhartnett

I am sorry, I was confused with the message of the commit that introduced the convenience libraries and I don't see that they had been used before.

skosukhin avatar Mar 20 '20 14:03 skosukhin

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Apr 10 '20 09:04 CLAassistant

Thanks for signing the CLA, I'm wading through the PR's now. Thanks!

WardF avatar Apr 13 '20 23:04 WardF

This PR should not be merged, it should be taken down.

The principle problem is the (re-)inclusion of handling for fortran compilers which convert mod names to upper-case. This is something some old fortran compilers used to do, but don't any more. We took this out of netcdf-fortran several releases ago, and have had no complaints.

Similarly, the GFDL FMS system, which is a large fortran modelling system that has been ported to many platforms, does not include handling for upper-case mod names, and it is known to build on every HPC platform they have tried, including Cray.

So I believe handling of upper-case mod names can be left out of the netcdf-fortran build system. That's good, because it adds a significant amount of code and complexity, and if we can do without it, our build system becomes simpler and easier to maintain.

(However, if someone definately comes up with a real requirement for this on an actual existing HPC system, that would be a decisive reason to handle upper-case mod names.)

Almost all the other improvements from this PR are being incorporated into not only netcdf-fortran, but also the FMS system, and result in significant improvement. The switch to using .lo from .$(OBJEXT) cleared up some persistent problems for the FMS build.

Especially in the case of the FMS build, there are many other programmers involved so the changes have to be introduced more slowly and each individually justified. I will be referring to this PR for some weeks to come...

So thanks @skosukhin !

edwardhartnett avatar Apr 14 '20 14:04 edwardhartnett

Cray Fortran 9.0.2 generates module files in uppercase by default. So there are HPC systems where the problem is relevant. This can be solved by running the configure script with FCFLAGS=-ef. Another option is not to build NetCDF-Fortran there at all but to use the libraries provided by the vendor (i.e. module load cray-netcdf). But as I mentioned, fixing this is not the main goal of this PR. The main issue is that Libtool does not handle the convenience libraries properly when building with NAG. There are two ways to solve this:

  • patch Libtool files right after autoreconf like it's done in OpenMPI (e.g. see https://github.com/open-mpi/ompi/pull/6378) or in Parallel-NetCDF (e.g. see https://github.com/Parallel-NetCDF/PnetCDF/pull/59);
  • do not use convenience libraries because they do not seem to be really needed.

I am not saying that this PR should be merged: use the provided information the way you like (restructure, merge step-by-step, etc.). I just want to make sure that the problem with NAG is not ignored.

skosukhin avatar Apr 14 '20 16:04 skosukhin

Yes, the convenience library issue is next on the chopping block.

If we can eliminate them, let's take a look at that.

On Tue, Apr 14, 2020 at 10:15 AM Sergey Kosukhin [email protected] wrote:

Cray Fortran 9.0.2 generates module files in uppercase by default. So there are HPC systems where the problem is relevant. This can be solved by running the configure script with FCFLAGS=-ef. Another option is not to build NetCDF-Fortran there at all but to use the libraries provided by the vendor (i.e. module load cray-netcdf). But as I mentioned, fixing this is not the main goal of this PR. The main issue is that Libtool does not handle the convenience libraries properly when building with NAG. There are two ways to solve this:

I am not saying that this PR should be merged: use the provided information the way you like (restructure, merge step-by-step, etc.). I just want to make sure that the problem with NAG is not ignored.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Unidata/netcdf-fortran/pull/221#issuecomment-613537157, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJIOMMHYGKSAFTJMKPX2IF3RMSDYVANCNFSM4KZFIXKA .

edwardhartnett avatar Apr 14 '20 17:04 edwardhartnett

OK @skosukhin I have looked what you have done with convenience libraries.

However, there is one fly in the ointment. I would like to separate (as they were in olden times) the F90 API code from the F77 code. See #159

In that case, we would once again need convenience libraries. We would need a third directory, similar to liblib in the C library, which assembles the final fortran library. And for that to work, we need at least two convenience libraries, one for F77 and one for F90.

So perhaps patching libtool is the answer here...

edwardhartnett avatar Apr 15 '20 12:04 edwardhartnett

Since I just ran into the uppercase issue with Cray ftn 10 and had to rediscover that I needed to add -ef to the flags: please consider to adopt some automatic way to deal with this.

Regarding the needed patch to libtool.m4: in my projects we patch libtool.m4 macros with the m4_bpatsubst function like you can inspect here during the autoconf macro expansion, i.e there's no need to mess around with files from libtool or the autotools artifacts after running autoconf and the patch can be cleanly restricted to the 2.4.6 libtool macros.

On the other hand, because we try to fully support all sorts of behaviours of the Nvidia and NAG compilers that only emerged after the libtool 2.4.6 release time, the whole is a bit hard to read without an editor that properly supports autoconf syntax.

tjahns avatar Mar 11 '21 00:03 tjahns

Curious if the recent changes/refactoring have fixed this, or if we need to address this still for the v4.6.0 release.

WardF avatar Jul 07 '22 21:07 WardF