netcdf-fortran

Possible parallel build failure when using CMake

Status: Open. zbeekman opened this issue 7 years ago (2 comments).

EDITED 2017-07-20 13:45 EDT

I noticed that many of the tests use a test module (`tests`, defined in `module_tests.F90`) from a common source file. When several Fortran targets that use the same module are built in parallel, CMake compiles that module source once per target. One compiler process can then clobber the `.mod` file (or its timestamp file) that another process is writing or reading. I ran a parallel build and confirmed that this happens:

```
[ 68%] Building Fortran object nf03_test/CMakeFiles/f03tst_vars.dir/module_tests.F90.o
[ 69%] Linking Fortran executable f03tst_vars3
[ 70%] Building Fortran object nf03_test/CMakeFiles/f03tst_vars2.dir/f03tst_vars2.F.o
[ 70%] Building Fortran object nf03_test/CMakeFiles/f03tst_vars2.dir/f03lib_f_interfaces.f90.o
/packages/gcc/6.4/bin/gfortran   -O2 -g -DNDEBUG CMakeFiles/f03tst_vars3.dir/module_tests.F90.o CMakeFiles/f03tst_vars3.dir/f03tst_vars3.F.o CMakeFiles/f03tst_vars3.dir/f03lib_f_interfaces.f90.o CMakeFiles/f03tst_vars3.dir/f03lib.c.o CMakeFiles/f03tst_vars3.dir/handle_err.F.o  -o f03tst_vars3  -L/home/users/ibeekman/netcdf-fortran-4.4.4/fortran  -L/home/users/ibeekman/netcdf-fortran-4.4.4/libsrc -Wl,-rpath,/home/users/ibeekman/netcdf-fortran-4.4.4/fortran:/home/users/ibeekman/netcdf-fortran-4.4.4/libsrc:/home/users/ibeekman/netcdf-fortran-4.4.4/build/fortran:/usr/local/packages/netcdf/4.4.1.1_gcc-6.4/lib64 ../fortran/libnetcdff.so.6.1.1 /usr/local/packages/netcdf/4.4.1.1_gcc-6.4/lib64/libnetcdf.so.11.4.0 /packages/hdf5/1.8.19/lib/libhdf5_hl.so /packages/hdf5/1.8.19/lib/libhdf5.so /packages/hdf5/1.8.19/lib/libsz.so /packages/hdf5/1.8.19/lib/libz.so /usr/lib64/libdl.so /usr/lib64/libm.so /usr/lib64/libcurl.so
[ 71%] Linking Fortran executable f03tst_types3
/packages/gcc/6.4/bin/gfortran   -O2 -g -DNDEBUG CMakeFiles/f03tst_types3.dir/module_tests.F90.o CMakeFiles/f03tst_types3.dir/f03tst_types3.F.o CMakeFiles/f03tst_types3.dir/f03lib_f_interfaces.f90.o CMakeFiles/f03tst_types3.dir/f03lib.c.o CMakeFiles/f03tst_types3.dir/handle_err.F.o  -o f03tst_types3  -L/home/users/ibeekman/netcdf-fortran-4.4.4/fortran  -L/home/users/ibeekman/netcdf-fortran-4.4.4/libsrc -Wl,-rpath,/home/users/ibeekman/netcdf-fortran-4.4.4/fortran:/home/users/ibeekman/netcdf-fortran-4.4.4/libsrc:/home/users/ibeekman/netcdf-fortran-4.4.4/build/fortran:/usr/local/packages/netcdf/4.4.1.1_gcc-6.4/lib64 ../fortran/libnetcdff.so.6.1.1 /usr/local/packages/netcdf/4.4.1.1_gcc-6.4/lib64/libnetcdf.so.11.4.0 /packages/hdf5/1.8.19/lib/libhdf5_hl.so /packages/hdf5/1.8.19/lib/libhdf5.so /packages/hdf5/1.8.19/lib/libsz.so /packages/hdf5/1.8.19/lib/libz.so /usr/lib64/libdl.so /usr/lib64/libm.so /usr/lib64/libcurl.so
f951: Fatal Error: Can't rename module file ‘tests.mod0’ to ‘tests.mod’: No such file or directory
compilation terminated.
make[2]: *** [nf03_test/CMakeFiles/f03tst_vars5.dir/module_tests.F90.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 72%] Building C object nf03_test/CMakeFiles/f03tst_vars2.dir/f03lib.c.o
[ 73%] Building C object nf03_test/CMakeFiles/f03tst_vars6.dir/f03lib.c.o
[ 73%] Building Fortran object nf03_test/CMakeFiles/f03tst_vars6.dir/f03lib_f_interfaces.f90.o
/home/users/ibeekman/netcdf-fortran-4.4.4/nf03_test/test03_get.F:375:11:

         use tests
           1
Fatal Error: Can't open module file ‘tests.mod’ for reading at (1): No such file or directory
compilation terminated.
make[2]: *** [nf03_test/CMakeFiles/nf03_test.dir/test03_get.F.o] Error 1
make[1]: *** [nf03_test/CMakeFiles/nf03_test.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 73%] Building Fortran object nf03_test/CMakeFiles/f03tst_vars6.dir/handle_err.F.o
[ 73%] Building Fortran object nf03_test/CMakeFiles/f03tst_types.dir/f03tst_types.F.o
make[1]: *** [nf03_test/CMakeFiles/f03tst_vars5.dir/all] Error 2
f951: Fatal Error: Can't rename module file ‘tests.mod0’ to ‘tests.mod’: No such file or directory
compilation terminated.
```

The best way around this is to use CMake object libraries for any source files that contain modules. Each module source is then compiled exactly once, so only one `.mod` file is produced (plus a single `.o` file if the module contains executable code). CMake then passes those object file(s), the "object library", to the link step of every target that needs them.
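A minimal sketch of that approach for `nf03_test/CMakeLists.txt`, assuming the shared sources are the ones that appear in every test's link line in the log above (`module_tests.F90`, `f03lib_f_interfaces.f90`, `f03lib.c`, `handle_err.F`). The target name `nf03_test_common`, the `mod` subdirectory, the variable `NF03_TEST_MODDIR`, and the library target name `netcdff` are hypothetical; this is not the project's actual build code.

```cmake
# Hypothetical sketch; target, variable, and directory names are illustrative.

# Compile the sources shared by all tests exactly once. module_tests.F90
# defines module `tests`, so there is now a single writer for tests.mod.
add_library(nf03_test_common OBJECT
  module_tests.F90
  f03lib_f_interfaces.f90
  f03lib.c
  handle_err.F)

# Put the object library's .mod files in a dedicated directory.
set(NF03_TEST_MODDIR "${CMAKE_CURRENT_BINARY_DIR}/mod")
set_target_properties(nf03_test_common PROPERTIES
  Fortran_MODULE_DIRECTORY "${NF03_TEST_MODDIR}")

foreach(test IN ITEMS f03tst_vars2 f03tst_vars3 f03tst_types3)
  # Reuse the pre-built objects instead of recompiling the shared sources.
  add_executable(${test} ${test}.F $<TARGET_OBJECTS:nf03_test_common>)
  # Let `use tests` find tests.mod when compiling ${test}.F.
  target_include_directories(${test} PRIVATE "${NF03_TEST_MODDIR}")
  # Build the object library (and its .mod files) before this target.
  add_dependencies(${test} nf03_test_common)
  # Assumed name of the netCDF-Fortran library target.
  target_link_libraries(${test} netcdff)
endforeach()
```

The `$<TARGET_OBJECTS:...>` form also works on older CMake releases. Starting with CMake 3.12, an object library can instead be linked directly with `target_link_libraries`.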

I will try to work up a PR at some point in the future, posting here first to let you know that I've started work. If someone else beats me to it, even better.

zbeekman, Jul 19 '17 15:07

~~If you have an unconfirmed label, you should def. apply it here... I'll see if I can investigate in a bit... since closing #63 I suspect I'll be rebuilding netCDF-Fortran in my not-so-distant future.~~

EDIT: I rebuilt netCDF-Fortran and confirmed the bug exists; see the log above.

BTW, is there a compatibility mapping between netCDF-C and netCDF-Fortran? There is obviously no 1-to-1 mapping between version numbers, since C is at 4.4.1.1 and Fortran is at 4.4.4. Better, more complete, and easier-to-find documentation would be great. It's a tall order, I know; I can barely keep up with the crappy, minimal documentation on my own F/OSS projects.

zbeekman, Jul 19 '17 16:07

I'll see what I can do. The object libraries would be easy enough for me to add, but we are resource-constrained and pull requests are always welcome. There isn't a compatibility mapping, although we endeavor to keep things backward compatible. We kind of dropped the ball with the previous version of netCDF-Fortran, but going forward we will clearly call out the minimum required version of netCDF-C.

WardF, Jul 19 '17 16:07