CMake can't find `MPI_Comm_f2c` resulting in unsafe cast in `nc_create_par_fortran`
I'm building netcdf 4.9.2 via spack-stack, although most of what I'm describing can also be found in the development branch of netcdf-c. In short, there are two related issues I'm seeing.
- CMake is failing to find MPI_Comm_f2c. It subsequently compiles with HAVE_MPI_COMM_F2C undefined, which changes the behavior of the function nc_create_par_fortran.
- When HAVE_MPI_COMM_F2C is unset, the function nc_create_par_fortran makes an unsafe cast of a Fortran int to a C pointer, which causes downstream failures in netcdf-fortran because the resulting MPI communicator is invalid.
Issue 1: CMake is failing to find MPI_Comm_f2c
I've attached the full configure logs here: spack-stage-netcdf-c-4.9.2-build-out.txt
But here's a bit of the parts I assume are relevant:
==> netcdf-c: Executing phase: 'cmake'
==> [2025-11-05-23:18:43.390306] '/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/cmake-3.28.6-a57w6qu/bin/cmake' '-G' 'Unix Makefiles' '-DCMAKE_INSTALL_PREFIX:STRING=/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/netcdf-c-4.9.2-nh6sdcg' '-DCMAKE_INSTALL_RPATH_USE_LINK_PATH:BOOL=ON' '-DCMAKE_INSTALL_RPATH:STRING=/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/netcdf-c-4.9.2-nh6sdcg/lib;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/netcdf-c-4.9.2-nh6sdcg/lib64' '-DCMAKE_PREFIX_PATH:STRING=/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/bzip2-1.0.8-46rroqk;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/c-blosc-1.21.6-m3q3sqf;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/cmake-3.28.6-a57w6qu;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/none/none/compiler-wrapper-1.0-bg47rrw;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/curl-8.11.1-66ndndg;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/gmake-4.4.1-yko4ce7;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/hdf5-1.14.3-2r5r466;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/m4-1.4.20-oabrdro;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/lz4-1.10.0-vdupftv;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/snappy-1.2.1-kezn235;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/zstd-1.5.7-nzcf6gn;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/nghttp2-1.65.0-xc3jjsm;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/openmpi-5.0.5-7o5wh3j;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/numactl-2.0.18-jbxjuwx;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/pmix-5.0.5-cb2qfdl;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/valgrind-3.24.0-fkxp5ig;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gc
c/11.4.0/hwloc-2.11.1-4rdxwte;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/libevent-2.1.12-yqtmri6;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/libpciaccess-0.17-6fdba6x;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/libxml2-2.13.5-vdoyyge;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/ncurses-6.5-sqql5m5;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/openssl-3.4.1-y6uqhmz;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/libiconv-1.18-6ufuaxt;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/xz-5.6.3-blzpk3x;/home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/zlib-1.3.1-tu7vabv' '-DCMAKE_BUILD_TYPE:STRING=Debug' '-DCMAKE_VERBOSE_MAKEFILE:BOOL=ON' '-DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF' '-DCMAKE_POLICY_DEFAULT_CMP0090:STRING=NEW' '-DCMAKE_FIND_USE_PACKAGE_REGISTRY:BOOL=OFF' '-DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON' '-DBUILD_SHARED_LIBS:BOOL=ON' '-DENABLE_BYTERANGE:BOOL=OFF' '-DBUILD_UTILITIES:BOOL=ON' '-DENABLE_NETCDF_4:BOOL=ON' '-DENABLE_DAP:BOOL=ON' '-DENABLE_HDF4:BOOL=OFF' '-DENABLE_PARALLEL_TESTS:BOOL=OFF' '-DENABLE_FSYNC:BOOL=OFF' '-DENABLE_LARGE_FILE_SUPPORT:BOOL=ON' '-DNETCDF_ENABLE_LOGGING:BOOL=OFF' '-DENABLE_DYNAMIC_LOADING:BOOL=ON' '-DNC_FIND_SHARED_LIBS:BOOL=ON' '/tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-c-4.9.2-nh6sdcggap76245atvuf5pgx7jhg3gs5/spack-src'
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/ubuntu/spack-stack-eap/envs/unified-gcc/install/none/none/compiler-wrapper-1.0-bg47rrw/libexec/spack/gcc/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/ubuntu/spack-stack-eap/envs/unified-gcc/install/none/none/compiler-wrapper-1.0-bg47rrw/libexec/spack/gcc/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
...
-- Found MPI_C: /home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/openmpi-5.0.5-7o5wh3j/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /home/ubuntu/spack-stack-eap/envs/unified-gcc/install/gcc/11.4.0/openmpi-5.0.5-7o5wh3j/lib/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
...
-- Looking for MPI_Comm_f2c
-- Looking for MPI_Comm_f2c - not found
-- Looking for MPI_Info_f2c
-- Looking for MPI_Info_f2c - not found
Also, I checked that MPI_Comm_f2c is declared in my MPI install, and it definitely is:
grep -r "MPI_Comm_f2c" /..truncated../openmpi-5.0.5-7o5wh3j/include
/..truncated../openmpi-5.0.5-7o5wh3j/include/mpi.h:OMPI_DECLSPEC MPI_Comm MPI_Comm_f2c(MPI_Fint comm)
For reasons unclear to me, the call CHECK_FUNCTION_EXISTS(MPI_Comm_f2c HAVE_MPI_COMM_F2C) in CMakeLists.txt is simply failing to detect the function.
Issue 2: Function nc_create_par_fortran does an unsafe int->ptr cast
The function nc_create_par_fortran is sensitive to the definition of HAVE_MPI_COMM_F2C: when it is undefined, rather than calling MPI_Comm_f2c, it performs an unsafe cast of a Fortran integer to a pointer. I get the following compiler warning from this operation:
/tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-c-4.9.2-nh6sdcggap76245atvuf5pgx7jhg3gs5/spack-src/libdispatch/dparallel.c: In function 'nc_create_par_fortran':
/tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-c-4.9.2-nh6sdcggap76245atvuf5pgx7jhg3gs5/spack-src/libdispatch/dparallel.c:457:14: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
457 | comm_c = (MPI_Comm)comm;
|
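The hazard behind this warning can be illustrated without MPI. The sketch below is a hypothetical model, not netcdf-c or Open MPI code: `mock_comm`, `mock_table`, `mock_comm_f2c`, `mock_comm_cast`, and `mock_comm_is_valid` are all invented names. It assumes a C-side handle that is a pointer into a library-owned table (Open MPI's MPI_Comm is a pointer type), while the Fortran side only holds a small integer index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pointer-typed communicator handle. */
typedef struct mock_comm { int id; } mock_comm;
typedef mock_comm *mock_comm_handle;

/* Library-owned objects; a Fortran-side handle is just an index here. */
static mock_comm mock_table[] = { {100}, {101}, {102}, {103} };
#define MOCK_TABLE_LEN (sizeof mock_table / sizeof mock_table[0])

/* Plays the role of MPI_Comm_f2c: translate the integer by table lookup. */
mock_comm_handle mock_comm_f2c(int fhandle)
{
    if (fhandle < 0 || (size_t)fhandle >= MOCK_TABLE_LEN)
        return NULL;
    return &mock_table[fhandle];
}

/* Plays the role of the fallback path: the integer is reinterpreted as an
 * address, so 3 becomes the pointer 0x3 rather than a real object. */
mock_comm_handle mock_comm_cast(int fhandle)
{
    return (mock_comm_handle)(intptr_t)fhandle;
}

/* Returns 1 if the handle refers to an entry in the table, 0 otherwise. */
int mock_comm_is_valid(mock_comm_handle h)
{
    for (size_t i = 0; i < MOCK_TABLE_LEN; i++)
        if (h == &mock_table[i])
            return 1;
    return 0;
}
```

In this model, `mock_comm_f2c(3)` returns a pointer that can be dereferenced, whereas `mock_comm_cast(3)` returns the address 3, which no code may safely dereference. For an MPI implementation whose handle type is a plain integer, the cast would presumably be harmless, which may be why the fallback exists at all.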
I'm not an MPI expert, and perhaps there are use cases here that I'm not considering, but this whole thing seems like a bit of a tripwire hidden inside netcdf-c, and it took me quite a bit of time to fully tease out the issues. Any assistance would be appreciated.
How did this come to my attention?
I'm building our new spack-stack environment and running jedi-ctests. When some of the tests authored in Fortran try to write out debug data, they hit this failure:
0x000015554e4da4b2 in ompi_comm_invalid (comm=0x3) at ../../../ompi/communicator/communicator.h:493
In the full stack trace I noted this interesting bit, where the integer "3" is converted to an apparent pointer when moving from the Fortran implementation to the C implementation:
#11 0x0000155552066347 in nc_create_par (path=... cmode=12288, comm=0x3, info=0x0 ... at /..trunc../libdispatch/dparallel.c:131
#12 0x0000155552066501 in nc_create_par_fortran (path=..., cmode=12288, comm=3, info=0 ... at /..trunc../libdispatch/dparallel.c:465
full jedi stack trace
Thread 1 "saber_quench_er" received signal SIGSEGV, Segmentation fault.
warning: 493 ../../../ompi/communicator/communicator.h: No such file or directory
0x000015554e4da4b2 in ompi_comm_invalid (comm=0x3) at ../../../ompi/communicator/communicator.h:493
(gdb) bt
#0 0x000015554e4da4b2 in ompi_comm_invalid (comm=0x3) at ../../../ompi/communicator/communicator.h:493
#1 0x000015554e4da618 in PMPI_Comm_dup (comm=0x3, newcomm=0x7ffffffcd260) at comm_dup.c:53
#2 0x000015554dc65ed0 in H5_mpi_comm_dup (comm=, comm_new=comm_new@entry=0x7ffffffcd280)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-hdf5-1.14.3-2r5r466idqikyjshsptrrwpqng6e4ky2/spack-src/src/H5mpi.c:110
#3 0x000015554df3ff49 in H5P__facc_mpi_comm_set (prop_id=, name=, size=, value=0x555555c7f0f0)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-hdf5-1.14.3-2r5r466idqikyjshsptrrwpqng6e4ky2/spack-src/src/H5Pfapl.c:5300
#4 0x000015554df5bcc3 in H5P__set_pclass_cb (plist=0x555555f8e260, name=0x15554e130685 "mpi_params_comm", prop=0x555555bf29f0, _udata=0x7ffffffcd310)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-hdf5-1.14.3-2r5r466idqikyjshsptrrwpqng6e4ky2/spack-src/src/H5Pint.c:3097
#5 0x000015554df5a64a in H5P__do_prop (plist=plist@entry=0x555555f8e260, name=name@entry=0x15554e130685 "mpi_params_comm",
plist_op=plist_op@entry=0x15554df59418 , pclass_op=pclass_op@entry=0x15554df5bbff , udata=udata@entry=0x7ffffffcd310)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-hdf5-1.14.3-2r5r466idqikyjshsptrrwpqng6e4ky2/spack-src/src/H5Pint.c:2796
#6 0x000015554df5d3ea in H5P_set (plist=plist@entry=0x555555f8e260, name=name@entry=0x15554e130685 "mpi_params_comm", value=value@entry=0x7ffffffcd338)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-hdf5-1.14.3-2r5r466idqikyjshsptrrwpqng6e4ky2/spack-src/src/H5Pint.c:3176
#7 0x000015554de02b29 in H5Pset_fapl_mpio (fapl_id=, comm=, info=)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-hdf5-1.14.3-2r5r466idqikyjshsptrrwpqng6e4ky2/spack-src/src/H5FDmpio.c:403
#8 0x00001555520bf1a0 in nc4_create_file (path=0x555555b63080 "testdata/error_covariance_training_bump_hdiag_1/1-1_sampling.nc", cmode=12288, initialsz=0,
parameters=0x7ffffffcd560, ncid=65536)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-c-4.9.2-nh6sdcggap76245atvuf5pgx7jhg3gs5/spack-src/libhdf5/hdf5create.c:131
#9 0x00001555520bf756 in NC4_create (path=0x555555b63080 "testdata/error_covariance_training_bump_hdiag_1/1-1_sampling.nc", cmode=12288, initialsz=0, basepe=0, chunksizehintp=0x0,
parameters=0x7ffffffcd560, dispatch=0x1555521d50c0 , ncid=65536)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-c-4.9.2-nh6sdcggap76245atvuf5pgx7jhg3gs5/spack-src/libhdf5/hdf5create.c:321
#10 0x000015555203d0bf in NC_create (path0=0x7ffffffcd5d0 "testdata/error_covariance_training_bump_hdiag_1/1-1_sampling.nc", cmode=12288, initialsz=0, basepe=0, chunksizehintp=0x0,
useparallel=1, parameters=0x7ffffffcd560, ncidp=0x7ffffffcda08)
at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-c-4.9.2-nh6sdcggap76245atvuf5pgx7jhg3gs5/spack-src/libdispatch/dfile.c:1931
#11 0x0000155552066347 in nc_create_par (path=0x7ffffffcd5d0 "testdata/error_covariance_training_bump_hdiag_1/1-1_sampling.nc", cmode=12288, comm=0x3, info=0x0,
ncidp=0x7ffffffcda08) at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-c-4.9.2-nh6sdcggap76245atvuf5pgx7jhg3gs5/spack-src/libdispatch/dparallel.c:131
#12 0x0000155552066501 in nc_create_par_fortran (path=0x7ffffffcd5d0 "testdata/error_covariance_training_bump_hdiag_1/1-1_sampling.nc", cmode=12288, comm=3, info=0,
ncidp=0x7ffffffcda08) at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-c-4.9.2-nh6sdcggap76245atvuf5pgx7jhg3gs5/spack-src/libdispatch/dparallel.c:465
#13 0x0000155552b81ce8 in nf_create_par (path=..., cmode=12288, comm=3, info=0, ncid=-999, _path=_path@entry=1024) at nf_nc.F90:26
#14 0x0000155552bef629 in netcdf::nf90_create (path=..., cmode=12288, ncid=-999, initialsize=,
chunksize=, cache_size=,
cache_nelems=, cache_preemption=, comm=3, info=0,
_path=1024) at /tmp/spack-stack/cache/build_stage/ubuntu/spack-stage-netcdf-fortran-4.6.1-5fienqb2sciqsk2tnyfyrdcy7epbvx64/spack-src/fortran/netcdf4_file.F90:132
#15 0x0000155553cb0738 in tools_netcdf::netcdf_create_file (mpl=..., filename=..., iproc=,
io_override=, _filename=_filename@entry=1024)
at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/bump/tools_netcdf.fypp:400
#16 0x0000155553ec6b66 in type_samp::samp_write_global (samp=..., mpl=..., nam=..., geom=...) at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/bump/type_samp.fypp:1087
#17 0x0000155553ebd767 in type_samp::samp_setup_write (samp=..., mpl=..., nam=..., geom=...) at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/bump/type_samp.fypp:2013
#18 0x0000155553d1a9dd in type_bump::bump_run_drivers (bump=...) at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/bump/type_bump.fypp:630
#19 0x0000155553a86f40 in type_bump_interface::bump_run_drivers_c (key_bump=) at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/bump/type_bump_interface.F90:254
#20 0x00001555539f5619 in saber::bump::BUMP::runDrivers (this=0x555555be4570) at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/bump/BUMP.cc:581
#21 0x00001555539fe953 in saber::bump::NICAS::directCalibration (this=0x555555f96000, fsetEns=...) at /usr/include/c++/11/bits/unique_ptr.h:173
#22 0x00005555555d57ec in saber::SaberParametricBlockChain::SaberParametricBlockChain<:traits> (this=0x5555565d5b80, geom=..., dualResGeom=..., outerVars=..., fset4dXb=...,
fset4dFg=..., fsetEns=..., fsetDualResEns=..., covarConf=..., conf=...) at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/../saber/blocks/SaberParametricBlockChain.h:205
#23 0x00005555555d6a99 in std::make_unique<:saberparametricblockchain oops::geometry> const&, oops::Geometry<:traits> const&, oops::Variables const&, oops::FieldSet4D&, oops::FieldSet4D&, oops::FieldSets&, oops::FieldSets&, eckit::LocalConfiguration const&, eckit::Configuration const&> () at /usr/include/c++/11/bits/unique_ptr.h:962
#24 saber::SaberBlockChainMaker<:traits saber::saberparametricblockchain>::make (this=, geom=..., dualResGeom=..., outerVars=..., fset4dXb=..., fset4dFg=...,
fsetEns=..., fsetDualResEns=..., covarConf=..., conf=...) at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/../saber/blocks/SaberBlockChainBase.h:104
#25 0x00005555555a7a0e in saber::SaberBlockChainFactory<:traits>::create (name=..., geom=..., dualResGeom=..., outerVars=..., fset4dXb=..., fset4dFg=..., fsetEns=...,
fsetDualResEns=..., covarConf=..., conf=...) at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/../saber/blocks/SaberBlockChainBase.h:142
#26 0x00005555555d1559 in saber::ErrorCovariance<:traits>::ErrorCovariance (this=0x555555a4e0b0, geom=..., incVars=..., config=..., xb=..., fg=...)
at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/../saber/oops/ErrorCovariance.h:462
#27 0x00005555555d4665 in oops::CovarMaker<:traits saber::errorcovariance> >::make (this=, resol=..., vars=..., conf=..., xb=..., fg=...)
at /home/ubuntu/testbuild/jedi-bundle/oops/src/oops/base/ModelSpaceCovarianceBase.h:132
#28 0x00005555555a1ee0 in oops::CovarianceFactory<:traits>::create (resol=..., vars=..., conf=..., xb=..., fg=...)
at /home/ubuntu/testbuild/jedi-bundle/oops/src/oops/base/ModelSpaceCovarianceBase.h:176
#29 0x00005555555ec0b2 in saber::ErrorCovarianceToolbox<:traits>::execute (this=0x7ffffffd16c0, fullConfig=...)
at /home/ubuntu/testbuild/jedi-bundle/saber/src/saber/../saber/oops/ErrorCovarianceToolbox.h:249
Since nobody else using spack-stack has run into this issue so far, how can we reproduce this?
So far, this looks to me like an issue with the cmake build of netcdf-c. I am seeing the same on my laptop with gcc-13.
With the older spack v1 that uses the ./configure build for netcdf-c, I am not seeing this issue.
I think all that's needed is to add
if(MPI_mpi_LIBRARY)
SET(CMAKE_REQUIRED_LIBRARIES ${MPI_mpi_LIBRARY} ${CMAKE_REQUIRED_LIBRARIES})
endif()
right before
CHECK_FUNCTION_EXISTS(MPI_Comm_f2c HAVE_MPI_COMM_F2C)
CHECK_FUNCTION_EXISTS(MPI_Info_f2c HAVE_MPI_INFO_F2C)
In general, I wonder why this cmake build has to be so complicated. But that's for another day.
Until a fix is merged into netcdf-c (such as the excellent one suggested by @climbfuji), builders who want a quick escape hatch can build with the flag -DNC_EXTRA_DEPS="mpi".
I will see about getting this committed into the core netcdf-c library, when I have a moment and am back in front of a computer. Thanks!
Unfortunately, my suggested bug fix didn't work reliably for all compilers. I have a better, yet still hacky solution, in https://github.com/spack/spack-packages/pull/2384. See comment https://github.com/spack/spack-packages/pull/2384#issuecomment-3524918903.