FMS
Floating point divide-by-zero error in test_mpp_io
Describe the bug
test_mpp_io crashes with a floating point divide-by-zero error when a time axis is registered to a netcdf file via mpp_write_meta, which calls nf_def_var.
To Reproduce
Configure the environment on the Skylake or AMD box to run with intel19/20, run make distcheck, be c
Expected behavior
The time axis is registered to the file and the test runs successfully.
System Environment
Describe the system environment, include:
- OS: CENTOS8 (AMD), CENTOS7 (Skylake)
- Compiler(s): intel 19/20
- MPI type, and version: impi2020_up2 (AMD), impi2020_up5 (Skylake)
- netCDF Version: netcdf 4.6.1
- Configure options:
  FCFLAGS: -O0 -g -traceback -check all -check noarg_temp_created -check nopointer -nowarn -ftz -auto -safe-cray-ptr -ftrapuv -I/opt/netcdf/4.6.1/INTEL/include/
  CFLAGS: -O0 -g -traceback -ftrapuv -I/opt/netcdf/4.6.1/INTEL/include/
Additional context
Stack trace:
Using NEW domaintypes and calls...
netCDF single thread write
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
test_mpp_io 000000000044826B Unknown Unknown Unknown
libpthread-2.28.s 00007F7686ED5DD0 Unknown Unknown Unknown
libnetcdf.so.13.1 00007F7689EB0842 Unknown Unknown Unknown
libnetcdf.so.13.1 00007F7689EAE5F4 NC4_def_var Unknown Unknown
libnetcdf.so.13.1 00007F7689E32B7B nc_def_var Unknown Unknown
libnetcdff.so.6.1 00007F768A14E47F nf_def_var_ Unknown Unknown
libFMS.so.4.0.0 00007F768B70EEB9 mpp_io_mod_mp_mpp 459 mpp_io_write.inc
test_mpp_io 000000000041E17E test_IP_test_netc 394 test_mpp_io.F90
test_mpp_io 000000000040EAF7 MAIN__ 123 test_mpp_io.F90
test_mpp_io 000000000040DD62 Unknown Unknown Unknown
libc-2.28.so 00007F768671E6A3 __libc_start_main Unknown Unknown
test_mpp_io 000000000040DC6E Unknown Unknown Unknown
@rem1776 can you look into this?
@uramirez8707 This looks like a version issue with netcdf: nf_def_var only throws this error with netcdf/4.6.1, while the same call and arguments with 4.7.4 return successfully. I also tried a different test for mpp_io, and it failed as well when the time axis was written.
@rem1776 have you looked at the values that it's writing?
@thomas-robinson From here, it looks like it's writing a double. The call to mpp_write_meta passes t, which is of type axistype, and I don't think t gets modified before the test.
The netcdf file itself also produces an HDF error when I try to read it with ncdump, but I'm not sure how relevant that is.
No, I mean: have you looked at the actual data? Is it all numbers? Is something off about it?
No, the axis data looks to be the same in both cases.
FWIW, I've been able to isolate this to the -ftrapuv flag, which "Initializes stack local variables to an unusual value to aid error detection". If I take the flag out, the test runs without the exception.
According to this Intel article, maybe we should replace -ftrapuv with -check uninit? The test passes when I do this replacement.
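The flag replacement suggested above can be sketched in shell. This is only an illustration, not the change that was actually committed: the variable name NEW_FCFLAGS is hypothetical, and the FCFLAGS string is copied from the report on the assumption that its first flag is -O0.

```shell
#!/bin/sh
# FCFLAGS as listed in the report (assumption: the first flag is -O0).
FCFLAGS="-O0 -g -traceback -check all -check noarg_temp_created -check nopointer -nowarn -ftz -auto -safe-cray-ptr -ftrapuv -I/opt/netcdf/4.6.1/INTEL/include/"

# Swap -ftrapuv for -check uninit; every other flag is left untouched.
# NEW_FCFLAGS is a hypothetical name used only for this sketch.
NEW_FCFLAGS=$(printf '%s\n' "$FCFLAGS" | sed 's/-ftrapuv/-check uninit/')

printf '%s\n' "$NEW_FCFLAGS"
```

The result could then be passed to the build, for example as ./configure FCFLAGS="$NEW_FCFLAGS" before make distcheck. The reported CFLAGS also carries -ftrapuv; whether it needs the same change is not established in this thread.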
Update: the same error occurs in test_diag_manager when writing time axis metadata to a netcdf file.
OS/ENV: Skylake, intel18_up4, run make distcheck
test1.1 successful: module/output_field=test_diag_manager_mod/dat1 Bounds of buffer exceeded. Buffer bounds= 1: 10, 1: 10, 1: 10 Actual bounds= 1: 20, 1: 20, 1: 10
test1.2 successful
NOTE: Potential error in diag_manager_end: dat1 NOT available, check if output interval > runlength. Netcdf fill_values are written
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
test_diag_manager 000000000045460E Unknown Unknown Unknown
libpthread-2.17.s 00007FB8D6AB0630 Unknown Unknown Unknown
libnetcdf.so.13.1 00007FB8D918FDE6 Unknown Unknown Unknown
libnetcdf.so.13.1 00007FB8D918D9E2 NC4_def_var Unknown Unknown
libnetcdf.so.13.1 00007FB8D9102FAB nc_def_var Unknown Unknown
libnetcdff.so.6.1 00007FB8D943FDC8 nf_def_var_ Unknown Unknown
libFMS.so.5.0.0 00007FB8DA9A31E8 mpp_io_mod_mp_mpp 459 mpp_io_write.inc
libFMS.so.5.0.0 00007FB8DB2CA3A8 diag_output_mod_m 1571 diag_output.F90
libFMS.so.5.0.0 00007FB8DB33824A diag_util_mod_mp_ 2060 diag_util.F90
libFMS.so.5.0.0 00007FB8DB3631A1 diag_util_mod_mp_ 2770 diag_util.F90
libFMS.so.5.0.0 00007FB8DB3563A4 diag_util_mod_mp_ 2623 diag_util.F90
libFMS.so.5.0.0 00007FB8DB508365 diag_manager_mod_ 3799 diag_manager.F90
libFMS.so.5.0.0 00007FB8DB5001F9 diag_manager_mod_ 3705 diag_manager.F90
test_diag_manager 000000000042664A MAIN__ 995 test_diag_manager.F90