netcdf-fortran
Test failure with Intel 2021.1 Beta in ftst_vars.F
- the version of the software with which you are encountering an issue: I'm trying various build configurations. netcdf-fortran 4.5.2 works with the same combination of compiler, MPI and netcdf-c, but 4.5.3 fails in nf_test4 for ftst_vars and f03tst_vars.
- environmental information (i.e. Operating System, compiler info, java version, python version, etc.): I'm using CentOS 8.3.2011 on an x86_64 server with Python 3.8 installed and no Java. The Intel compiler is from the 2021.1 oneAPI package, with Intel MPI 2021.1.1.
- a description of the issue with the steps needed to reproduce it: I built netcdf-c 4.7.4 on top of pnetcdf 1.12.1, hdf5 1.12.0 and libaec 1.0.4 (each package's make check completed successfully) and then built netcdf-fortran on top of that. 4.5.2 passes all tests; 4.5.3 fails the two named above.
A stack backtrace from ftst_vars shows the program failing in line 175 of ftst_vars.F with
cache_preemption_in = 75
cache_nelems_in = 1009
cache_size_in = 4194304
not matching the expected values. I'm not sure how to provide more reproducer info in a reasonably concise way. Do you want all the config.log files?
Can you use a print statement to find out what values the test is actually getting for those three variables?
Sure. The print statement gives the same values the debugger revealed in my initial issue report. Adding
      write (0, *) cache_size_in, default_cache_size,
     &     cache_nelems_in, default_cache_nelems,
     &     cache_preemption_in, default_cache_preemption
right before the conditional stop 4 gives an extra output of
4194304 16777216 1009 4133 75 75
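To make that output line easier to read, here is a small sketch that pairs each value read back with the default the test expects, in the order used by the write statement above. It is written in Python only for convenience. The helper names `pair_up` and `mismatches` are made up for illustration and are not part of the test suite.

```python
# Variable names in the order of the write statement; each value read back
# is immediately followed by the default the test compares it against.
NAMES = ["cache_size", "cache_nelems", "cache_preemption"]


def pair_up(output_line):
    """Split the printed line into {name: (value_read, expected_default)}."""
    numbers = [int(tok) for tok in output_line.split()]
    if len(numbers) != 2 * len(NAMES):
        raise ValueError("expected %d integers" % (2 * len(NAMES)))
    return {name: (numbers[2 * i], numbers[2 * i + 1])
            for i, name in enumerate(NAMES)}


def mismatches(pairs):
    """Names whose value read back differs from the expected default."""
    return [name for name, (got, expected) in pairs.items() if got != expected]


# The line printed right before the conditional stop 4.
pairs = pair_up("4194304 16777216 1009 4133 75 75")
for name, (got, expected) in pairs.items():
    status = "ok" if got == expected else "MISMATCH"
    print(f"{name:17s} read={got:9d} expected={expected:9d} {status}")
```

So the cache size and the number of elements differ from what the test expects, while the preemption value agrees.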
Also, looking a bit further at the test code: is there a need to invoke undefined behaviour right at the start of the test's executable statements? Line 54 reads
data_out(y, x) = 2147483646 + x * y
which causes signed integer overflow on any platform where huge(1) == 2147483647 as soon as x * y exceeds 1, i.e. something not allowed in a valid Fortran program.
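The arithmetic can be checked exactly outside Fortran. The sketch below uses Python, whose integers do not wrap around, to list which cells of a small grid would exceed a 32-bit default integer. The grid size and the assumption that both loop indices start at 1 are mine for illustration; the real dimensions are in ftst_vars.F.

```python
INT32_MAX = 2**31 - 1  # 2147483647, i.e. huge(1) for a 32-bit default integer


def fill_value(x, y):
    """Exact value of the expression on line 54, computed without wraparound."""
    return 2147483646 + x * y


def overflowing_cells(nx, ny):
    """Return the (x, y) index pairs whose value does not fit in 32 bits."""
    return [(x, y)
            for x in range(1, nx + 1)   # assumed: indices start at 1
            for y in range(1, ny + 1)
            if fill_value(x, y) > INT32_MAX]


# Hypothetical grid size, chosen only to keep the listing short.
NX, NY = 3, 2
bad = overflowing_cells(NX, NY)
print(len(bad), "of", NX * NY, "cells overflow:", bad)
```

Under those assumptions only the cell with x = y = 1 stays in range, because 2147483646 + 1 is exactly huge(1).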
After inspecting this further I have determined the reason: the recipe I used for netcdf-c contains the following:
--with-chunk-cache-size=4194304 \
--with-chunk-cache-nelems=1009 \
which a colleague determined should be used for better performance on the storage system in question. So this casts further doubt on the validity of this particular test: does it make sense for netcdf-fortran to insist on a netcdf-c that uses default chunking parameters? Since you wrote the test @edwardhartnett, please advise.
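To illustrate the question, here is a rough sketch of the difference between comparing against hard-coded defaults and a check that does not depend on how netcdf-c was configured. It is Python rather than the Fortran of the test and not a proposed patch. Both function names are hypothetical. The 0 to 100 range for preemption is my assumption about the Fortran interface. The "default" numbers are simply the expected values the test printed.

```python
# Expected values as printed by the test (size, nelems, preemption).
HARDCODED_DEFAULTS = (16777216, 4133, 75)


def matches_hardcoded_defaults(size, nelems, preemption):
    """The kind of check that currently fails: exact match with the defaults."""
    return (size, nelems, preemption) == HARDCODED_DEFAULTS


def chunk_cache_plausible(size, nelems, preemption):
    """Hypothetical build-independent check: values only need to be sane.

    Assumes preemption is reported as an integer between 0 and 100.
    """
    return size > 0 and nelems > 0 and 0 <= preemption <= 100


# Values read back with --with-chunk-cache-size=4194304 and
# --with-chunk-cache-nelems=1009 in the netcdf-c build.
custom_build = (4194304, 1009, 75)

print("exact-default check:", matches_hardcoded_defaults(*custom_build))
print("plausibility check: ", chunk_cache_plausible(*custom_build))
```

Another option would be for the test to set the cache parameters itself and then read them back, which would check the API without caring about build-time defaults at all.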
It sounds like you are correct about the test being unreasonable.
Can you submit a PR with a corrected test?