Test failure with Intel 2021.1 Beta in ftst_vars.F

Open tjahns opened this issue 4 years ago • 5 comments

  • the version of the software with which you are encountering an issue: I'm trying various build configurations. While netcdf-fortran 4.5.2 works with the same combination of compiler, MPI and netcdf-c, 4.5.3 fails in nf_test4 for ftst_vars and f03tst_vars.

  • environmental information (i.e. Operating System, compiler info, java version, python version, etc.): I'm using CentOS 8.3.2011 on an x86_64 server with Python 3.8 installed and no Java. The Intel compiler is from the 2021.1 oneAPI package, with Intel MPI 2021.1.1.

  • a description of the issue with the steps needed to reproduce it: I built netcdf-c 4.7.4 on top of pnetcdf-1.12.1, hdf5 1.12.0 and libaec 1.0.4 (the make check run of each package completed successfully) and then built netcdf-fortran on top of that. 4.5.2 passes all tests, 4.5.3 fails the two above.

A stack backtrace from ftst_vars shows the program failing in line 175 of ftst_vars.F with

cache_preemption_in = 75
cache_nelems_in = 1009
cache_size_in = 4194304

not matching the expected values. I'm not sure how to provide more reproducer info in a reasonably concise way. Do you want all the config.log files?

tjahns • Feb 26 '21 20:02

Can you use a print statement to find out what values the test is actually getting for those three variables?

edwardhartnett • Feb 26 '21 20:02

Sure, that gives the same values the debugger revealed, as per my initial issue report. Adding

      write (0, *) cache_size_in, default_cache_size,
     &     cache_nelems_in, default_cache_nelems,
     &     cache_preemption_in, default_cache_preemption

right before the conditional stop 4 gives an extra output of

     4194304    16777216        1009        4133          75          75

tjahns • Feb 26 '21 22:02

Also, looking a bit further at the test code: is there a need to invoke undefined behaviour right at the start of the test's executable statements? Line 54 reads

          data_out(y, x) = 2147483646 + x * y

which clearly causes signed integer overflow on any platform where huge(1) == 2147483647, i.e. something not allowed in a valid Fortran program?
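
To make the overflow concrete, here is a minimal sketch in Python, chosen only because its integers have arbitrary precision, so the exact value of the expression can be compared against the 32-bit limit. The loop bounds passed to first_overflow, the 1-based indices and the loop order are assumptions for illustration and are not taken from ftst_vars.F.

```python
# Largest value of a 32-bit default integer, i.e. huge(1) on the
# platforms discussed here.
HUGE_INT32 = 2**31 - 1  # 2147483647


def exact_value(x, y):
    """Exact mathematical value of 2147483646 + x * y (no wraparound in Python)."""
    return 2147483646 + x * y


def overflows_int32(x, y):
    """True if the exact value does not fit into a 32-bit signed integer."""
    return exact_value(x, y) > HUGE_INT32


def first_overflow(nx, ny):
    """Return the first (x, y) that overflows, or None.

    Assumption: indices start at 1 and x is the outer loop; the real
    bounds and loop order of the test are not known from this thread.
    """
    for x in range(1, nx + 1):
        for y in range(1, ny + 1):
            if overflows_int32(x, y):
                return (x, y)
    return None


if __name__ == "__main__":
    # Hypothetical 3 x 3 bounds, only to show how early the problem starts.
    print(first_overflow(3, 3), exact_value(1, 2))
```

Under these assumptions only x = y = 1 stays within range; any product x * y of 2 or more already exceeds huge(1).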

tjahns • Feb 26 '21 23:02

After inspecting this further I have determined the reason: the recipe I used for netcdf-c contains the following:

    --with-chunk-cache-size=4194304 \
    --with-chunk-cache-nelems=1009 \

which a colleague determined should be used for better performance on the storage system in question. So this casts further doubt on the validity of this particular test: does it make sense for netcdf-fortran to insist on a netcdf-c that uses default chunking parameters? Since you wrote the test @edwardhartnett, please advise.
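
The connection between the values reported earlier and these configure flags can be cross-checked mechanically. The following is a small Python sketch that uses only numbers quoted in this thread; the helper names (pair_values, configured_values, mismatches) are made up for illustration and are not part of netcdf-c or netcdf-fortran.

```python
# Output of the write statement quoted earlier in this thread, in the order
# cache_size_in, default_cache_size, cache_nelems_in, default_cache_nelems,
# cache_preemption_in, default_cache_preemption.
REPORTED = "4194304    16777216        1009        4133          75          75"

# Flags from the netcdf-c configure recipe quoted above.
CONFIGURE_FLAGS = [
    "--with-chunk-cache-size=4194304",
    "--with-chunk-cache-nelems=1009",
]

NAMES = ("cache_size", "cache_nelems", "cache_preemption")


def pair_values(line):
    """Map each parameter name to its (observed, expected) pair."""
    nums = [int(tok) for tok in line.split()]
    return {name: (nums[2 * i], nums[2 * i + 1]) for i, name in enumerate(NAMES)}


def configured_values(flags):
    """Map each configure flag name to its integer value."""
    result = {}
    for flag in flags:
        key, value = flag.split("=", 1)
        result[key] = int(value)
    return result


def mismatches(line):
    """Names of the parameters whose observed value differs from the expected one."""
    return [name for name, (got, want) in pair_values(line).items() if got != want]


if __name__ == "__main__":
    print(mismatches(REPORTED))
    print(configured_values(CONFIGURE_FLAGS))
```

Both mismatching values are exactly the ones passed to configure, while the preemption value, for which no flag was given, still matches the expected 75.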

tjahns • Mar 10 '21 16:03

It sounds like you are correct about the test being unreasonable.

Can you submit a PR with a corrected test?

edwardhartnett • Mar 10 '21 16:03