
Automatically measure execution and result consistency across platforms

Open axch opened this issue 7 years ago • 3 comments

The relevant dimensions of variability in computing environment are

  • Operating system
  • Operating system version
  • Fortran compiler
  • Fortran compiler version
  • Fortran compiler optimization flag setting
  • Processor architecture
  • SciPy version, for as long as the Python code depends on it
  • NetCDF library version, if the Fortran begins to depend on that
  • (Also Python version, but I think it's safe to assume it is determined by the operating system version, rather than independently variable)

Ideally, there would be an automatic build that, jointly across those dimensions:

  • Confirms the model builds and runs,
  • Measures and reports numerical discrepancies in the answers,
  • Measures and reports runtime and memory use variation, and
  • Does all of this on low-resolution examples with complete code coverage.

Even more ideally, it would characterize how these variations scale with resolution and run length.
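For the "numerical discrepancies" item, a minimal sketch of what a comparison could report is below. It assumes each run's output has already been read into a flat sequence of floats; the function name `discrepancy` and that data layout are illustrative assumptions, not anything the model currently provides.

```python
import math


def discrepancy(reference, candidate):
    """Return (max absolute difference, max relative difference).

    Both arguments are equal-length sequences of floats, e.g. one output
    field from a reference build and the same field from another build.
    (Hypothetical helper; the name and data layout are assumptions.)
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    max_abs = 0.0
    max_rel = 0.0
    for ref, cand in zip(reference, candidate):
        if math.isnan(ref) or math.isnan(cand):
            # A NaN would be silently ignored by max(), so report the
            # worst case as soon as one appears in either output.
            return math.inf, math.inf
        abs_diff = abs(ref - cand)
        max_abs = max(max_abs, abs_diff)
        if ref != 0.0:
            # Relative difference is only defined where the reference is nonzero.
            max_rel = max(max_rel, abs_diff / abs(ref))
    return max_abs, max_rel
```

Reporting both numbers per configuration pair would make it possible to see whether discrepancies grow with resolution or run length.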

Note 1: A complete matrix test would be overkill; the best that can be hoped for would be random sampling of configurations within the space, possibly with extra attention to extremes like the latest and earliest supported versions of everything.
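The sampling idea in Note 1 could look roughly like the sketch below. It assumes each dimension is given as a list of supported values ordered from earliest to latest; the function name `sample_configurations` and the placeholder values in the usage example are hypothetical.

```python
import random


def sample_configurations(dimensions, n_random, seed=0):
    """Pick test configurations from a space too large to test exhaustively.

    `dimensions` maps a dimension name to its supported values, ordered
    from earliest to latest. The result always starts with the two
    extremes (everything earliest, everything latest), followed by
    `n_random` configurations drawn uniformly at random.
    """
    rng = random.Random(seed)  # seeded so a failing sample can be reproduced
    earliest = {name: values[0] for name, values in dimensions.items()}
    latest = {name: values[-1] for name, values in dimensions.items()}
    configs = [earliest, latest]
    for _ in range(n_random):
        configs.append(
            {name: rng.choice(values) for name, values in dimensions.items()}
        )
    return configs
```

For example, `sample_configurations({"optimization": ["-O0", "-O2", "-O3"], "compiler_version": ["earliest", "latest"]}, n_random=3)` returns five configurations. Random draws may repeat an extreme or each other; deduplicating is left out to keep the sketch short.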

Note 2: Getting all the dimensions at once would be overkill too; incremental progress consists of adding one dimension at a time, starting with those most likely to cause trouble due to varying across installations.

This subsumes what's left of Issue #29.

axch avatar Mar 08 '17 12:03 axch

Of these, the optimisation flag should be the easiest to implement - we can simply loop through a number of them in the Python tests. We might also expect this to have one of the largest impacts on the output. Seems like a good place to start to me.
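A rough sketch of that loop is below. It only builds the compiler command lines, one per optimisation level, so the tests can decide how to run them. The source file name `model.f90`, the executable naming scheme, and the use of a direct `gfortran` invocation rather than the project's real build mechanism are all assumptions.

```python
# Standard gfortran optimisation levels to loop over.
OPTIMIZATION_FLAGS = ("-O0", "-O1", "-O2", "-O3")


def build_commands(source, flags=OPTIMIZATION_FLAGS, compiler="gfortran"):
    """Map each optimisation flag to the command that builds `source` with it.

    Every build gets its own executable name (e.g. model_O2) so that the
    outputs of different optimisation levels can be compared afterwards.
    """
    commands = {}
    for flag in flags:
        executable = "model" + flag.replace("-", "_")
        commands[flag] = [compiler, flag, source, "-o", executable]
    return commands
```

Each command list can be handed to `subprocess.run(cmd, check=True)` inside a test, after which the resulting executables are run on the same low-resolution example and their outputs compared.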

edoddridge avatar Mar 08 '17 21:03 edoddridge

A sub-choice of this is to choose what Fortran standard to aim for. Assuming Fortran is source-level backward compatible, the polite thing is to use the oldest standard that has the features necessary for the program to work, so that it will compile correctly under the greatest variety of tools.

Considerations:

  • The file extension .f90 suggests, to a naive reader, that it's meant to be Fortran 90.
  • Gfortran has switches that appear to be for checking compliance with the Fortran 95, Fortran 2003, and Fortran 2008 standards, but, as far as I am aware, not Fortran 90. Do we have access to a Fortran 90 compiler or conformance checker to test with? Will anyone else want to use a compiler that only understands Fortran 90 to run MIM?

axch avatar Mar 10 '17 05:03 axch

The code is currently aimed at the Fortran 90 standard, but the changes between F90 and F95 are relatively minor. Fortran 90 was chosen because it is the oldest Fortran standard that allows free-form source code.

I suspect gfortran is likely to be the most common compiler. Given that it has a check for F95 compliance, and that the F95 standard does deal somewhat with allocatable arrays, I think it makes sense to switch to F95.
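One way the F95 check could be wired into the Python tests is sketched below. It uses gfortran's `-std=`, `-pedantic` and `-fsyntax-only` options; the helper name `conformance_command` and the source file name are hypothetical, and the check is only as strict as gfortran's own diagnostics rather than a full conformance test.

```python
# Standards discussed in this thread that gfortran can check via -std=.
CHECKABLE_STANDARDS = ("f95", "f2003", "f2008")


def conformance_command(source, standard="f95"):
    """Build a gfortran command that checks `source` against `standard`.

    -fsyntax-only stops after checking the source, so no object file or
    executable is produced; -pedantic asks for warnings about extensions.
    """
    if standard not in CHECKABLE_STANDARDS:
        raise ValueError("unsupported standard: " + standard)
    return ["gfortran", "-std=" + standard, "-pedantic", "-fsyntax-only", source]
```

A test could run `subprocess.run(conformance_command("model.f90"), check=True)` and fail if gfortran reports an error.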

edoddridge avatar Mar 10 '17 14:03 edoddridge