benchmarks icon indicating copy to clipboard operation
benchmarks copied to clipboard

Add modern poisson2d solver & rename optimized.f90

Open rouson opened this issue 4 years ago • 17 comments

I hope the benefit of this pull request extends beyond this specific refactoring and serves as a guide for other codes contributed to this project. The aim is to demonstrate what modern Fortran can do to improve the expressiveness of code while maintaining and potentially improving performance.

In my local tests compiling with gfortran -Ofast, what was previously named optimized.f90 only performs about 5% faster than modern.f90 on average. In 2021, that hardly justifies restricting oneself to a small subset of Fortran 95 that is very nearly Fortran 77 + free format even in the name of performance. The modern version is more compact in some places, adds safety in some places (e.g., tighter variable scoping means fewer opportunities for mistakenly changing variables), and adds additional optimization opportunities now that compilers such as NVIDIA's compiler can offload do concurrent to a GPU. This means there's even more that can be explored with this new code than I've been able to investigate so far.

Refactorings

  1. Use block for tighter variable scoping (reduces some clutter).
  2. Use array statements with vector subscripts to write compact initializations.
  3. Use do concurrent to express fine-grained parallelism and expose opportunities for optimizations such as vectorization, multithreading, or GPU offloading.
  4. Replace some [magic numbers], e.g., initial delta.
  5. Eliminate some unnecessary computation by initializing and updating only values that actually need initializing and updating.
  6. Use more compact conditional logic.
  7. Use a real kind specification based on precision requirements rather than the ambiguously defined double-precision representation of any one given compiler on any one given platform.
  8. Eliminate unnecessary precision specifiers (_dp) when an expression is simply a constant that is exactly representable in any reasonable floating-point representation.
  9. Separate the procedure interface into a module and the procedure definition into a submodule.**
  10. Use a more descriptive variable names in a few places.
  11. Use associate to ensure immutable state for what is essentially a runtime constant.
  12. Add comments! (including some in FORD style)

** In larger projects, item 9 makes code much more user-friendly by isolation the application programmer interface (API) and reduces compile times by eliminating unnecessary compilation cascades. [magic numbers]: https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants

rouson avatar Jun 29 '21 03:06 rouson

Thank you for this! Looks nice.

Eliminate unnecessary precision specifiers (_dp) when an expression is simply a constant that is exactly representable in any reasonable floating-point representation.

I see --- I think you are taking advantage of this here, so you wouldn't need my suggested changes above? But is that really true? I know that 1.0 is exactly representable, but I thought the compilers will still only treat it as single precision when assigning and not set the higher bits. But maybe not. But 0.01 I think is not exactly representable.

certik avatar Jun 29 '21 04:06 certik

I see. I am a minimalist too, that's why I just use rho = 1.

certik avatar Jun 29 '21 04:06 certik

@certik good points. I think I just accepted your changes, but I'm not sure. If so, feel free to merge this or let me know if you want me to merge it. If I didn't accept your changes correctly, let me know and I'll see if I can fix it.

rouson avatar Jun 29 '21 04:06 rouson

I don't know how to suggest changes to files that weren't changed in the pull request, so I'll just write here. Since you renamed one of the Fortran files, can you also rename the corresponding row in the table in README.md?

Also, you could add a row to the table, with the following numbers:

M=100: 0.43 M=200: 8.24 M=300: 38.39

which are the numbers I got on my machine with the same settings as the other files (-Ofast, WSL2, same CPU, etc.)

arunningcroc avatar Jun 29 '21 13:06 arunningcroc

Ah, I didn't realize the file got renamed from optimized to archaic. @rouson, do you have a better name? The term archaic suggests (to me) that that style should not be used anymore, but I actually like that style, that's how I write most of my codes. I don't know if you like any of the terms such as simple, traditional, plain, direct, ... I don't want to pass judgement yet, I am hoping to first get all the examples up with lots of versions and later I think it will be easier to discuss the different styles and to agree (or agree to disagree) on the recommended "idiomatic style(s)". We can also just number the different Fortran versions/styles.

(To rephrase what I am trying to achieve: I absolutely want to have such a discussion about how modern Fortran codes should be written and arrive at more opinionated community agreed upon style(s). But in order to have a good discussion about this, I am hoping to have a few dozen examples, each with different styles. And only then have that discussion. I am afraid the discussion would not be productive if we do it now.)

certik avatar Jun 29 '21 13:06 certik

@certik I like your suggestion to rename archaic.f90 to traditional.f90. I'll push a new commit that makes that change. On a related note, you can think of a better name for modern.f90, please let me know. There can of course be many different modern approaches. I chose modern.f90 for two reasons: (1) it uses features from more recent standards and (2) I'm hoping more Fortran programming moves in this direction for several reasons.

rouson avatar Jun 29 '21 15:06 rouson

If you like the whole-array form over the do concurrent one, you could rename the sources as "archaic" -> "imperative" and "modern" -> "array-oriented". I think the style-based naming communicates more than the era-based naming. Just an idea.

milancurcic avatar Jun 29 '21 16:06 milancurcic

I don't know why yet, but modern.f90 in this PR, using gfortran-9.3.0 converges only for -O[01], but does not converge for -O{2,3,fast}. I think we need to fix it before merging.

milancurcic avatar Jun 29 '21 16:06 milancurcic

I don't know why yet, but modern.f90 in this PR, using gfortran-9.3.0 converges only for -O[01], but does not converge for -O{2,3,fast}. I think we need to fix it before merging.

I seem to be able to converge it for M=100, 200, 300 with -Ofast, or at least it outputs a number of iterations in line with the C code (and Cython/python where available) using gfortran-9.3.0

arunningcroc avatar Jun 29 '21 16:06 arunningcroc

I apologize, I gave incomplete information. I don't converge for M=50 (I made it small for shorter tests). Do the method and the problem require a minimum value of M for the method to converge?

milancurcic avatar Jun 29 '21 17:06 milancurcic

I apologize, I gave incomplete information. I don't converge for M=50 (I made it small for shorter tests). Do the method and the problem require a minimum value of M for the method to converge?

I get the same problem for M=50. I'm not sure if there's any general proof for what kind of grid is too small, but the other Fortran codes seem to work. So clearly the modern code is somehow generating a different code. The basic logic of it seems to be the same, though, and works for bigger M, so I wonder if the compiler isn't doing some wonky optimization because of the modern features that is simply buggy?

arunningcroc avatar Jun 29 '21 17:06 arunningcroc

When compiled with gfortran (installed via Homebrew on macOS Big Sur) 11.1.0 with no optimization , the code converges for me for all the M values that make sense: M>2. When compiled with gfortran -Ofast, the code converges for M>90.

I also tried the NAG compiler version 7.0, Build 7044, and found a compiler bug that I just reported to NAG. I'm pasting the bug report below.

I'm not sure what to do from here. Before refactoring code, it's always safest to have autoamated tests in place. I hope that any code contributed to this repository will include tests, which will probably need to be internal to the code, given the difficulty of setting up a multi-language test harness that covers so many languages.

Bug Report

% cat interface-not-found.f90 

module f_interface
contains
  pure real function f()
    f = 1.
  end function
end module 

  use f_interface
  implicit none

  integer i
  real f_sampled(1)
  
  associate(dy => 0.) 
    do concurrent(i=1:1)
      f_sampled(i) = f()
    end do
  end associate
end

% nagfor interface-not-found.f90 
NAG Fortran Compiler Release 7.0(Yurakucho) Build 7044
Error: interface-not-found.f90, line 19: Implicit type for F
Questionable: interface-not-found.f90, line 19: Variable F_SAMPLED set but never referenced
[NAG Fortran Compiler pass 1 error termination, 1 error, 1 warning]

rouson avatar Jun 29 '21 18:06 rouson

I've managed to isolate the problem. In a puzzling turn of events, it appears gfortran doesn't like the rho function and the submodule for some reason. Removing both the rewritten if-else as well as the submodule seems to fix this problem (but neither one on their own fixes it). That is, just use the old rho function in a module instead of the submodule, and for some reason the code works again.

Edit: but also, simply putting rho in a submodule and then calling it from a test program still doesn't cause any issues, and the code executes correctly. The loop seems to be required for the convergence issue to arise.

arunningcroc avatar Jun 29 '21 19:06 arunningcroc

@arunningcroc could you either insert code edits on GitHub or submit a new pull request with working code? Most importantly, if you were the original author (or simply if you have thoughts on how to do so), could you include some sort of results-verification code? Possibly the best thing might be for each of the Fortran versions to be refactored into being functions instead of main programs. Assuming someone has verified that some version produces the desire solution, that version could be the reference version of the code. Then there can be just one Fortran main program that calls each version with cpu_time calls in-between for timings. The result produced by each refactored version can then be checked against the result of the reference version.

@certik if you don't mind introducing object-oriented programming, @everythingfunctional and I have developed a design pattern that captures everything I just wrote and specifies requirements in the form of deferred bindings on an abstract type we call the Oracle pattern, inspired by the common term "test oracle." It's actually a special case of a Template Method pattern. In addition to saving time explaining the same concepts in English, requiring contributors extend an oracle abstract derived type enforces the project requirements formally and operational and ensures uniformity across the project.

rouson avatar Jun 29 '21 21:06 rouson

@certik if you don't mind introducing object-oriented programming,

Absolutely. Please submit another PR with the OO code!

certik avatar Jun 29 '21 22:06 certik

@rouson @arunningcroc could you either insert code edits on GitHub or submit a new pull request with working code? Most importantly, if you were the original author (or simply if you have thoughts on how to do so), could you include some sort of results-verification code? Possibly the best thing might be for each of the Fortran versions to be refactored into being functions instead of main programs. Assuming someone has verified that some version produces the desire solution, that version could be the reference version of the code. Then there can be just one Fortran main program that calls each version with cpu_time calls in-between for timings. The result produced by each refactored version can then be checked against the result of the reference version.

Yes, I'll try to get this done this afternoon. Also, I think I should probably write some sort of problem description pdf, explaining the theory and the numerical method in detail, so that anyone may easily implement it in their language of choice.

For the test cases, I know that the Python code should produce correct results, seeing as it is lifted almost wholesale from a computational physics textbook. However, there are also known analytical solutions to 2D Poisson and Laplace equations, and these might be valuable for testing as well. Presumably finding the correct solution to a few such cases will be a good guarantee that the code works as expected.

If I separate the Fortran code in to modules/functions and call them all from a single main program, should I do the same for the Python, C and Cython codes as well, so that each language is called from some file like "main.f90", "main.py", etc? Then anyone who wants to see all the timings of a particular language could always just execute the main file for that language, and one could imagine adding some bash/powershell script to run all of them consequentially.

arunningcroc avatar Jun 30 '21 05:06 arunningcroc

@arunningcroc that all sounds great. The C code could be called by the Fortran main program via C bindings. I've heard that other languages also support C bindings so it might also be possible to call other languages from one Fortran main program.

rouson avatar Jun 30 '21 17:06 rouson