Benchmark criteria
We should decide on criteria for what makes a good/suitable benchmark problem and how it can be implemented.
I proposed criteria in https://github.com/fortran-lang/webpage/issues/61. At a high level:
Our benchmarks should show that Fortran is an excellent choice for this mission:
- enable scientists, engineers, and other domain experts to write programs that naturally express the mathematics and algorithms employed, are portable across HPC systems, remain viable over decades of use, and extract a high percentage of performance from the underlying hardware.
At a lower level, we might want to consider having several sections of benchmarks:
- Fortran only (Fortran compilers comparison)
- Languages comparison (probably best to compare various Fortran, C++, and other languages' compilers); inline assembly and intrinsics are not allowed
- Assembly section (where any code in any form is allowed)
The "Languages comparison" section can also have subsections, like
- all optimizations on (the default section)
- -ffast-math not allowed (that's essentially the Julia benchmarks page), which has its merits, but I would argue most Fortran users care more about the "all optimizations on" section
In general, instead of rejecting some benchmarks (for one reason or another), I would rather have different sections and explain why a given benchmark (say, one with -ffast-math enabled) is not a good candidate for a section where -ffast-math is not allowed, but is a great candidate for the "all optimizations on" section. That way, different people and communities can follow the sections they care about (for example, I expect the Julia community would be particularly interested in the "-ffast-math not allowed" section, while most Fortran users would be interested in the "all optimizations on" section).
I agree, this is a good structure.
For the Fortran-only comparison, it would be good to have several levels of optimisation, to be able to see what is gained at each level (a driver sketch follows the list below):
- -O0
- -O1
- -O2
- -O3
- -O3 -ffast-math
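To illustrate what such a sweep might look like, here is a minimal driver sketch in Python. It is only a sketch of one possible framework: the source file name bench.f90, the choice of gfortran, and the repetition count are placeholder assumptions, not agreed conventions.

```python
import subprocess
import time

LEVELS = ["-O0", "-O1", "-O2", "-O3", "-O3 -ffast-math"]
SOURCE = "bench.f90"   # hypothetical benchmark source file
RUNS = 3               # repetitions per level; the best time is reported

def time_level(flags: str) -> float:
    """Compile SOURCE with gfortran at the given flags and return the
    best wall-clock run time over RUNS executions."""
    subprocess.run(["gfortran", *flags.split(), SOURCE, "-o", "bench"], check=True)
    best = float("inf")
    for _ in range(RUNS):
        start = time.perf_counter()
        subprocess.run(["./bench"], check=True)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    for flags in LEVELS:
        print(f"{flags:20s} {time_level(flags):10.4f} s")
```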
Remember that different Fortran compilers have different optimization options. For GFortran, I use the following options to get maximum performance: -O3 -march=native -ffast-math -funroll-loops. Newer GFortran versions also have -Ofast. Furthermore, we should enable things like allocating arrays on the stack.
So we can have the various sections that you proposed, but it might be hard to do that across compilers. At the very least, we need a section with "the best options for the given compiler and benchmark (?)". I put a question mark next to benchmark because I don't know whether we should allow benchmark-specific options, or just have one set of options for the whole section.
Update: thinking about this, I would like a section where we figure out the best optimization options that we recommend for each Fortran compiler, and use those options for all benchmarks in the section. The idea is that users would then simply use the same options for their own code and could expect performance comparable to the benchmarks.
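As a sketch of what such a per-compiler recommendation could look like, assuming the GFortran flags mentioned above and purely illustrative entries for other compilers (these are not vetted recommendations):

```python
# Hypothetical table of recommended "maximum performance" flags per compiler.
# Only the gfortran entry reflects flags discussed in this thread; the others
# are placeholders that would need to be validated on real benchmarks.
RECOMMENDED_FLAGS = {
    "gfortran": ["-O3", "-march=native", "-ffast-math", "-funroll-loops",
                 "-fstack-arrays"],      # -fstack-arrays: allocate arrays on the stack
    "ifort":    ["-O3", "-xHost"],       # assumption, for illustration only
    "flang":    ["-O3", "-ffast-math"],  # assumption, for illustration only
}

def compile_command(compiler: str, source: str, exe: str) -> list[str]:
    """Build the compile command for one benchmark using the section-wide flags."""
    return [compiler, *RECOMMENDED_FLAGS[compiler], source, "-o", exe]
```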
Yes, the list of optimisation levels will be compiler-dependent, which is why we need a flexible driver framework. By running many optimisation levels we can identify the best set for each benchmark; this is what happens in the Julia repo. This can also identify any cases of over-aggressive optimisation.
we figure out the best optimization options that we recommend for each Fortran compiler, and use those options for all benchmarks in the section.
The problem I see with this is that the 'best optimisation options' will be code-dependent. If we run a selection of different options for each compiler, we can automatically determine the best set for each benchmark case - this informs the user which options are suitable for which codes.
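For example, the selection step could be as simple as the following sketch, assuming timings have already been collected into a (benchmark, flag set) → time mapping; all benchmark names and numbers here are made up:

```python
# Given collected timings, pick the fastest flag set for each benchmark.
# Keys and values below are made-up placeholders.
timings = {
    ("poisson2d", "-O2"): 1.80,
    ("poisson2d", "-O3 -ffast-math"): 1.25,
    ("md_nbody",  "-O2"): 3.10,
    ("md_nbody",  "-O3 -ffast-math"): 3.05,
}

best: dict[str, tuple[str, float]] = {}
for (bench, flags), t in timings.items():
    if bench not in best or t < best[bench][1]:
        best[bench] = (flags, t)

for bench, (flags, t) in best.items():
    print(f"{bench}: best flags = {flags} ({t:.2f} s)")
```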
@LKedward good point, I agree. I think we should simply collect as much data as we can. How we then visualize, present, and organize the data into sections is a tough problem; we should stay flexible, and I expect we will iterate on this as we gain experience.
I do think we should have some recommendations for what users should use for their codes, present these as good defaults, and suggest that users who care more about performance test different options and see what works for their particular code.
I agree. Run benchmarks for all compilers and all optimization levels, while noting that inter-compiler comparison may be meaningful only for the no-optimization case and perhaps for maximum optimization, though I'm not sure about the latter.
I do think we should have some recommendations for what users should use for their codes, present these as good defaults, and suggest that users who care more about performance test different options and see what works for their particular code.
I agree, this is important.
Run benchmarks for all compilers and all optimization levels, while noting that inter-compiler comparison may be meaningful only for the no-optimization case and perhaps for maximum optimization, though I'm not sure about the latter.
For inter-compiler comparison it may be best to simply compare the best result from all levels.
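As a sketch of that reduction, assuming per-level timings have already been collected (all numbers are placeholders):

```python
# For a cross-compiler comparison, reduce each compiler's results over all
# optimization levels to its single best time.
results = {
    "gfortran": {"-O0": 9.1, "-O2": 2.4, "-O3 -ffast-math": 1.9},
    "ifort":    {"-O0": 8.7, "-O2": 2.1, "-O3": 1.8},
}

best_per_compiler = {c: min(levels.values()) for c, levels in results.items()}
print(best_per_compiler)  # e.g. {'gfortran': 1.9, 'ifort': 1.8}
```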