Fortran ports
This is a new implementation of BabelStream using Fortran.
The code uses a Fortran driver that is largely equivalent to the C++ one, with a few exceptions. The main one is that it does not mirror the C++ class used for the stream object, since a class doesn't seem like a useful way to do things in Fortran. Instead, I use a module that contains the same methods, with `alloc` and `dealloc` acting like the constructor (CTOR) and destructor (DTOR).
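For readers who know the C++ driver, the module-as-object pattern can be sketched roughly as follows. This is a minimal, hypothetical example, not code from this PR. It assumes double precision (`real64`), and apart from `alloc` and `dealloc`, the names (`sketch_stream`, `init_arrays`, `triad`, `dot`) are illustrative stand-ins modelled on BabelStream's kernels. Module variables play the role of the class's data members, and the triad loop uses `DO CONCURRENT` as one example back end.

```fortran
! Hypothetical sketch of the module-as-object pattern (not this PR's code).
! Private module variables stand in for the C++ class's data members;
! alloc/dealloc stand in for the constructor/destructor.
module sketch_stream
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: alloc, dealloc, init_arrays, triad, dot

  real(kind=real64), allocatable :: a(:), b(:), c(:)
  integer :: n = 0

contains

  ! "Constructor": allocate the three arrays (releasing any old ones first)
  subroutine alloc(array_size)
    integer, intent(in) :: array_size
    if (allocated(a)) call dealloc()
    n = array_size
    allocate(a(n), b(n), c(n))
  end subroutine alloc

  ! "Destructor": release the arrays
  subroutine dealloc()
    if (allocated(a)) deallocate(a)
    if (allocated(b)) deallocate(b)
    if (allocated(c)) deallocate(c)
    n = 0
  end subroutine dealloc

  ! Broadcast a scalar initial value into each array
  subroutine init_arrays(init_a, init_b, init_c)
    real(kind=real64), intent(in) :: init_a, init_b, init_c
    a = init_a
    b = init_b
    c = init_c
  end subroutine init_arrays

  ! Triad kernel: a = b + scalar * c, written with DO CONCURRENT
  subroutine triad(scalar)
    real(kind=real64), intent(in) :: scalar
    integer :: i
    do concurrent (i = 1:n)
      a(i) = b(i) + scalar * c(i)
    end do
  end subroutine triad

  ! Dot kernel: sum of a(i) * b(i), accumulated in a plain sequential loop
  function dot() result(s)
    real(kind=real64) :: s
    integer :: i
    s = 0.0_real64
    do i = 1, n
      s = s + a(i) * b(i)
    end do
  end function dot

end module sketch_stream
```

In this sketch, a driver would `use` the module, call `alloc` once, run the kernels inside its timing loop, and call `dealloc` at the end, mirroring the lifetime of the C++ stream object.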
The current implementations are:
- `DO CONCURRENT`
- Fortran array notation
- Sequential `DO` loops
- OpenACC `parallel loop`
- OpenACC `kernels` on Fortran array notation
- OpenMP `parallel do`
- OpenMP `taskloop`
- OpenMP `target teams distribute parallel do simd`
- OpenMP `target teams loop`
- CUDA Fortran (handwritten CUDA Fortran kernels, except DOT)
- CUDA Fortran kernels (`!$cuf kernel do <<<*,*>>>`)
I have tested with the GCC, Intel (ifort and ifx), and NVHPC compilers on AArch64, x86_64, and NVIDIA GPU targets, although not exhaustively.
The current build system is GNU Make, which requires the user to specify the compiler and implementation manually. I have not done, and will not do, anything related to CMake.
~The only thing missing now is CSV output.~ CSV printing is supported.
Comment to the reviewers: For the array version, would you consider adding !$acc kernels around the sections of array syntax? In theory, it would also be valid to put an !$omp workshare around them for OpenMP compilers. I'm not sure whether any compilers will auto-parallelize, much less auto-offload, the array syntax, but a variety of compilers will happily support one or both of those parallelization hints.
@jefflarkin - definitely worth exploring what happens in both of those cases!
@jefflarkin See OpenACCArrayStream.F90
OpenMPWorkshareStream.F90 is there too now, but it will not offload because, as far as I know, OpenMP does not have `omp target workshare` yet. It should, but someone has to argue with the workshare haters on the committee.
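The two directive placements suggested in this thread can be sketched on a triad kernel (`a = b + scalar * c`) written in array notation. This is a hypothetical illustration, not the contents of OpenACCArrayStream.F90 or OpenMPWorkshareStream.F90: the module and subroutine names are made up, and whether a given compiler actually parallelizes or offloads the `kernels` region is compiler-dependent. When compiled without OpenACC/OpenMP support (e.g. without gfortran's `-fopenacc` or `-fopenmp`), the directives are treated as comments and the code runs serially.

```fortran
! Hypothetical sketch: array-notation triad wrapped in the two directive
! styles discussed above. Names are illustrative only.
module sketch_array_triad
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains

  ! OpenACC: the compiler may parallelize (and possibly offload) the
  ! array assignment inside the kernels region.
  subroutine triad_acc(a, b, c, scalar)
    real(kind=real64), intent(out) :: a(:)
    real(kind=real64), intent(in)  :: b(:), c(:)
    real(kind=real64), intent(in)  :: scalar
    !$acc kernels
    a = b + scalar * c
    !$acc end kernels
  end subroutine triad_acc

  ! OpenMP: workshare divides the array assignment among the threads
  ! of the parallel region (host threads only in this sketch).
  subroutine triad_omp(a, b, c, scalar)
    real(kind=real64), intent(out) :: a(:)
    real(kind=real64), intent(in)  :: b(:), c(:)
    real(kind=real64), intent(in)  :: scalar
    !$omp parallel workshare
    a = b + scalar * c
    !$omp end parallel workshare
  end subroutine triad_omp

end module sketch_array_triad
```

Both variants leave the array expression untouched, which is the appeal of this approach: the same source line serves the serial, OpenACC, and OpenMP builds.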
I removed the IEEE NaN check from this branch because I implemented it incorrectly. It is now fixed, along with other improvements, in other branches that I'll merge after the paper is accepted. I don't want to break the branch linked in the paper, for obvious reasons.
I've now implemented CSV printing, and the REAL32 and INT32 options both seem to be working now too.
This is ready for review. I merged in all the good stuff done post-submission.