
Fortran ports

Open jeffhammond opened this issue 3 years ago • 5 comments

This is a new implementation of BabelStream using Fortran.

The code uses a Fortran driver that is largely equivalent to the C++ one, with a few exceptions. First, it does not use a C++-style class for the stream object, since that doesn't seem like a useful way to do things in Fortran. Instead, I use a module that contains the same methods, plus alloc and dealloc routines that play the roles of the constructor and destructor.
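To make the module-instead-of-class idea concrete, here is a minimal sketch. It is not the code from this PR: the module name stream_sketch and the routine names init_arrays and triad are assumptions chosen for illustration; only alloc and dealloc are named in the description above. The module owns the arrays as module variables, so alloc and dealloc bracket their lifetime the way a constructor and destructor would.

```fortran
! Illustrative sketch only; names other than alloc/dealloc are hypothetical.
module stream_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none

  ! Module-level state stands in for the members of the C++ stream class.
  integer(int64) :: n = 0
  real(real64), allocatable :: a(:), b(:), c(:)

contains

  ! Plays the role of the constructor: acquire the three arrays.
  subroutine alloc(array_size)
    integer(int64), intent(in) :: array_size
    integer :: err
    n = array_size
    allocate(a(n), b(n), c(n), stat=err)
    if (err /= 0) error stop 'stream_sketch: allocation failed'
  end subroutine alloc

  ! Plays the role of the destructor: release whatever is allocated.
  subroutine dealloc()
    if (allocated(a)) deallocate(a)
    if (allocated(b)) deallocate(b)
    if (allocated(c)) deallocate(c)
    n = 0
  end subroutine dealloc

  ! Fill the arrays with their starting values.
  subroutine init_arrays(init_a, init_b, init_c)
    real(real64), intent(in) :: init_a, init_b, init_c
    a = init_a
    b = init_b
    c = init_c
  end subroutine init_arrays

  ! One kernel as an example of a "method": a = b + scalar * c.
  subroutine triad(scalar)
    real(real64), intent(in) :: scalar
    integer(int64) :: i
    do i = 1, n
      a(i) = b(i) + scalar * c(i)
    end do
  end subroutine triad

end module stream_sketch
```

A driver would call alloc once, run the kernels, and call dealloc at the end. Swapping implementations then means linking a different module that exposes the same routine names, which is presumably why the Makefile asks for the implementation to be chosen at build time.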

The current implementations are:

  • DO CONCURRENT
  • Fortran array notation
  • Sequential DO loops
  • OpenACC parallel loop
  • OpenACC kernels on Fortran array notation
  • OpenMP parallel do
  • OpenMP taskloop
  • OpenMP target teams distribute parallel do simd
  • OpenMP target teams loop
  • CUDA Fortran (handwritten CUDA Fortran kernels, except DOT)
  • CUDA Fortran kernels (!$cuf kernel do <<<*,*>>>)
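For readers less familiar with Fortran, the sketch below shows how the same triad kernel might look in four of the CPU-side styles listed above: a sequential DO loop, array notation, DO CONCURRENT, and OpenMP parallel do. The module and routine names are hypothetical and the code is not taken from the PR. Built without OpenMP enabled, the !$omp line is treated as a comment and that variant runs sequentially.

```fortran
! Illustrative sketch only; all names here are hypothetical.
module triad_variants
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none

contains

  ! Sequential DO loop.
  subroutine triad_do(a, b, c, scalar)
    real(real64), intent(inout) :: a(:)
    real(real64), intent(in)    :: b(:), c(:), scalar
    integer :: i
    do i = 1, size(a)
      a(i) = b(i) + scalar * c(i)
    end do
  end subroutine triad_do

  ! Fortran array notation: the whole-array expression replaces the loop.
  subroutine triad_array(a, b, c, scalar)
    real(real64), intent(inout) :: a(:)
    real(real64), intent(in)    :: b(:), c(:), scalar
    a = b + scalar * c
  end subroutine triad_array

  ! DO CONCURRENT: asserts that iterations are independent of each other.
  subroutine triad_do_concurrent(a, b, c, scalar)
    real(real64), intent(inout) :: a(:)
    real(real64), intent(in)    :: b(:), c(:), scalar
    integer :: i
    do concurrent (i = 1:size(a))
      a(i) = b(i) + scalar * c(i)
    end do
  end subroutine triad_do_concurrent

  ! OpenMP parallel do: iterations are divided among threads.
  subroutine triad_omp(a, b, c, scalar)
    real(real64), intent(inout) :: a(:)
    real(real64), intent(in)    :: b(:), c(:), scalar
    integer :: i
    !$omp parallel do
    do i = 1, size(a)
      a(i) = b(i) + scalar * c(i)
    end do
    !$omp end parallel do
  end subroutine triad_omp

end module triad_variants
```

All four compute the same result; they differ in how much freedom they give the compiler and runtime to parallelize or offload the work.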

I have tested with GCC, Intel (ifort and ifx), and NVHPC compilers on AArch64, x86_64 and NVIDIA GPU targets, although not exhaustively.

The current build system is GNU Make, and requires the user to manually specify the compiler and implementation. I have not done, and will not do, anything related to CMake.

~The only thing missing now is CSV output.~ CSV printing is supported.

jeffhammond, Aug 03 '22 14:08

Comment to the reviewers: For the array version, would you consider adding !$acc kernels around the sections of array syntax? In theory, it would also be valid to put an !$omp workshare around them for OpenMP compilers. I'm not sure whether any compilers will auto-parallelize, much less auto-offload, the array syntax, but a variety of compilers will happily support one or both of those parallelization hints.
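As a rough illustration of the suggestion (a sketch with hypothetical names, not the contents of any file in the PR), the two hints could be wrapped around an array-syntax triad as follows. Note that the OpenMP workshare construct has to be inside a parallel region, so the combined parallel workshare form is used here. Built without OpenACC or OpenMP enabled, both directives are plain comments and the statements run sequentially.

```fortran
! Illustrative sketch only; module and routine names are hypothetical.
module array_hints
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none

contains

  ! OpenACC: ask the compiler to turn the array statement into kernels.
  subroutine triad_acc_kernels(a, b, c, scalar)
    real(real64), intent(inout) :: a(:)
    real(real64), intent(in)    :: b(:), c(:), scalar
    !$acc kernels
    a = b + scalar * c
    !$acc end kernels
  end subroutine triad_acc_kernels

  ! OpenMP: divide the array statement among the threads of a team.
  subroutine triad_omp_workshare(a, b, c, scalar)
    real(real64), intent(inout) :: a(:)
    real(real64), intent(in)    :: b(:), c(:), scalar
    !$omp parallel workshare
    a = b + scalar * c
    !$omp end parallel workshare
  end subroutine triad_omp_workshare

end module array_hints
```

Whether either hint actually parallelizes or offloads the statement depends on the compiler, which is the open question raised above.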

jefflarkin, Aug 04 '22 19:08

@jefflarkin - definitely worth exploring what happens in both of those cases!

tomdeakin, Aug 04 '22 19:08

@jefflarkin See OpenACCArrayStream.F90

jeffhammond, Aug 05 '22 04:08

OpenMPWorkshareStream.F90 is there too now, but it should not offload, because OpenMP does not have omp target workshare yet, as far as I know. It should, but someone has to argue with the workshare haters on the committee.

jeffhammond, Aug 05 '22 04:08

I removed the IEEE NaN check from this branch, because I did it wrong. It is now fixed, and there are other improvements in other branches, which I'll merge after the paper is accepted. I don't want to break the branch linked in the paper for obvious reasons.

I've now implemented CSV printing, and the REAL32 and INT32 options both seem to be working too.

jeffhammond, Sep 04 '22 11:09

This is ready for review. I merged in all the good stuff done post-submission.

jeffhammond, Nov 09 '22 13:11