hipfort
CUDA Fortran kernels support
Will this tool only support hipify of host Fortran calls, or will it also be able to compile CUDA Fortran kernels? For example:
```fortran
attributes(global) subroutine saxpy(x, y, a)
  implicit none
  real :: x(:), y(:)
  real, value :: a
  integer :: i, n
  n = size(x)
  i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
  if (i <= n) y(i) = y(i) + a*x(i)
end subroutine saxpy
```
Hey @maiconfaria, currently hipfort expects HIP kernels to be written in C++. As far as I'm aware, CUDA Fortran syntax is restricted to the PGI compilers.
Edit: Compiling CUDA Fortran kernels is something that must be implemented by a compiler; hipfort relies on an underlying Fortran compiler and hipcc and/or nvcc to compile kernels written in C++ syntax into something executable on the target hardware. Currently, global subroutines for GPU offloading are not part of any Fortran standard.
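To make the expected split concrete, here is a minimal sketch of the Fortran side of this pattern. The kernel itself and its extern "C" launcher would live in HIP C++; `launch_saxpy` is a hypothetical name used for illustration, not a hipfort symbol:

```fortran
! Sketch: the saxpy kernel is written in HIP C++ and wrapped in an
! extern "C" launcher; Fortran only sees a bind(c) interface to it.
! "launch_saxpy" is a hypothetical name used for illustration.
module saxpy_iface
  use iso_c_binding, only: c_ptr, c_float, c_int
  implicit none
  interface
    subroutine launch_saxpy(x, y, a, n) bind(c, name="launch_saxpy")
      import :: c_ptr, c_float, c_int
      type(c_ptr), value :: x, y     ! device pointers, e.g. from hipMalloc
      real(c_float), value :: a
      integer(c_int), value :: n
    end subroutine launch_saxpy
  end interface
end module saxpy_iface
```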
@gregrodgers, is this part of the roadmap for hipfort?
@schoonovernumerics is correct, hipfort only targets the existing HIP API. As far as I am aware, there are no plans to support writing GPU kernels in Fortran in the ROCm ecosystem; they must be written in C++.
Thank you @schoonovernumerics and @pbauman. Writing kernels in C++ for GPU-ported Fortran code has always seemed like a logical choice to me. Unfortunately, I don't know of any major Fortran software that followed this path when it was ported to GPUs. Let's see if something arises to bring old-school scientific code to AMD's GPUs. OpenACC was also a great loss. Thank you guys again.
@maiconfaria

> Let's see if something arises to bring old-school scientific code to AMD's GPUs

Let's see... in a month or so
@maiconfaria Here's an example of a "non-major" Fortran code that currently uses hipfort: https://github.com/FluidNumerics/SELF. I'm actively working on this application and would be happy to connect with you on getting your code ported over, if you're continuing down that route. In fact, there are hackathons this year where we can work together to get you down this road: https://www.oshackathon.org/events/2021-amd-rocm-hackathons
Hey all, I wanted to give this thread a bump. I've been interacting with a few teams over the past month, and this continues to be a question.
Namely, to offload to GPUs in Fortran, can Fortran programmers ever anticipate writing in Fortran syntax, similar to what was done in CUDA Fortran? Is this something the folks working on the amdflang/amdclang projects would need to undertake to expose at the compiler level?
@pbauman @domcharrier @gregrodgers, would you know who to tag at AMD to get something like this going?
> Namely, to offload to GPUs in Fortran, can Fortran programmers ever anticipate writing in Fortran syntax, similar to what was done in CUDA Fortran?
AFAIK, this is not currently planned. @gregrodgers would know better who to poke. I'm not a compiler developer, but I believe it would be a pretty substantial effort to develop this capability, so it may be difficult to get traction to support native Fortran GPU kernels without some financial backing. Just my two cents (in a currency not worth very much).
> Namely, to offload to GPUs in Fortran, can Fortran programmers ever anticipate writing in Fortran syntax, similar to what was done in CUDA Fortran? Is this something the folks working on the amdflang/amdclang projects would need to undertake to expose at the compiler level?

FWIW, right now you could try gpufort to extract a HIP kernel from your CUDA Fortran loopnest or kernel; see e.g.:
https://github.com/ROCmSoftwarePlatform/gpufort/blob/main/examples/cudafortran/vector-add/vector-add.f90
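For context, here is a sketch of the kind of CUDA Fortran loopnest gpufort is meant to translate; the linked vector-add example is the authoritative input, this is just an illustration:

```fortran
! Sketch of a CUF-directive loopnest that a tool like gpufort can turn
! into a HIP C++ kernel plus a launcher callable from Fortran.
subroutine vecadd(a, b, c, n)
  use cudafor
  implicit none
  integer, value :: n
  real, device :: a(n), b(n), c(n)   ! device-resident dummy arguments
  integer :: i
  ! The CUF kernel directive marks the loop for GPU execution.
  !$cuf kernel do <<<*,*>>>
  do i = 1, n
    c(i) = a(i) + b(i)
  end do
end subroutine vecadd
```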
@domcharrier thanks for sharing that. I still hear from folks that keeping the CUDA Fortran syntax is desirable for a number of reasons:

- GPU kernels can be written in Fortran syntax, so long as the `ATTRIBUTES(Global)` prefix is applied to a subroutine definition. This removes the need to maintain two programming languages and the additional boiler-plate code necessary to "glue" kernel launches into Fortran.
- The `DEVICE` attribute for basic data types in Fortran makes it simple to declare data on the GPU and to allocate device memory with the `ALLOCATE` intrinsic.
- Memory copy between host and device is enabled by an overloaded `=`. While this might not seem like much, it is such a clean syntax to use for handling memcpys. All three features are shown in the sketch below.
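To make these three points concrete, here is a minimal CUDA Fortran sketch (compilable with the PGI/NVIDIA compilers; the grid and block sizes are arbitrary choices for illustration) that reuses the saxpy kernel from the top of this thread:

```fortran
module kernels
  use cudafor
contains
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a
    integer :: i, n
    n = size(x)
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy
end module kernels

program demo
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real, allocatable         :: x(:), y(:)
  real, allocatable, device :: x_d(:), y_d(:)   ! DEVICE attribute

  allocate(x(n), y(n), x_d(n), y_d(n))          ! ALLOCATE handles device arrays too
  x = 1.0; y = 2.0
  x_d = x; y_d = y                              ! host-to-device copies via overloaded =
  call saxpy<<<(n + 255)/256, 256>>>(x_d, y_d, 2.0)
  y = y_d                                       ! device-to-host copy via overloaded =
  deallocate(x, y, x_d, y_d)
end program demo
```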
While I still find hipfort quite useful, a number of groups were initially doing this kind of ISO_C_BINDING work with CUDA before CUDA Fortran existed. CUDA Fortran provides a cleaner implementation with less code and a single-language syntax, making research computing projects a lot more manageable.
> CUDA Fortran provides a cleaner implementation with less code and a single-language syntax, making research computing projects a lot more manageable
I would also love it if HIP went in this direction of supporting Fortran on GPUs in the same style as CUDA Fortran. I started out with CUDA C/C++, but moved to CUDA Fortran after a few projects because of its simpler syntax and style [plus first-class array treatment and the ability to avoid explicit pointer management].
Hello everyone. @domcharrier, could you please give an update on the gpufort project? Is it still being developed, or has it been discontinued by AMD? It appears that the main branch hasn't moved in a long time. Do you recommend trying some other branch (even with fewer features supported)?
Hi @umeshpp,
It's currently on hold. Part of the reason is that we do not have good data on who is using it. The other part is that third-party compilers (GCC, HPE) made some progress wrt OpenMP and OpenACC.
The `develop-acc-no-cptrs` branch is essentially the new main branch when you are porting OpenACC applications.
The most recent but very unstable work is on `feature/parallelism-levels`, where I started to bring semantic analysis and translation-time parameter evaluation to the code generator.
It would probably take ~3 months to get this stable.
Hi Dominic @domcharrier, thanks a lot for this update; it is really helpful. Could you please give a status update on the part that ports CUDA Fortran kernels to HIP C++ kernels? And if any, which branch would you suggest for this type of porting? I am porting a scientific application which has a lot of CUDA Fortran kernels, and after some tests with the main branch of gpufort we have decided to port the CUDA Fortran kernels manually to HIP C++ kernels. Would you say this is the way to go for the moment?
Hi @umeshpp,
> I am porting a scientific application which has a lot of CUDA Fortran kernels, and after some tests with the main branch of gpufort we have decided to port the CUDA Fortran kernels manually to HIP C++ kernels. Would you say this is the way to go for the moment?
Yes, that's probably the way to go right now.
In case you want to experiment further:
I introduced some interoperable array types, up to dimension 7, on the GPUFORT branch `develop-acc-no-cptrs`, which might help you:
https://github.com/ROCmSoftwarePlatform/gpufort/tree/develop-acc-no-cptrs/examples/gpufort_array
They are the default data types used for arrays in the generated C++ kernels on that branch.
This is a good example: https://github.com/ROCmSoftwarePlatform/gpufort/tree/develop-acc-no-cptrs/examples/gpufort_array/vector-add-hipmalloc (note that the kernel launch routines' signatures will likely only work with GCC on a Linux OS).
The arrays have the following properties:

- You can wrap them around Fortran (device) pointers (e.g. created with `hipMalloc`) on the Fortran side; a sketch of creating such a pointer with hipfort follows below.
- You can type them (`float`, `double`, ...) on the C++ side.
- You can use the Fortran index operator `()` on the C++ side, which makes kernels significantly more readable.
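For completeness, a minimal sketch of allocating such a device pointer on the Fortran side with hipfort's `hipMalloc`; the actual wrapping into a gpufort array is omitted here (see the linked examples for that step):

```fortran
! Sketch: allocate a raw device pointer with hipfort. This is the kind of
! pointer the gpufort array types can wrap on the Fortran side.
program alloc_demo
  use iso_c_binding, only: c_ptr, c_size_t
  use hipfort        ! hipMalloc, hipFree, ...
  use hipfort_check  ! hipCheck: aborts on a non-hipSuccess return code
  implicit none
  integer(c_size_t), parameter :: n = 1024_c_size_t
  type(c_ptr) :: x_d

  call hipCheck(hipMalloc(x_d, n * 4_c_size_t))  ! room for n single-precision reals
  ! ... wrap x_d in a gpufort array / pass it to a kernel launcher ...
  call hipCheck(hipFree(x_d))
end program alloc_demo
```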
Thanks a lot @domcharrier for the details. I will look into them.