hipfort
CUDA Fortran kernels support
Will this tool only support hipify of host Fortran calls, or will it also be able to compile CUDA Fortran kernels? For example:
```fortran
attributes(global) subroutine saxpy(x, y, a)
  implicit none
  real :: x(:), y(:)
  real, value :: a
  integer :: i, n
  n = size(x)
  i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
  if (i <= n) y(i) = y(i) + a*x(i)
end subroutine saxpy
```
Hey @maiconfaria, currently hipfort expects HIP kernels to be written in C++. As far as I'm aware, CUDA Fortran syntax is restricted to the PGI compilers.
Edit: Compiling CUDA Fortran kernels is something that must be implemented by a compiler; hipfort relies on an underlying Fortran compiler and hipcc and/or nvcc to compile kernels written in C++ syntax into something executable on the target hardware. Currently, global subroutines for GPU offloading are not part of any Fortran standard.
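To make the expected split concrete, here is a minimal sketch of the Fortran side of this pattern. The kernel itself and its extern "C" launcher would live in HIP C++; `launch_saxpy` is a hypothetical name used for illustration, not a hipfort symbol:

```fortran
! Sketch: the saxpy kernel is written in HIP C++ and wrapped in an
! extern "C" launcher; Fortran only sees a bind(c) interface to it.
! "launch_saxpy" is a hypothetical name used for illustration.
module saxpy_iface
  use iso_c_binding, only: c_ptr, c_float, c_int
  implicit none
  interface
    subroutine launch_saxpy(x, y, a, n) bind(c, name="launch_saxpy")
      import :: c_ptr, c_float, c_int
      type(c_ptr), value :: x, y     ! device pointers, e.g. from hipMalloc
      real(c_float), value :: a
      integer(c_int), value :: n
    end subroutine launch_saxpy
  end interface
end module saxpy_iface
```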
@gregrodgers, is this part of the roadmap for hipfort?
@schoonovernumerics is correct, hipfort only targets the existing HIP API. As far as I am aware, there are no plans to support writing GPU kernels in Fortran in the ROCm ecosystem; they must be written in C++.
Thank you @schoonovernumerics and @pbauman. Writing kernels in C++ for GPU-ported Fortran code has always seemed like a logical choice to me. Unfortunately, I don't know of any major Fortran software that followed this path when it was ported to GPUs. Let's see if something arises to bring old-school scientific code to AMD's GPUs. OpenACC was also a great loss. Thank you guys again.
@maiconfaria

> Let's see if something arises to bring old-school scientific code to AMD's GPUs

Let's see... in a month or so
@maiconfaria Here's an example of a "non-major" Fortran code that currently uses hipfort: https://github.com/FluidNumerics/SELF. I'm actively working on this application and would be happy to connect with you on getting your code ported over, if you're continuing down that route. In fact, there are hackathons this year where we can work together to get you down this road: https://www.oshackathon.org/events/2021-amd-rocm-hackathons
Hey all, I wanted to give this thread a bump. I've been interacting with a few teams over the past month, and this continues to be a question.
Namely, to offload to GPUs in Fortran, can Fortran programmers ever anticipate writing in Fortran syntax, similar to what was done in CUDA Fortran? Is this something the folks working on the amdflang/amdclang projects would need to undertake to expose at the compiler level?
@pbauman @domcharrier @gregrodgers, would you know who to tag at AMD to get something like this going?
> Namely, to offload to GPUs in Fortran, can Fortran programmers ever anticipate writing in Fortran syntax, similar to what was done in CUDA Fortran?
AFAIK, this is not currently planned. @gregrodgers would know better who to poke. I'm not a compiler developer, but I believe it would be a pretty substantial effort to develop this capability, so it may be difficult to get traction to support native Fortran GPU kernels without some financial backing. Just my two cents (in a currency not worth very much).
> Namely, to offload to GPUs in Fortran, can Fortran programmers ever anticipate writing in Fortran syntax, similar to what was done in CUDA Fortran? Is this something the folks working on the amdflang/amdclang projects would need to undertake to expose at the compiler level?

FWIW, right now you could try gpufort to extract a HIP kernel from your CUDA Fortran loopnest or kernel; see e.g.:
https://github.com/ROCmSoftwarePlatform/gpufort/blob/main/examples/cudafortran/vector-add/vector-add.f90
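For context, here is a sketch of the kind of CUDA Fortran loopnest gpufort is meant to translate; the linked vector-add example is the authoritative input, this is just an illustration:

```fortran
! Sketch of a CUF-directive loopnest that a tool like gpufort can turn
! into a HIP C++ kernel plus a launcher callable from Fortran.
subroutine vecadd(a, b, c, n)
  use cudafor
  implicit none
  integer, value :: n
  real, device :: a(n), b(n), c(n)   ! device-resident dummy arguments
  integer :: i
  ! The CUF kernel directive marks the loop for GPU execution.
  !$cuf kernel do <<<*,*>>>
  do i = 1, n
    c(i) = a(i) + b(i)
  end do
end subroutine vecadd
```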
@domcharrier thanks for sharing that. I still hear from folks that keeping the CUDA Fortran syntax is desirable for a number of reasons:

- GPU kernels can be written in Fortran syntax, so long as the `ATTRIBUTES(Global)` prefix is applied to a subroutine definition. This removes the need to maintain two programming languages and the additional boiler-plate code necessary to "glue" kernel launches into Fortran.
- The `DEVICE` attribute for basic data types in Fortran makes it simple to declare data on the GPU and to allocate device memory with the `ALLOCATE` intrinsic.
- Memory copy between host and device is enabled by an overloaded `=`. While this might not seem like much, it is such a clean syntax to use for handling memcpys. All three features are shown in the sketch below.
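To make these three points concrete, here is a minimal CUDA Fortran sketch (compilable with the PGI/NVIDIA compilers; the grid and block sizes are arbitrary choices for illustration) that reuses the saxpy kernel from the top of this thread:

```fortran
module kernels
  use cudafor
contains
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a
    integer :: i, n
    n = size(x)
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy
end module kernels

program demo
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real, allocatable         :: x(:), y(:)
  real, allocatable, device :: x_d(:), y_d(:)   ! DEVICE attribute

  allocate(x(n), y(n), x_d(n), y_d(n))          ! ALLOCATE handles device arrays too
  x = 1.0; y = 2.0
  x_d = x; y_d = y                              ! host-to-device copies via overloaded =
  call saxpy<<<(n + 255)/256, 256>>>(x_d, y_d, 2.0)
  y = y_d                                       ! device-to-host copy via overloaded =
  deallocate(x, y, x_d, y_d)
end program demo
```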
While I still find hipfort quite useful, a number of groups were initially doing this kind of ISO_C_BINDING work with CUDA before CUDA Fortran existed. CUDA Fortran provides a cleaner implementation with less code and a single-language syntax, making research computing projects a lot more manageable.
> CUDA Fortran provides a cleaner implementation with less code and a single-language syntax, making research computing projects a lot more manageable
I would also love it if HIP went in this direction of supporting Fortran on GPUs in the same style as CUDA Fortran. I started out with CUDA C/C++, but moved to CUDA Fortran after a few projects because of its simpler syntax and style [plus first-class array treatment and the ability to avoid explicit pointer management].
Hello everyone. @domcharrier, could you please give an update on the gpufort project? Is it still being developed, or has it been discontinued by AMD? It appears that the main branch hasn't moved in a long time. Do you recommend trying some other branch (even with fewer features supported)?
Hi @umeshpp,
It's currently on hold. Part of the reason is that we do not have good data on who is using it. The other part is that third-party compilers (GCC, HPE) made some progress wrt OpenMP and OpenACC.
The `develop-acc-no-cptrs` branch is essentially the new main branch when you are porting OpenACC applications.
The most recent but very unstable work is on `feature/parallelism-levels`, where I started to bring semantic analysis and translation-time parameter evaluation to the code generator.
It would probably take ~3 months to get this stable.
Hi Dominic @domcharrier, thanks a lot for this update; it is really helpful. Could you please give a status update on the part that ports CUDA Fortran kernels to HIP C++ kernels? And if any, which branch would you suggest for this type of porting? I am porting a scientific application which has a lot of CUDA Fortran kernels, and after some tests with the main branch of gpufort we have decided to port the CUDA Fortran kernels manually to HIP C++ kernels. Would you say this is the way to go for the moment?
Hi @umeshpp,
> I am porting a scientific application which has a lot of CUDA Fortran kernels, and after some tests with the main branch of gpufort we have decided to port the CUDA Fortran kernels manually to HIP C++ kernels. Would you say this is the way to go for the moment?
Yes, that's probably the way to go right now.
In case you want to experiment further:
I introduced some interoperable array types, up to dimension 7, on the GPUFORT branch `develop-acc-no-cptrs`, which might help you:
https://github.com/ROCmSoftwarePlatform/gpufort/tree/develop-acc-no-cptrs/examples/gpufort_array
They are the default data types used for arrays in the generated C++ kernels on that branch.
This is a good example: https://github.com/ROCmSoftwarePlatform/gpufort/tree/develop-acc-no-cptrs/examples/gpufort_array/vector-add-hipmalloc (note that the kernel launch routines' signatures will likely only work with GCC on a Linux OS).
The arrays have the following properties:

- You can wrap them around Fortran (device) pointers (e.g. created with `hipMalloc`) on the Fortran side; a sketch of creating such a pointer with hipfort follows below.
- You can type them (`float`, `double`, ...) on the C++ side.
- You can use the Fortran index operator `()` on the C++ side, which makes kernels significantly more readable.
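For completeness, a minimal sketch of allocating such a device pointer on the Fortran side with hipfort's `hipMalloc`; the actual wrapping into a gpufort array is omitted here (see the linked examples for that step):

```fortran
! Sketch: allocate a raw device pointer with hipfort. This is the kind of
! pointer the gpufort array types can wrap on the Fortran side.
program alloc_demo
  use iso_c_binding, only: c_ptr, c_size_t
  use hipfort        ! hipMalloc, hipFree, ...
  use hipfort_check  ! hipCheck: aborts on a non-hipSuccess return code
  implicit none
  integer(c_size_t), parameter :: n = 1024_c_size_t
  type(c_ptr) :: x_d

  call hipCheck(hipMalloc(x_d, n * 4_c_size_t))  ! room for n single-precision reals
  ! ... wrap x_d in a gpufort array / pass it to a kernel launcher ...
  call hipCheck(hipFree(x_d))
end program alloc_demo
```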
Thanks a lot @domcharrier for the details. I will look into them.