[discussion] Backend-specific interfaces
Background
The HIP and CUDA Fortran interfaces need to be bound to different symbols (cuda*/cu* prefixes for CUDA, roc*/hip* prefixes for HIP); see the hipMalloc example in 1.1.
Some datatypes also differ between the backends: CUDA Fortran, for example, uses integer-type stream handles,
while a stream in HIP is a data structure, which we model as type(c_ptr) in the Fortran interface.
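A minimal sketch of that datatype difference (the cuda_stream_kind kind is provided by the CUDA Fortran cudafor module; the exact declarations are for illustration only):
! Sketch only: how a stream handle would be declared per backend.
#ifdef USE_CUDA_NAMES
integer(kind=cuda_stream_kind) :: stream   ! CUDA Fortran: integer handle (kind from cudafor)
#else
type(c_ptr) :: stream                      ! HIP: pointer to an opaque struct (iso_c_binding)
#endif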
Discussion
We can tackle this with two different approaches; see below.
The hip.f module in the repository currently follows approach 1.
I would prefer approach 2 because it directly
mirrors the structure of the header files in the /opt/rocm/include/hip folder,
where we have hcc_detail and nvcc_detail headers.
Moreover, it makes it easier to account for differences in the function argument types.
(Since we will generate all interfaces automatically, we do not expect either approach to take more effort to implement than the other.)
1. Single module with ifdefs per function signature:
module hip
  implicit none

  interface
#ifdef USE_CUDA_NAMES
    function hipMalloc(ptr,sizeBytes) bind(c, name="cudaMalloc")
#else
    function hipMalloc(ptr,sizeBytes) bind(c, name="hipMalloc")
#endif
      use iso_c_binding
      use hip_enums
      implicit none
      integer(kind(hipSuccess)) :: hipMalloc
      type(c_ptr) :: ptr                     ! receives the device pointer (passed by reference, i.e. void**)
      integer(c_size_t), value :: sizeBytes  ! allocation size in bytes
    end function hipMalloc

    ! ... one ifdef per interface for every further function ...
  end interface
end module hip
2. Backend-specific modules (hcc_detail vs nvcc_detail) as in the HIP headers:
module hip
#ifdef USE_CUDA_NAMES
  use hip_nvcc
#else
  use hip_hcc
#endif
end module hip

module hip_nvcc
  implicit none

  interface
    ! No ifdefs needed here: every bind(c) name refers to the CUDA symbol.
    function hipMalloc(ptr,sizeBytes) bind(c, name="cudaMalloc")
      ! ... same interface body as in approach 1 ...
    end function hipMalloc
    ! ...
  end interface
end module hip_nvcc
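With either approach, application code looks the same; a minimal usage sketch (no error handling, allocation size chosen arbitrarily):
program test_hipmalloc
  use iso_c_binding
  use hip          ! same line regardless of which approach is chosen
  use hip_enums    ! assumed to provide hipSuccess, as in the interfaces above
  implicit none
  type(c_ptr) :: dptr
  integer(kind(hipSuccess)) :: istat
  istat = hipMalloc(dptr, 1024_c_size_t)   ! resolves to hipMalloc or cudaMalloc at link time
end program test_hipmalloc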
Please vote for one approach in the comment section by "liking" the comment for approach 1 or the comment for approach 2.
This is a great question. Neither option is more difficult for the programmer; they just "use hip". It would seem that the first option opens up more chance of divergence in capability, which could limit portability.
Right now there really is no wrapper function, because the few HIP APIs we have are just name changes. The architecture of hipfort allows compiled logic, if necessary, to emulate the HIP function. As we get more HIP API coverage, I expect to see more complexity in the differences. As such, it would be good to keep the functional components together. For example, implementing HIP with Level Zero may look entirely different. So I am leaning to option 2, but I am easily swayed.
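As a purely hypothetical sketch of what such compiled logic could look like (cudaMalloc is the real CUDA symbol, but the wrapper, its zero-size policy, and the module layout are illustrative, not existing hipfort code):
module hip_cuda_wrappers   ! hypothetical module name
  use iso_c_binding
  implicit none

  interface
    ! Direct binding to the real CUDA symbol, renamed so the wrapper below can call it.
    function cudaMalloc_raw(ptr, sizeBytes) bind(c, name="cudaMalloc")
      use iso_c_binding
      integer(c_int) :: cudaMalloc_raw
      type(c_ptr) :: ptr
      integer(c_size_t), value :: sizeBytes
    end function cudaMalloc_raw
  end interface

contains

  ! Compiled wrapper exposed under the HIP name; the zero-size guard is an
  ! illustrative example of backend-specific logic, not actual HIP semantics.
  function hipMalloc(ptr, sizeBytes) result(ierr)
    type(c_ptr), intent(out) :: ptr
    integer(c_size_t), intent(in) :: sizeBytes
    integer(c_int) :: ierr
    if (sizeBytes == 0_c_size_t) then
      ptr = c_null_ptr
      ierr = 0_c_int
    else
      ierr = cudaMalloc_raw(ptr, sizeBytes)
    end if
  end function hipMalloc

end module hip_cuda_wrappers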
USE_CUDA_NAMES is probably a poor choice of macro name because it is binary, and HIP may be desired on other GPUs.
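A non-binary selection scheme along these lines (macro and module names are purely illustrative) would leave room for further backends:
! Hypothetical backend selection for the 'module hip' shim of approach 2
#if defined(HIPFORT_BACKEND_NVCC)
  use hip_nvcc
#elif defined(HIPFORT_BACKEND_HCC)
  use hip_hcc
#elif defined(HIPFORT_BACKEND_LEVEL0)
  use hip_level0   ! e.g. a future Level Zero backend
#endif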
Sorry, option 1 is the one that would keep the functional items together. That would be my preference as differences become more complex.
I think we should open up a branch with option 2, build community experience with both options, and gather feedback from end users before committing to a resolution of this issue. As a contributor to this repository and as a hipfort user, I would prefer option 1 so that I don't have to add CPP-flag complexity to my application's build systems.
This being said, more input from hipfort users should be taken into consideration to help resolve this issue.