PSyclone icon indicating copy to clipboard operation
PSyclone copied to clipboard

[nemo] Array-valued function needs to be inlined for OpenACC performance

Open arporter opened this issue 3 years ago • 15 comments

For example: sbcblk.f90, cp_air() function.

arporter avatar Oct 02 '20 10:10 arporter

e.g. if we have some code:

my_array(:,:) = cp_air(array1(:,:)) * array2(:,:) + array3(:,:)

where cp_air is an array-valued function that performs compute then this is hard to parallelise/ensure it remains GPU resident. (If we put ACC KERNELS inside cp_air then we can't have it around this assignment.) Ideally we need to refactor this:

my_tmp(:,:) = cp_air(array1(:,:))
my_array(:,:) = my_tmp(:,:)*array2(:,:) + array3(:,:)

That then allows us to have KERNELS inside cp_air() and, separately, around the assignment.

Better still would be to in-line cp_air but we have to walk before we can run.

arporter avatar Oct 02 '20 10:10 arporter

If I remember correctly, some of the existing kernel transformations (that replace intrinsics) have logic similar to the above. Perhaps the splitting out of an intrinsic/function should have its own transformation which can then be shared.

rupertford avatar Oct 02 '20 10:10 rupertford

Yes, I said as much to Chris when we were discussing this. Hopefully there's some reuse possible.

arporter avatar Oct 02 '20 11:10 arporter

Obviously, a step 0 for this work is to be able to identify array-valued functions. Currently, all RoutineSymbols that represent Fortran functions are given DeferredType (#1294).

arporter avatar Jun 22 '21 13:06 arporter

This is going quite well. I need to handle the case where there's a name clash for a ContainerSymbol as we can't simply rename it in that case.

arporter avatar Jul 05 '22 16:07 arporter

Note to self: I need to identify when a routine accesses variables made available from some outer scoping region as this prevents inlining.

arporter avatar Jul 11 '22 15:07 arporter

The first review and subsequent work on validate has thrown up a number of things that I'm adding TODOs for this time around:

  • [ ] validate will be much nicer once we can query the datatype of any Reference (#1799)
  • [ ] Range needs extending to return the PSyIR for the number of elements it represents
  • [ ] Support needs to be added for routines with named arguments.

arporter avatar Aug 18 '22 07:08 arporter

The initial implementation of InlineTransformation is now on master, but some remaining TODOs need to be fixed to close this issue.

sergisiso avatar Sep 01 '22 13:09 sergisiso

This is also one of the major issues in OpenMP for GPU now, mostly in the sbcblk region. Some functions that will need to be inlined are: q_sat, sbc_dcy, gamma_moist, cd_neutral_10m, psi_h, psi_m

sergisiso avatar Mar 08 '23 19:03 sergisiso

Although the basic inlining functionality is on master, it still requires that the routine to be inlined be in the same Container as the call site. That means we need module inlining working in native PSyIR (currently I think it only works for PSyKAl APIs).

arporter avatar Jun 15 '23 10:06 arporter

Alternatively, now that we have the elemental attribute, we can a resolve the cp_air symbol (which could be imported) for: my_array(:,:) = cp_air(array1(:,:)) * array2(:,:) + array3(:,:) and then arrayrange2loop can decide how to convert it into loop-form depending if it is elemental or not.

(Ideally PSyclone should be able to do both, and the transformation script just chose how to approach it)

sergisiso avatar Jun 15 '23 12:06 sergisiso

Currently a CodedKern has a get_kernel_schedule method. To generalise the KernelModuleInlineTrans transformation we need to support Call as well. Call has routine which gives the corresponding RoutineSymbol. In the work I'm doing now I propose to add a RoutineSymbol.get_schedule method which is very like the ContainerSymbol.container method. I could name it schedule but, since it potentially does a lot of searching, I prefer the get_ prefix. I'd argue that the container method could also do with a get_ prefix.

arporter avatar Nov 09 '23 12:11 arporter

I agree with the get_ but maybe it could be Routine.Symbol.get_routine() instead of "schedule" to make it easier for people not familiar with the PSyIR concepts.

(PS: Now I would also rename Container to Module for the same reason, and I think I am to blame for picking the wrong name)

sergisiso avatar Nov 09 '23 17:11 sergisiso

(PS: Now I would also rename Container to Module for the same reason, and I think I am to blame for picking the wrong name)

Well, we do aspire to be language-neutral so I think Container is fine. At least 'routine' is a generic concept :-)

arporter avatar Nov 23 '23 13:11 arporter

Chatting with Martin and Hugo, it turns out that CROCO doesn't have modules but they would still like to be able to do inlining. This would mean we would need to insert routines into the FileContainer instead (or, we just extend InlineTrans to search for the Routine outside of the current Container?).

arporter avatar Jan 29 '24 10:01 arporter