(Closes #2716) Add support for module-inlining calls to polymorphic kernels/routines
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.91%. Comparing base (
e8f0021) to head (00c36fd). Report is 131 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #2732 +/- ##
=======================================
Coverage 99.91% 99.91%
=======================================
Files 364 364
Lines 51697 51781 +84
=======================================
+ Hits 51652 51739 +87
+ Misses 45 42 -3
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
As well as addressing the coverage, I need to extend the LFRic integration tests to use this new functionality.
Added the KernelModuleInlineTrans back into the LFRic transformation script and get a crash when trying to add the interface symbol into the symbol table.
Build and run with updated transformation script works:
This doesn't seem very different from the profile of the version before this change so I need to check exactly what PSyclone is making of things.
If you merge with master, I added multiple prints in the output, so you can grep and count the number of each cases that we could not get the kernel_schedule before due to polymorphic kernels, which was 331
The OMP offload LFRic integration test failed with an ICE:
17:47:36 Pre-process and compile inventory/id_r32_field_array_pair_mod.F90
inventory/id_r32_field_array_pair_mod.F90:
NVFORTRAN-S-0000-Internal compiler error. flowgraph: node is zero 3 (lfric_xios_setup_mod_psy.f90: 85)
NVFORTRAN-F-0000-Internal compiler error. Invalid key for hash 0 (lfric_xios_setup_mod_psy.f90: 85)
NVFORTRAN/x86-64 Linux 24.5-1: compilation aborted
It's seems odd that there's a _psy.f90 for xios setup??
EDIT: L85 of that file corresponds to the end of the (module-inlined?) nodal_coordinates_code kernel (subroutine) which is called from within an offloaded region. However, the routine itself does not contain an offload directive?
Do the build manually for OpenACC (which doesn't give an ICE) and things are looking better:
(note: I've marked MATMUL as being available on the GPU now.)
I think @sergisiso has seen that module-inlining a routine is sufficient for the compiler to do the right thing, even if it's not had e.g.
acc routine added to it?
I need to figure out why we get an inlined routine without an offload directive added to it. The build log says:
PSy name = 'lfric_xios_setup_mod_psy'
Transforming invoke 'invoke_0_nodal_xyz_coordinates_kernel_type' ...
Failed to annotate 'nodal_xyz_coordinates_code' with GPU-enabled directive due to:
Transformation Error: Kernel 'nodal_xyz_coordinates_code' accesses the symbol 'chi2xyz: RoutineSymbol<NoType, pure=unknown, elemental=unknown>' which is imported. If this symbol represents data then it must first be converted to a Kernel argument using the KernelImportsToArguments transformation.
Failed to annotate 'nodal_xyz_coordinates_code' with GPU-enabled directive due to:
Transformation Error: Kernel 'nodal_xyz_coordinates_code' accesses the symbol 'chi2xyz: RoutineSymbol<NoType, pure=unknown, elemental=unknown>' which is imported. If this symbol represents data then it must first be converted to a Kernel argument using the KernelImportsToArguments transformation.
Transforming invoke 'invoke_1_nodal_coordinates_kernel_type' ...
Failed to module-inline kernel 'nodal_coordinates_code' due to:
Transformation Error: Cannot inline subroutine 'nodal_coordinates_code' because another, different, subroutine with the same name already exists and versioning of module-inlined subroutines is not implemented yet.
Successfully offloaded loop with ['nodal_coordinates_code']
Successfully offloaded loop with ['nodal_coordinates_code']
Transforming invoke 'invoke_2_nodal_coordinates_kernel_type' ...
Failed to module-inline kernel 'nodal_coordinates_code' due to:
Transformation Error: Cannot inline subroutine 'nodal_coordinates_code' because another, different, subroutine with the same name already exists and versioning of module-inlined subroutines is not implemented yet.
Successfully offloaded loop with ['nodal_coordinates_code']
Successfully offloaded loop with ['nodal_coordinates_code']
Part of the problem was that the optimisation script wasn't checking that it hadn't already transformed a given kernel for a given invoke. Fixing that removes the 'failed to inline' messages but we still end up with an inlined kernel that doesn't have a directive added to it.
The problem was that, having successfully module-inlined the kernel routine, we proceed to apply the annotation transformation to the original Kern and that does not update the Routine that has been inlined. It probably should? For now, I can search for the newly inlined routine and apply the transformation to that and we get the expected code.
Now get a compilation failure:
NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure matrix_vector_code (
algorithm/norm_alg_mod_psy.f90: 338)
This seems to be because we have successfully inlined this interface and its routines but subsequent PSy-layer subroutines are still importing it from a module. In turn, this is because I now check whether or not we've already transformed a kernel of a given name. Essentially, we need multiple LFRicKern objects to all point to the same bit of PSyIR.
Kern.get_kernel_schedule() (which is to be replaced/removed if/when we migrate Kern to subclass Call) currently caches the PSyIR of the kernel. KernelModuleInlineTrans creates a copy of this PSyIR and then inserts it into the PSy-layer. Therefore, get_kernel_schedule() should now return that copy of the PSyIR. In fact, that would happen automatically if I undid my changes to KernelModuleInlineTrans so that it doesn't copy the PSyIR. I can't remember why I made that change.
I think I made that change because, without copying, a routine gets removed from its original Container but that breaks any Interface that refers to it. This becomes a problem in the (unlikely) event that a routine is called directly as well as via an interface.
If we have an Algorithm layer that contains two invokes that each call the same kernel then we attempt to apply KernelModuleInlineTrans to each kernel call. If the first one succeeds then we immediately (in global.py) proceed to add e.g. acc routine to it. That done, we move on to the next invoke where we try to do the same thing. We then find that the body of the routine we want to inline does not match the one we've already inlined because we've added acc routine. This results in:
- The second kernel is not flagged as being module inlined and thus is handled by
_rename_and_write. - The result of
_rename_and_writeis a renamed kernel for which the PSy layer does not contain a symbol.
This shouldn't happen. However, we also want to mark all kernels of the same name as being module-inlined and pointing to a single implementation. Since global.py works invoke by invoke, it's not simple (or natural) to alter it so that it works kernel by kernel.
Thinking about this a bit more, I belatedly realise that if a given PSy layer routine calls the same kernel more than once then either all of them must be module inlined or none of them (unless we attempt to rename the inlined version and that gets complicated). I think the reason that LFRic is still not working is that we fail to inline a second instance of the same kernel (because the first instance has been transformed) but then we do proceed to transform it. Therefore, rename_and_write() is called in order to generate a new version of that kernel on disk. I need a reproducer for this.
I've realised that Kern.module_inline sets this flag for all Kernels with that name in the current InvokeSchedule. This seems sensible but is causing me problems because we end up with a Kernel marked as 'module inlined' but which actually isn't because the transformation on it failed and so it was written to file and renamed instead.
The solution might therefore be as simple as not attempting to transform a Kernel that is already marked as module inlined.
The trouble with the LFRic example is that we have two, separate invoke calls, each with the same Kernel. The first Kernel gets module inlined but has to be modified to have some imports made local to it. That then means we refuse to inline the second Kernel because it is no longer identical. The question is, why does the search for the source of the second kernel not return the module-inlined source of the first one?
With this branch and latest gpu script in lfric/examples:
Adding RANDOM_NUMBER intrinsic to available_on_device - build still worked OK on GPU but profile didn't look much different. Need to update the Sankey diagram and see whether I've missed anything.
One of the compilation tests fail with the nvidia compiler. This same test passes with gfortran:
$ nvfortran -I../dynamo_wrapper0/utilities -I../dynamo_wrapper0/field -I../dynamo_wrapper0/operator -I../dynamo_wrapper0/function_space -I../dynamo_wrapper0/mesh -I../dynamo_wrapper0/scalar -I../dynamo_wrapper0/configuration -c psy.f90
NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mixed_code (psy.f90: 289)
The problem is that the invoke subroutine is still importing the kernel interface, even though it has been module inlined:
SUBROUTINE invoke_1(scalar_r_phys, a, istp)
USE mixed_kernel_mod, ONLY: mixed_code
EDIT: I think this is correct. We've only module-inlined the mixed_kernel for invoke_0, not invoke_1. The latter still uses the imported version of the kernel. Is this compliant with the Fortran standard though?
I'm beginning to think that this is a compiler bug. If I alter the second invoke so that it calls a specific implementation of the kernel (e.g. mixed_code_64) then it compiles fine. I've made a reproducer and sent it to Lukas and Wayne.
This is ready for a first review. It's quite big but can wait until 3.0 is out of the door. Probably needs to be reviewed by @sergisiso.
In experimenting with this branch, I discovered a bug: the routines that a PUBLIC Interface maps to may themselves be declared PRIVATE in their containing module! Unfortunately, in what I've written I've assumed they must be public.
Trying this updated branch with LFRic (and my enhanced global.py) I now get:
Failed to module-inline routine coordinate_jacobian:
Transformation Error: routine 'coordinate_jacobian' contains accesses to 'jacobian_abr2xyz' which is declared in the same module scope. Cannot inline such a routine.
where jacobian_abr2xyz is actually a routine in the same module so we could inline it...
Also get
File '../lfric/components/science/source/kernel/geometry/chi_transform_mod.F90' does contain module 'chi_transform_mod' but PSyclone is unable to create the PSyIR of it.
This is possibly because that module contains an unsupported form of interface definitions:
INTERFACE
SUBROUTINE chi2xyz_interface(chi_1, chi_2, chi_3, panel_id, x, y, z)
IMPORT :: r_def, i_def
IMPLICIT NONE
INTEGER(KIND = i_def), INTENT(IN) :: panel_id
REAL(KIND = r_def), INTENT(IN) :: chi_1, chi_2, chi_3
REAL(KIND = r_def), INTENT(OUT) :: x, y, z
END SUBROUTINE chi2xyz_interface
END INTERFACE
Attempt to do recursive inlining:
Failed to inline call jacobian_abr2xyz(alpha, beta, radius, panel_id):
Transformation Error: Routine 'jacobian_abr2XYZ_r_single' cannot be inlined because it accesses variable 'r_single' from its parent container.
but r_single is actually named in an import into the parent container => should be easy to handle.
...well, ish. It's a bit messy because there's no clean way to spot all the Symbols a bit of code uses and I don't want to duplicate what validate() does. Maybe we can use VAI?
Got a crash while processing init_saturated_profile_alg_mod.x90:
File "/home//Projects/PSyclone/src/psyclone/domain/lfric/lfric_loop.py", line 895, in gen_code
super().gen_code(parent)
~~~~~~~~~~~~~~~~^^^^^^^^
File "/home//Projects/PSyclone/src/psyclone/psyir/nodes/loop.py", line 581, in gen_code
kernel.rename_and_write()
~~~~~~~~~~~~~~~~~~~~~~~^^
File "/home//Projects/PSyclone/src/psyclone/psyGen.py", line 1664, in rename_and_write
self._rename_psyir(new_suffix)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/home//Projects/PSyclone/src/psyclone/psyGen.py", line 1741, in _rename_psyir
kern_schedule.name = new_kern_name[:]
^^^^^^^^^^^^^^^^^^
File "/home//Projects/PSyclone/src/psyclone/psyir/nodes/routine.py", line 419, in name
sym = self.symbol_table.lookup(symbol.name)
because the Routine instance doesn't yet have a symbol table.
Fixed that and now get failure when doing:
LFRIC_OFFLOAD_DIRECTIVES=omp psyclone -api lfric -d /Projects/LFRic/spack_gfortran/lfric_apps_7536/applications/gungho_model/working/build_gungho_model -s applications/gungho_model/optimisation/psyclone-test/global.py /Projects/LFRic/spack_gfortran/lfric_apps_7536/applications/gungho_model/working/build_gungho_model/algorithm/initialisation/init_saturated_profile_alg_mod.x90
because the kernel schedule obtained in CodedKern._rename_and_write() has somehow been detached from its parent container.
I suspect the problem is that we have module-inlined the kernel and broken something along the way.
Module-inlined kernel 'initial_theta_code'
Failed to annotate 'initial_theta_code' with GPU-enabled directive due to:
Transformation Error: Kernel 'initial_theta_code' accesses the symbol 'analytic_temperature: RoutineSymbol<UnresolvedType, pure=unknown, elemental=unknown>' which is imported. If this symbol represents data then it must first be converted to a Kernel argument using the KernelImportsToArguments transformation.
Further investigation shows that it is when generating the code for invoke_5 that we get the failure. When processing this invoke I see:
Transforming invoke 'invoke_5' ...
setting _parent None
setting _parent None
setting _parent None
setting _parent Container[init_saturated_profile_alg_mod_psy]
setting _parent None
setting _parent Container[init_saturated_profile_alg_mod_psy]
Failed to module-inline kernel 'theta_e_code' due to:
Transformation Error: Kernel 'theta_e_code' cannot be module inlined into Container 'init_saturated_profile_alg_mod_psy' because a *different* routine with that name already exists and versioning of module-inlined subroutines is not implemented yet.
Annotated kernel 'theta_e_code'