Towards #924 - support module inline in language-level PSyIR.
- Extends `CodeBlock.get_symbol_names()` so that it excludes statement labels;
- Extends `Container.get_routine_psyir()` to handle searching for an implementation;
- Replaces `InlineTrans._find_routine()` with `Call.get_callees()`;
- Extends `ContainerSymbol.container()` to check the local scope first;
- Extends `Symbol.resolve_type()` to handle RoutineSymbols;
- Extends `SymbolTable` to permit removal of a Symbol provided it is also present in an outer scope;
- Finally, extensive changes to `KernelModuleInlineTrans` in order to make it general.
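For orientation, here is a minimal usage sketch of the new `Call.get_callees()` (illustrative only, based on the behaviour described in this PR; it returns a list of matching routine implementations):

```python
from psyclone.psyir.frontend.fortran import FortranReader
from psyclone.psyir.nodes import Call

SOURCE = """
module my_mod
contains
  subroutine helper(x)
    real, intent(inout) :: x
    x = 2.0 * x
  end subroutine helper
  subroutine driver(x)
    real, intent(inout) :: x
    call helper(x)
  end subroutine driver
end module my_mod
"""

psyir = FortranReader().psyir_from_source(SOURCE)
call = psyir.walk(Call)[0]
# get_callees() looks up the implementation(s) of the called routine
# (via Container.get_routine_psyir()) and returns a list of Routine
# nodes - here just the one for 'helper'.
print(call.get_callees()[0].name)
```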
Codecov Report
Attention: 1 line in your changes is missing coverage. Please review.
Comparison is base (77b28ff) 99.85% compared to head (867fe2b) 99.84%.
| Files | Patch % | Lines |
|---|---|---|
| src/psyclone/psyir/symbols/containersymbol.py | 95.45% | 1 Missing :warning: |
Additional details and impacted files
```
@@ Coverage Diff @@
##           master    #2413      +/-   ##
==========================================
- Coverage   99.85%   99.84%   -0.02%
==========================================
  Files         352      352
  Lines       47337    47464     +127
==========================================
+ Hits        47268    47389     +121
- Misses         69       75       +6
```
Realised that I currently can't handle a routine that is called via a generic interface. We create a RoutineSymbol for that interface but it is of UnknownFortranType. Without inspecting and reasoning about the type information within that type there's nothing I can do, and doing that would effectively mean implementing something like a RoutineInterfaceSymbol, which is a different issue altogether. I'll put in an x-failing test for now.
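For the record, a sketch of the kind of x-failing test I have in mind (the details here are assumptions - the real test may look quite different):

```python
import pytest
from psyclone.psyir.frontend.fortran import FortranReader
from psyclone.psyir.nodes import Call

SOURCE = """
module my_mod
  interface my_sub
    module procedure my_sub_r4
  end interface my_sub
contains
  subroutine my_sub_r4(x)
    real, intent(inout) :: x
    x = x + 1.0
  end subroutine my_sub_r4
  subroutine caller(x)
    real, intent(inout) :: x
    call my_sub(x)
  end subroutine caller
end module my_mod
"""

@pytest.mark.xfail(reason="A call made via a generic interface cannot "
                   "yet be resolved because the RoutineSymbol is of "
                   "UnknownFortranType.")
def test_get_callees_via_generic_interface():
    psyir = FortranReader().psyir_from_source(SOURCE)
    call = psyir.walk(Call)[0]
    # Expected to fail until generic interfaces are supported.
    assert call.get_callees()
```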
I think I have this all working and tested now. I'm going to try extending the integration test to actually make use of it...
Of course, NEMO rarely uses `USE xxx, ONLY: yyy` so generally this still isn't going to work. I tried manually adding an `ONLY` clause in lbclnk.f90 but that then hit the interface problem:
```
Module inlining of 'lbc_nfd' failed: Transformation Error:
KernelModuleInlineTrans failed to retrieve PSyIR for routine 'lbc_nfd'
due to: RoutineSymbol 'lbc_nfd' exists in Container 'lbcnfd' but is of
UnknownFortranType:
interface lbc_nfd
   module procedure lbc_nfd_2d, lbc_nfd_3d, lbc_nfd_4d
   module procedure lbc_nfd_2d_ptr, lbc_nfd_3d_ptr, lbc_nfd_4d_ptr
   module procedure lbc_nfd_2d_ext
end interface
Cannot currently module inline such a routine.
```
I also found a bug that was exercised when attempting to module-inline a routine that was already present in the module. I've added a separate validation check for this.
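The shape of that extra validation check, as a sketch (the name, wording and exact location within `KernelModuleInlineTrans` are assumptions):

```python
from psyclone.psyir.nodes import Routine
from psyclone.psyir.transformations import TransformationError

def check_no_name_clash(routine, container):
    '''Reject module inlining if the target Container already contains
    a different Routine with the same (case-insensitive) name.'''
    for existing in container.walk(Routine):
        if existing is not routine and \
                existing.name.lower() == routine.name.lower():
            raise TransformationError(
                f"Cannot module-inline routine '{routine.name}' as "
                f"Container '{container.name}' already contains a "
                f"routine with that name.")
```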
I've created #2422 for the 'interface problem'.
Ready for a first look now.
This PR will also help us to handle LFRic kernels that call other routines.
I've been bold and added a new inline_calls method to examples/nemo/scripts/utils.py that attempts to inline everything. I've added a call to this from the acc_kernels_trans.py script so we'll see what the integration tests make of it...
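For reference, a minimal sketch of the shape of that helper (the actual version in `utils.py` may well differ):

```python
from psyclone.psyir.nodes import Call
from psyclone.psyir.transformations import InlineTrans, TransformationError

def inline_calls(sched):
    '''Attempt to apply InlineTrans to every Call in the supplied
    schedule, reporting (but otherwise ignoring) any that the
    transformation rejects.'''
    inline_trans = InlineTrans()
    for call in sched.walk(Call):
        try:
            inline_trans.apply(call)
        except TransformationError as err:
            # Not every call can be inlined (e.g. if the target
            # routine contains CodeBlocks) so just carry on.
            print(f"Inlining failed:\n{err}")
```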
As usual, actually trying things for real has revealed a couple of problems - one to do with generic interfaces and another with the psyclone_internal_cmp routine that we introduce in some situations. I think I've fixed both of those now and have relaunched the integration tests.
Integration tests failed but I can't reproduce it interactively. Therefore, I've fired them off again but requested that the failing build be done in serial to help me see what's going on...
The issue was that a local routine that we were trying to inline was in a CodeBlock and thus not found. I was raising an InternalError in this case (because I thought it should never happen) but, now that I realise it can legitimately occur, I've changed this to a TransformationError.
Finally have integration tests working again. Looking at test coverage I now realise that I have duplication between InlineTrans._find_routine() and the new (in this PR) Call.get_callees().
Most of the duplication is gone now. I've started trying to integrate with the ModuleManager but have hit the issue that it's currently LFRic specific. I confess that I reviewed it originally but I now think that it should be general purpose and sub-classed for LFRic. The general solution should (I think) parse every Fortran source file so as to robustly identify the modules that it contains. The LFRic sub-class could optimise this out as the coding standards mean we know that there is only one module per file and that its name is essentially given by that of the file.
Integration tests failed as it turns out that the Fortran frontend wasn't passing the include paths when creating a new parser instance. If I fix that then processing a real NEMO file is incredibly slow, probably because we now attempt to resolve all wildcard imports and that means a lot of parsing!
> Most of the duplication is gone now. I've started trying to integrate with the `ModuleManager` but have hit the issue that it's currently LFRic specific.
Dang. I tried to keep it general, but reading the rest I realise what you mean :(
> I confess that I reviewed it originally but I now think that it should be general purpose and sub-classed for LFRic. The general solution should (I think) parse every Fortran source file so as to robustly identify the modules that it contains. The LFRic sub-class could optimise this out as the coding standards mean we know that there is only one module per file and that its name is essentially given by that of the file.
So it's just about determining which modules are contained in which file? I have moved this into a separate function, which could be overridden by a domain-specific implementation (or a config setting, or command-line option??). That function was designed to return a list of module names, so it's independent of the LFRic coding style (though atm it only returns a one-element list). This function could in theory parse the files to return the actual full list, though that would be prohibitively expensive :( We could try a quick regular-expression search in the files, without fully parsing them? Please open a ticket and let me know exactly what the problem is (i.e. do you need quick parsing of a lot of files, or a different rule, and if so, which one).
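To make the idea concrete, here is a sketch of the two-level design being discussed (class and method names are illustrative, not the actual ModuleManager API): a general implementation that does a quick regular-expression scan of each file, and an LFRic-style override that exploits the one-module-per-file coding standard.

```python
import re
from pathlib import Path

# Matches e.g. 'module my_mod' but, thanks to the end-of-line anchor,
# not 'module procedure foo, bar' or 'end module my_mod'.
_MODULE_RE = re.compile(r"^\s*module\s+(\w+)\s*$",
                        re.IGNORECASE | re.MULTILINE)

class GenericModuleManager:
    '''General purpose: scan each source file for MODULE statements.'''
    def get_modules_in_file(self, filename):
        with open(filename, "r", encoding="utf-8",
                  errors="replace") as file_in:
            return [name.lower()
                    for name in _MODULE_RE.findall(file_in.read())]

class LFRicModuleManager(GenericModuleManager):
    '''LFRic coding standards guarantee one module per file, with its
    name given by that of the file, so no scan is needed.'''
    def get_modules_in_file(self, filename):
        return [Path(filename).stem.lower()]
```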
It turns out that the OpenMPI header files include quite a few EXTERNAL statements. If such a header is INCLUDEd at the top level of a MODULE, the whole module ends up in a CodeBlock and then we can't find routines to inline. We report that we can't find the routine, but that is confusing as it is clearly in the source that we say we've checked - we need a better message.
I've fired off the integration tests for this. Note that it currently still includes the regex workaround/extension to the ModuleManager that I think @hiker was going to revisit?
Integration tests (processing NEMO) fail because of:

```
  File "psyclone/parse/module_manager.py", line 219, in get_modules_in_file
    source_code = file_in.read()
                  ^^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 5446: invalid continuation byte
```
I need to look at what we do in fparser to work around this problem.
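One possible workaround, as a sketch (assuming a permissive fallback is acceptable - I still need to check what fparser actually does here):

```python
def read_source(path):
    '''Read a source file as UTF-8, falling back to latin-1 (which can
    decode any byte sequence) if the content is not valid UTF-8.'''
    try:
        with open(path, "r", encoding="utf-8") as file_in:
            return file_in.read()
    except UnicodeDecodeError:
        # Any non-ASCII characters (e.g. in comments) may be
        # mis-interpreted but the read can no longer fail.
        with open(path, "r", encoding="latin-1") as file_in:
            return file_in.read()
```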
> I've fired off the integration tests for this. Note that it currently still includes the regex workaround/extension to the ModuleManager that I think @hiker was going to revisit?
Yes, I am - but note that I have around 5 PRs that would need to go in first. Looking at them, maybe not all of them, but at least #2296 and #1990 (since they both likely touch the module manager, and I want to avoid getting more conflicts in this sequence of PRs, which has caused me grief in the past).
Am getting test compilation errors because (in the kernel-extraction driver tests) I think we're getting the wrong module definition inserted into the generated code - it has an empty testkern_code subroutine with no arguments. The problem here is that the Fortran files contained in the tests/test_files directories do not define a single application and therefore sometimes define entities with the same name, e.g.:
```
(psyclone310) mr@laptop tests$ grep -rl testkern_code test_files/dynamo0p3/
test_files/dynamo0p3/tl_testkern_mod.F90
test_files/dynamo0p3/testkern_short_name_mod.f90
test_files/dynamo0p3/testkern_mod.F90
test_files/dynamo0p3/testkern_no_datatype_mod.f90
test_files/dynamo0p3/testkern_wrong_mod_name.F90
test_files/dynamo0p3/testkern_invalid_fortran_mod.f90
```
Possibly the simplest solution is to change these so that they use distinct names?
Although that will (probably) work for our examples, it could also be a problem in real code that has, e.g., a module containing stubs of MPI-dependent routines so that it can be compiled with or without MPI; dl_esm_inf has such modules in finite_difference/src/parallel.
It turned out I only needed to rename one module in one test source file to fix that problem so I've done that. To fix the problem in a real application it may be necessary to collect the source that is going to be compiled into some 'build' directory first?
Am trying to apply this to Clem's refactored sea-ice branch of NEMO. It's revealing quite a few problems. Currently I'm hitting problems merging symbol tables when doing inlining. I've also had to turn off recursively examining all wildcard imports as, in NEMO (where almost everything is a wildcard import), that's a bad idea!
If a routine accesses a variable brought into module scope by a wildcard import, then attempting to module inline a second routine that does the same thing (for the same variable) ends up with the first variable being renamed!
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.86%. Comparing base (d3f9715) to head (699cd28).
Additional details and impacted files
```
@@ Coverage Diff @@
##           master    #2413      +/-   ##
==========================================
- Coverage   99.86%   99.86%   -0.01%
==========================================
  Files         353      353
  Lines       48821    48954     +133
==========================================
+ Hits        48756    48886     +130
- Misses         65       68       +3
```
Can I break out my changes to ModuleManager and Call into separate PRs as this is currently rather large?
> Can I break out my changes to `ModuleManager` and `Call` into separate PRs as this is currently rather large?
I'm strongly in favour of this - allows me to easily test with kernel extraction.
NEMO integration tests fail with:
```
Inlining of 'ctl_warn' failed:
Transformation Error: Routine 'ctl_warn' contains one or more CodeBlocks and therefore cannot be inlined.
Inlining of 'dia_hsb_rst' failed:
Transformation Error: Routine 'dia_hsb_rst' contains one or more CodeBlocks and therefore cannot be inlined.
Adding profiling to non-OpenACC regions in dia_hsb_init
Generation Error: generator: specified PSyclone transformation module 'acc_kernels_trans'
raised the following exception during execution...
{
Invokes found:
dia_obs_init
dia_obs
  File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/acc_kernels_trans.py", line 431, in trans
    inline_calls(sched)
  File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/utils.py", line 166, in inline_calls
    inline_trans.apply(call)
  File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/transformations/inline_trans.py", line 136, in apply
    self.validate(node, options)
  File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/transformations/inline_trans.py", line 655, in validate
    routine = node.get_callees()[0]
              ^^^^^^^^^^^^^^^^^^
  File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/nodes/call.py", line 504, in get_callees
    for name in container.resolve_routine(rsym.name):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/nodes/container.py", line 279, in resolve_routine
    raise TypeError(
TypeError: Expected 'ctl_warn' to correspond to either a RoutineSymbol or a GenericInterfaceSymbol but found 'Symbol'
}
please check your script
```
Performance is now back to the level it is on master (no faster, unfortunately). However, this has required that I turn off the automatic addition of profiling calipers to non-OpenACC regions in the acc_kernels_trans.py script. I'm not sure what the solution to this is. If at any point we finish the work to extend PSyclone to retain any directives in existing code then we could hit this problem again, as there may be directives inside a routine that we wish to inline. The logical thing to do at that point would be to strip out the directives from the inlined version, on the basis that they only applied when it was a standalone routine. We could do the same thing with profiling calls.
> Performance is now back to the level it is on master (no faster, unfortunately). However, this has required that I turn off the automatic addition of profiling calipers to non-OpenACC regions in the `acc_kernels_trans.py` script.
With that you mean the automatically applied psydata-based profiling regions (using --profile kernels/invokes)? It is strange that this would cause a huge overhead (maybe verify that they are indeed correctly applied outside of the loops? Maybe something has changed?). I am aware that this is NEMO, but still, if profiling has a huge overhead it's worth checking that the transformations have not somehow moved profiling into a loop.
> With that you mean the automatically applied psydata-based profiling regions (using `--profile kernels/invokes`)? It is strange that this would cause a huge overhead (maybe verify that they are indeed correctly applied outside of the loops? Maybe something has changed?). I am aware that this is NEMO, but still, if profiling has a huge overhead it's worth checking that the transformations have not somehow moved profiling into a loop.
No, it's all OK :-) First off, this is the script itself adding profiling around anything that it hasn't managed to enclose in OpenACC. Second, the problem is simpler than you fear - it's simply (I think) that we ended up with less OpenACC coverage when I turned on inlining, because sometimes the routines we want to inline have already been processed by PSyclone and may therefore have had calls to the profiling API added. I think those calls are then preventing us from adding OpenACC around the region (into which the routine has been inlined). Now that I write that, I think I'll need to dig deeper as, if we'd already added profiling calls, then that would have been because we'd previously found statements that we couldn't put in an OpenACC region anyway.