
Towards #924 - support module inline in language-level PSyIR.

Open arporter opened this issue 2 years ago • 30 comments

  • Extends CodeBlock.get_symbol_names() so that it excludes statement labels;
  • Extends Container.get_routine_psyir() to handle searching for an implementation;
  • Replaces InlineTrans._find_routine() with Call.get_callees() (see the usage sketch after this list);
  • Extends ContainerSymbol.container() to check the local scope first;
  • Extends Symbol.resolve_type() to handle RoutineSymbols;
  • Extends SymbolTable to permit removal of a Symbol provided it is also present in an outer scope;
  • Finally, extensive changes to KernelModuleInlineTrans in order to make it general.
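
A minimal usage sketch of the new Call.get_callees() (not code from this PR; the Fortran source and driver below are illustrative, and FortranReader is assumed from the existing PSyIR frontend):

from psyclone.psyir.frontend.fortran import FortranReader
from psyclone.psyir.nodes import Call

SOURCE = '''
module work_mod
contains
  subroutine do_work(x)
    real, intent(inout) :: x
    x = x + 1.0
  end subroutine do_work
  subroutine driver(x)
    real, intent(inout) :: x
    call do_work(x)
  end subroutine driver
end module work_mod
'''

# Parse the Fortran into language-level PSyIR.
psyir = FortranReader().psyir_from_source(SOURCE)
# Find the 'call do_work(x)' statement and resolve its callee(s). A call made
# through a generic interface may resolve to more than one Routine.
call = psyir.walk(Call)[0]
for routine in call.get_callees():
    print(routine.name)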

arporter avatar Nov 23 '23 16:11 arporter

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (77b28ff) 99.85% compared to head (867fe2b) 99.84%.

Files | Patch % | Lines
src/psyclone/psyir/symbols/containersymbol.py | 95.45% | 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2413      +/-   ##
==========================================
- Coverage   99.85%   99.84%   -0.02%     
==========================================
  Files         352      352              
  Lines       47337    47464     +127     
==========================================
+ Hits        47268    47389     +121     
- Misses         69       75       +6     


codecov[bot] avatar Nov 23 '23 16:11 codecov[bot]

Realised that I currently can't handle a routine that is called via a generic interface. We create a RoutineSymbol for that interface but it is of UnknownFortranType. Without inspecting and reasoning about the type information within that type there's nothing I can do, and doing that would effectively mean implementing something like a RoutineInterfaceSymbol, which is a different issue altogether. I'll put in an x-failing test for now.
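
For reference, a minimal sketch of such an x-failing test using pytest's standard xfail marker (the test name and reason text are illustrative, not taken from the PR):

import pytest

@pytest.mark.xfail(reason="Calls made via a generic interface give a "
                          "RoutineSymbol of UnknownFortranType, which "
                          "module inlining cannot yet handle.")
def test_module_inline_generic_interface():
    ...  # construct PSyIR for a call via a generic interface and apply
         # KernelModuleInlineTrans; currently expected to fail.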

arporter avatar Nov 24 '23 08:11 arporter

I think I have this all working and tested now. I'm going to try extending the integration test to actually make use of it...

arporter avatar Nov 24 '23 12:11 arporter

Of course, NEMO rarely uses USE xxx, ONLY: yyy, so generally this still isn't going to work. I tried manually adding an ONLY in lbclnk.f90 but that then hit the interface problem:

Module inlining of 'lbc_nfd' failed: Transformation Error: KernelModuleInlineTrans failed to retrieve PSyIR for routine 'lbc_nfd' due to: RoutineSymbol 'lbc_nfd' exists in Container 'lbcnfd' but is of UnknownFortranType:
interface lbc_nfd
  module procedure lbc_nfd_2d, lbc_nfd_3d, lbc_nfd_4d
  module procedure lbc_nfd_2d_ptr, lbc_nfd_3d_ptr, lbc_nfd_4d_ptr
  module procedure lbc_nfd_2d_ext
end interface
Cannot currently module inline such a routine.

I also found a bug that was exercised when attempting to module-inline a routine that was already present in the module. I've added a separate validation check for this.

arporter avatar Nov 24 '23 15:11 arporter

I've created #2422 for the 'interface problem'.

arporter avatar Nov 27 '23 17:11 arporter

Ready for a first look now.

arporter avatar Nov 27 '23 17:11 arporter

This PR will also help us to handle LFRic kernels that call other routines.

arporter avatar Dec 12 '23 15:12 arporter

I've been bold and added a new inline_calls method to examples/nemo/scripts/utils.py that attempts to inline everything. I've added a call to this from the acc_kernels_trans.py script so we'll see what the integration tests make of it...
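
Roughly, the helper looks something like the following sketch (the exact code in utils.py may differ; this just shows the intent: walk the schedule and attempt to inline every call, skipping any that the transformation rejects):

from psyclone.psyir.nodes import Call, IntrinsicCall
from psyclone.psyir.transformations import InlineTrans, TransformationError

def inline_calls(schedule):
    '''Attempt to inline every non-intrinsic Call found in the schedule.'''
    inline_trans = InlineTrans()
    for call in schedule.walk(Call):
        # Intrinsics have no Fortran source to inline.
        if isinstance(call, IntrinsicCall):
            continue
        try:
            inline_trans.apply(call)
        except TransformationError as err:
            # validate() explains why the call cannot be inlined (e.g. the
            # target routine contains CodeBlocks) - just report and move on.
            print(f"Inlining failed:\n{err}")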

arporter avatar Dec 21 '23 14:12 arporter

As usual, actually trying things for real has revealed a couple of problems - one to do with generic interfaces and another with the psyclone_internal_cmp routine that we introduce in some situations. I think I've fixed both of those now and have relaunched the integration tests.

arporter avatar Dec 21 '23 15:12 arporter

Integration tests failed but I can't reproduce the failure interactively. Therefore, I've fired them off again but requested that the failing build be done in serial to help me see what's going on...

arporter avatar Jan 04 '24 12:01 arporter

The issue was that a local routine that we were trying to inline was in a CodeBlock and thus not found. I was raising an InternalError in this case (because I thought it should never happen) but, now that I realise it can legitimately occur, I've changed this to a TransformationError.

arporter avatar Jan 05 '24 10:01 arporter

Finally have integration tests working again. Looking at test coverage I now realise that I have duplication between InlineTrans._find_routine() and the new (in this PR) Call.get_callees().

arporter avatar Jan 05 '24 16:01 arporter

Most of the duplication is gone now. I've started trying to integrate with the ModuleManager but have hit the issue that it's currently LFRic specific. I confess that I reviewed it originally but I now think that it should be general purpose and sub-classed for LFRic. The general solution should (I think) parse every Fortran source file so as to robustly identify the modules that it contains. The LFRic sub-class could optimise this out as the coding standards mean we know that there is only one module per file and that its name is essentially given by that of the file.

arporter avatar Jan 08 '24 17:01 arporter

Integration tests failed as it turns out that the Fortran frontend wasn't passing the include paths when creating a new parser instance. If I fix that then processing a real NEMO file is incredibly slow, probably because we now attempt to resolve all wildcard imports and that means a lot of parsing!

arporter avatar Jan 09 '24 11:01 arporter

Most of the duplication is gone now. I've started trying to integrate with the ModuleManager but have hit the issue that it's currently LFRic specific.

Dang. I tried to keep it general, but reading the rest I realise what you mean :(

I confess that I reviewed it originally but I now think that it should be general purpose and sub-classed for LFRic. The general solution should (I think) parse every Fortran source file so as to robustly identify the modules that it contains. The LFRic sub-class could optimise this out as the coding standards mean we know that there is only one module per file and that its name is essentially given by that of the file.

So it's just about determining which modules are contained in which file? I have moved this into a separate function, which could be overridden by a domain-specific implementation (or a config setting, or command line option??). That function was designed to return a list of module names, so it's independent of the LFRic coding style (though atm it only returns a one-element list). This function could in theory parse the files to return the actual full list, though that would be prohibitively expensive :( We could try to do a quick regular-expression search in the files, without fully parsing them? Please open a ticket and let me know exactly what the problem is (i.e. do you need quick parsing of a lot of files, or a different rule implemented, and if so, which one).
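
As a concrete (hypothetical) illustration of the regex idea, something along these lines could return every module defined in a file without paying the cost of a full parse:

import re

# Match 'module <name>' at the start of a line, but not 'module procedure',
# 'module subroutine' or 'module function' statements.
_MOD_PATTERN = re.compile(
    r"^\s*module\s+(?!procedure\b|subroutine\b|function\b)(\w+)",
    re.IGNORECASE | re.MULTILINE)

def modules_in_file(path):
    '''Return the (lower-cased) names of all modules defined in a file.'''
    with open(path, "r", encoding="utf-8", errors="replace") as fin:
        return [name.lower() for name in _MOD_PATTERN.findall(fin.read())]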

hiker avatar Jan 09 '24 12:01 hiker

It turns out that the OpenMPI header files include quite a few EXTERNAL statements. If the header is INCLUDEd at the top level of a MODULE, this results in the whole module being put in a CodeBlock and then we can't find routines to inline. We report that we can't find the routine, but that is confusing as it is clearly in the source that we say we've checked - we need a better message.

arporter avatar Jan 09 '24 15:01 arporter

I've fired off the integration tests for this. Note that it currently still includes the regex workaround/extension to the ModuleManager that I think @hiker was going to revisit?

arporter avatar Jan 24 '24 17:01 arporter

Integration tests (processing NEMO) fail because of:

"psyclone/parse/module_manager.py", line 219, in get_modules_in_file
source_code = file_in.read()
              ^^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 5446: invalid continuation byte

I need to look at what we do in fparser to workaround this problem.
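
One possible workaround (a sketch only; checking what fparser itself does is the next step) is to fall back to a permissive decode when a file is not valid UTF-8:

def read_source(path):
    '''Read a Fortran source file, tolerating stray non-UTF-8 bytes.'''
    try:
        with open(path, "r", encoding="utf-8") as fin:
            return fin.read()
    except UnicodeDecodeError:
        # Typically the offending bytes are latin-1 characters in comments,
        # so replacing them is good enough for module-name discovery.
        with open(path, "r", encoding="utf-8", errors="replace") as fin:
            return fin.read()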

arporter avatar Jan 25 '24 09:01 arporter

I've fired off the integration tests for this. Note that it currently still includes the regex workaround/extension to the ModuleManager that I think @hiker was going to revisit?

Yes, I am - but note that I have around 5 PRs that would need to go in first. Looking at them, maybe not all of them, but at least #2296 and #1990 (since they both likely touch the module manager, and I want to avoid getting more conflicts in this sequence of PRs, which has caused me grief in the past).

hiker avatar Jan 25 '24 10:01 hiker

I'm getting test compilation errors because (in the kernel-extraction driver tests) I think we're getting the wrong module definition inserted into the generated code - it has an empty testkern_code subroutine with no arguments. The problem here is that the Fortran files contained in the tests/test_files directories do not define a single application and therefore sometimes define entities with the same name, e.g.:

(psyclone310) mr@laptop tests$ grep -rl testkern_code test_files/dynamo0p3/
test_files/dynamo0p3/tl_testkern_mod.F90
test_files/dynamo0p3/testkern_short_name_mod.f90
test_files/dynamo0p3/testkern_mod.F90
test_files/dynamo0p3/testkern_no_datatype_mod.f90
test_files/dynamo0p3/testkern_wrong_mod_name.F90
test_files/dynamo0p3/testkern_invalid_fortran_mod.f90

Possibly the simplest solution is to change these so that they use distinct names?

Although that will (probably) work for our examples, it could also be a problem in real code that has, for example, a module containing stubs of MPI-dependent routines so that it can be compiled with or without MPI; dl_esm_inf has such modules in finite_difference/src/parallel.

arporter avatar Jan 25 '24 22:01 arporter

It turned out I only needed to rename one module in one test source file to fix that problem, so I've done that. To fix the problem in a real application, it may be necessary to first collect the source that is going to be compiled into some 'build' directory?

arporter avatar Jan 26 '24 10:01 arporter

I'm trying to apply this to Clem's refactored sea-ice branch of NEMO. It's revealing quite a few problems. Currently I'm hitting issues merging symbol tables when doing inlining. I've also had to turn off recursively examining all wildcard imports as, in NEMO (where almost everything is a wildcard import), that's a bad idea!

arporter avatar Feb 02 '24 17:02 arporter

If a routine accesses a variable brought into module scope by a wildcard import, then attempting to module inline a second routine that does the same thing (for the same variable) ends up with the first variable being renamed!

arporter avatar Feb 06 '24 14:02 arporter

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.86%. Comparing base (d3f9715) to head (699cd28).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2413      +/-   ##
==========================================
- Coverage   99.86%   99.86%   -0.01%     
==========================================
  Files         353      353              
  Lines       48821    48954     +133     
==========================================
+ Hits        48756    48886     +130     
- Misses         65       68       +3     


codecov-commenter avatar Feb 06 '24 14:02 codecov-commenter

Can I break out my changes to ModuleManager and Call into separate PRs as this is currently rather large?

arporter avatar Apr 29 '24 15:04 arporter

Can I break out my changes to ModuleManager and Call into separate PRs as this is currently rather large?

I'm strongly in favour of this - it allows me to easily test with kernel extraction.

hiker avatar May 02 '24 03:05 hiker

NEMO integration tests fail with:

Inlining of 'ctl_warn' failed:
Transformation Error: Routine 'ctl_warn' contains one or more CodeBlocks and therefore cannot be inlined.
Inlining of 'dia_hsb_rst' failed:
Transformation Error: Routine 'dia_hsb_rst' contains one or more CodeBlocks and therefore cannot be inlined.
Adding profiling to non-OpenACC regions in dia_hsb_init
Generation Error: generator: specified PSyclone transformation module 'acc_kernels_trans'
raised the following exception during execution...
{
Invokes found:
dia_obs_init
dia_obs
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/acc_kernels_trans.py", line 431, in trans
    inline_calls(sched)
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/utils.py", line 166, in inline_calls
    inline_trans.apply(call)
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/transformations/inline_trans.py", line 136, in apply
    self.validate(node, options)
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/transformations/inline_trans.py", line 655, in validate
    routine = node.get_callees()[0]
              ^^^^^^^^^^^^^^^^^^
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/nodes/call.py", line 504, in get_callees
    for name in container.resolve_routine(rsym.name):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/nodes/container.py", line 279, in resolve_routine
    raise TypeError(
    TypeError: Expected 'ctl_warn' to correspond to either a RoutineSymbol or a GenericInterfaceSymbol but found 'Symbol'
}
please check your script

arporter avatar Jun 05 '24 16:06 arporter

Performance is now back to the level it is at on master (no faster, unfortunately). However, this has required that I turn off the automatic addition of profiling calipers to non-OpenACC regions in the acc_kernels_trans.py script. I'm not sure what the solution to this is. If at any point we finish the work to extend PSyclone to retain any directives in existing code, then we could hit this problem again as there may be directives inside a routine that we wish to inline. The logical thing to do at that point would be to strip out the directives from the inlined version, on the basis that they only applied when it was a standalone routine. We could do the same thing with profiling calls.
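
A rough sketch of that idea (class names and import locations are assumed from the existing PSyIR; this is not code from the PR): walk the copied routine, hoist the statements out of any region node and then drop the node itself.

from psyclone.psyir.nodes import ProfileNode, RegionDirective

def strip_region_nodes(routine_copy):
    '''Remove directive/profiling regions from a copy of a routine that is
    about to be inlined, keeping the statements they enclose.'''
    for region in routine_copy.walk((RegionDirective, ProfileNode)):
        parent = region.parent
        position = region.position
        # Both node types hold their enclosed statements in a child Schedule;
        # move those statements up to the parent before removing the region.
        for stmt in reversed(region.children[0].pop_all_children()):
            parent.children.insert(position, stmt)
        region.detach()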

arporter avatar Jun 14 '24 07:06 arporter

Performance is now back to the level it is on master (no faster unfortunately). However, this has required that I turn off the automatic addition of profiling calipers to non-OpenACC regions in the acc_kernels_trans.py script.

By that, do you mean the automatically applied psydata-based profiling regions (using --profile kernels/invokes)? It is strange that this would cause a huge overhead (maybe verify that they are indeed correctly applied outside of the loops? Maybe something has changed?). I am aware that this is NEMO, but still, if profiling has a huge overhead it's worth checking that the transformations did not somehow move profiling into a loop.

hiker avatar Jun 14 '24 10:06 hiker

By that, do you mean the automatically applied psydata-based profiling regions (using --profile kernels/invokes)? It is strange that this would cause a huge overhead (maybe verify that they are indeed correctly applied outside of the loops? Maybe something has changed?). I am aware that this is NEMO, but still, if profiling has a huge overhead it's worth checking that the transformations did not somehow move profiling into a loop.

No, it's all OK :-) First off, this is the script itself adding profiling around anything that it hasn't managed to enclose in OpenACC. Second, the problem is simpler than you fear - it's simply (I think) that we ended up with less OpenACC coverage when I turned on inlining, because sometimes the routines we want to inline have already been processed by PSyclone and may therefore have had calls to the profiling API added. I think those calls are then preventing us from adding OpenACC around the region (into which the routine has been inlined). Now that I write that, I think I'll need to dig deeper as, if we'd already added profiling calls, that would have been because we'd previously found statements that we couldn't put in an OpenACC region anyway.

arporter avatar Jun 14 '24 10:06 arporter