PSyclone icon indicating copy to clipboard operation
PSyclone copied to clipboard

1883 use psyir for kern call arg list

Open hiker opened this issue 2 years ago • 2 comments

This adds support to create PSyIR expressions for all arguments in an LFRic kernel call. The old way (providing the arguments as strings) is still maintained.

This PR declares all required variables in the KernCallArgList object - I learned too late that only some of the arguments are declared in this object, the rest is actually in various collections in dynamo0p3 (though they are declared without proper type). For now this PR will actually replace symbols without a type with properly declared symbols.

In nearly all cases converting the PSyIR expressions to string will give identical results as the old string way - the only exceptions are Literals, which now properly indicate the precision (i.e. 2_i_def vs 2). I have also introduced a new test file kern_call_arg_list_test, with the goal that this test will eventually cover all of KernCallArtList.

One additional change I had to implement was to remove a space between certain arguments (e.g. map(cell, colour)). In order to have identical string and PSyIR output, the space had to be removed (since the FortranWriter does not add this space). I also had to modify some classes in dynamo0p3, since they declared symbols in the wrong symbol table.

Note that this PR does not fully address PSyIR creation, several #TODOs are in place - it just implements enough to hopefully be able to progress with driver creation for LFRic kernel extraction.

hiker avatar Oct 17 '22 04:10 hiker

Codecov Report

Base: 99.84% // Head: 99.84% // Increases project coverage by +0.00% :tada:

Coverage data is based on head (307e792) compared to base (98fc742). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #1918    +/-   ##
========================================
  Coverage   99.84%   99.84%            
========================================
  Files         299      301     +2     
  Lines       41346    41576   +230     
========================================
+ Hits        41280    41510   +230     
  Misses         66       66            
Impacted Files Coverage Δ
src/psyclone/domain/gocean/__init__.py 100.00% <100.00%> (ø)
src/psyclone/domain/gocean/go_symbol_table.py 100.00% <100.00%> (ø)
src/psyclone/domain/lfric/__init__.py 100.00% <100.00%> (ø)
src/psyclone/domain/lfric/algorithm/lfric_alg.py 100.00% <100.00%> (ø)
src/psyclone/domain/lfric/arg_ordering.py 100.00% <100.00%> (ø)
...rc/psyclone/domain/lfric/kern_call_acc_arg_list.py 100.00% <100.00%> (ø)
src/psyclone/domain/lfric/kern_call_arg_list.py 100.00% <100.00%> (ø)
...psyclone/domain/lfric/kern_call_invoke_arg_list.py 100.00% <100.00%> (ø)
src/psyclone/domain/lfric/kern_stub_arg_list.py 100.00% <100.00%> (ø)
src/psyclone/domain/lfric/lfric_symbol_table.py 100.00% <100.00%> (ø)
... and 8 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Oct 17 '22 13:10 codecov[bot]

OK, it finally passed everything. Ready for a first look (very gentle please, I am shy :) )

hiker avatar Oct 18 '22 06:10 hiker

With compilation enabled, I get 13 test failures:

FAILED dynamo0p3_quadrature_test.py::test_face_qr[True] - AssertionError: ass...
FAILED dynamo0p3_quadrature_test.py::test_face_qr[False] - AssertionError: as...
FAILED dynamo0p3_test.py::test_dynruntimechecks_anydiscontinuous - AssertionE...
FAILED dynamo0p3_test.py::test_dynruntimechecks_multikern - AssertionError: a...
FAILED dynamo0p3_test.py::test_dyninvoke_runtime - AssertionError: assert '  ...
FAILED dynamo0p3_test.py::test_dynruntimechecks_anyw2 - AssertionError: asser...
FAILED dynamo0p3_test.py::test_dynruntimechecks_vector - AssertionError: asse...
FAILED domain/lfric/lfric_scalar_codegen_test.py::test_three_scalars - Assert...
FAILED domain/lfric/lfric_field_codegen_test.py::test_field - AssertionError:...
FAILED domain/lfric/lfric_field_codegen_test.py::test_field_fs - AssertionErr...
FAILED domain/lfric/lfric_field_codegen_test.py::test_int_field_fs - Assertio...
FAILED domain/lfric/transformations/dynamo0p3_transformations_test.py::test_haloex_rc4_colouring[False]
FAILED domain/lfric/transformations/dynamo0p3_transformations_test.py::test_haloex_rc4_colouring[True]

arporter avatar Oct 31 '22 15:10 arporter

With compilation enabled, I get 13 test failures:

FAILED dynamo0p3_quadrature_test.py::test_face_qr[True] - AssertionError: ass...
FAILED dynamo0p3_quadrature_test.py::test_face_qr[False] - AssertionError: as...
...
FAILED domain/lfric/transformations/dynamo0p3_transformations_test.py::test_haloex_rc4_colouring[False]
FAILED domain/lfric/transformations/dynamo0p3_transformations_test.py::test_haloex_rc4_colouring[True]

Gee, that was a pain. So it's not actual compilation failures, it's that the created code AFTER the compilation is different from what it was without compilation.

Basically, calling gen twice would create different code, the second time it would contain an additional use constant_mod, only: i_def (or so). Enabling compilation would FIRST do the compilation (which calls gen internally), then call gen for the test. So it's not really a compilation error.

Reason for this additional use statement is that the code for CodedKern is created after the declaration of all imported module. CodedKern calls self.arguments.raw_arg_list, which triggers the new KernCallArgList code, which added the precision symbol to the symbol table (besides everything else). Since the declaration have already been created, the output is not affected.

The second time gen is called, it will query the symbol table for all imported symbols, and now finds the imported constant_mod, and adds it to the declaration, which is why we get different results the second time gen was called.

Likely, the problem might go away once we switch to full PSyIR. This problem might also go away if we move the declaration of the symbols to the DynCollect classes (I didn't check, but just assuming that all the sub-classes will be created before, resulting in identical symbol tables the second time this is called).

The problem also goes away when we don't import precision symbols on a subroutine level (as discussed in our telco), which is what 87f668940 did, which then fixes this problem.

hiker avatar Nov 08 '22 00:11 hiker

OK, that should be ready for another review. The changes I have done are ... somewhat extensive, but I think they show that we are going in the right direction. All newly added declarations are moved into DynCollection based classes now (but I left the creation code in place in ArgOrdering, which is required for symbols that are not at all required. Note that we will get an exception if a symbol should be incorrectly be declared in DynCollection, and then re-declared in ArgOrderin).

Note that in the last commit I moved one function from domain/lfric/psyir into the (new) LFRicSymbolTable. This clears up some stuff quite nice ... but when doing this I realised that we are then very close to fix the usage of LFRicConst() while importing a module, so I thought I'd better stop here (and stashed my additional work for now). My suggestion , if you agree with my refactoring so far, to git revert this last commit, and make this the base for the next PR. I mainly left this in to show how nicely this gets rid of things like:

        # pylint: disable=undefined-variable
        sym = R_DEF

which we currently have in domain/lfric/psyir.py.

hiker avatar Nov 15 '22 06:11 hiker

I still get one compilation test failing:

This is annoying. To some degree it's the test case's fault which calls gen twice on the same tree. But since this I guess not uncommon, it obviously needs to be fixed. There are a few options to fix this, I am not sure about the best one, so let me explain the problem first:

In my PR I cache the PSyIR created for the start/stop expressions by putting the PSyIR nodes into the tree. Seems to be no good reason to recreate them all the time (and this is what the base class Loop does as well). But the failing test case looks like this:

do loop 0
...

do loop1

So the PSyIR nodes cached are based on these two loops when gen is called at this stage in the test. Then colouring is applied to the first loop:

do loop: 0
   do loop 1
...
do loop 2

But when start/stop_expr is called for the 2nd outer loop, which is now the third loop, it does not re-create the PSyIR expressions (using the new index number 2), instead it returns the psyir-nodes from the tree, which are based on index 1. So things go wrong from here.

Solutions:

  1. Disable the caching, i.e. do not put the nodes for the start/stop expressions into the tree. While this is easy and works (for now), I think we might get trouble later on when we use PSyIR, since at some stage the loop will be created on psyir and therefore the nodes. I am not sure if lowering_to_language will take care of this, hence I am discussing this here :)
  2. We give each Loop that is created a global number using a class variable. This way each loop will use the same variable names and tags all the time. We might have 'holes' (if we throw away a loop and create a new one instead), but there should be no real problem with that.
  3. We keep track of the index number used when putting the psyir expressions into the tree. When they are not the same, we recompute and replace the psyir nodes for start/stop

There might be other options. Suggestions welcome. As I've said, for now I've just removed the code that puts the start/stop psyir nodes into the tree, and then things work as expected. If you want me to open a ticket and add a TODO, let me know.

hiker avatar Nov 16 '22 04:11 hiker

But when start/stop_expr is called for the 2nd outer loop, which is now the third loop, it does not re-create the PSyIR expressions (using the new index number 2), instead it returns the psyir-nodes from the tree, which are based on index 1. So things go wrong from here.

I can't remember enough about this but presumably this means that somewhere else we are relying on counting loops to get the name of a variable when we should look it up instead? (Maybe with a tag?)

arporter avatar Nov 16 '22 10:11 arporter

But when start/stop_expr is called for the 2nd outer loop, which is now the third loop, it does not re-create the PSyIR expressions (using the new index number 2), instead it returns the psyir-nodes from the tree, which are based on index 1. So things go wrong from here.

I can't remember enough about this but presumably this means that somewhere else we are relying on counting loops to get the name of a variable when we should look it up instead? (Maybe with a tag?)

Yep, from stop expr - with posn being the index of the loop:

        root_name = f"loop{posn}_stop"
        ubound = sym_table.find_or_create_integer_symbol(root_name,
                                                         tag=root_name)

Note that while I have made the tag explicit, the original code did the same.

Having a system-wide unique loop number would neatly solve this problem, get rid of the counting to find the index (which we can replace due to DomainKernel 'loops'), and allow us to put the expressions into the PSyIR - though I have to admit putting the start/stop expressions into the tree only works if we replace loops, not when we for whatever reasons modify an existing loop (unless we then remove the previous start/stop children). Ticket time?

hiker avatar Nov 16 '22 12:11 hiker

In my PR I cache the PSyIR created for the start/stop expressions by putting the PSyIR nodes into the tree. Seems to be no good reason to recreate them all the time (and this is what the base class Loop does as well).

Aha! I agree that in future we should simply set the loop bounds (i.e. the children) and keep them, updating them when a loop is transformed. However, for historical reasons (that I can't remember) that's not how Dynamo does things at the moment and fixing it is out of scope here. I think @sergisiso has been working (#1731) towards resolving this. Therefore, I think your first 'solution' is the one to go for (i.e. remove the caching).

arporter avatar Nov 16 '22 14:11 arporter

Ready for next review.

hiker avatar Nov 17 '22 03:11 hiker

Ready for next review.

hiker avatar Nov 22 '22 23:11 hiker

Updated to latest master.

hiker avatar Nov 24 '22 11:11 hiker