LFRic intergrid kernels: problem with continuous fields
Alex Brown has noticed that there is a problem with some LFRic intergrid kernels that map continuous (W2) fields between fine and coarse meshes.
This seems to happen when the model's halo depth is smaller than the number of fine cells per coarse cell. As the field is continuous, the model attempts to do halo exchanges but we think the halos are not being updated to the correct depth.
For now our work-around (on a branch) is to change the metadata to say that the mapped field is discontinuous. This prevents any halo-exchange calls from being inserted by PSyclone (for these kernels we don't actually need to do any halo exchanges!).
The offending kernels are:
- https://code.metoffice.gov.uk/trac/lfric/browser/LFRic/trunk/gungho/source/kernel/prolong_w2_kernel_mod.F90
- https://code.metoffice.gov.uk/trac/lfric/browser/LFRic/trunk/gungho/source/kernel/restrict_w2_kernel_mod.F90
- https://code.metoffice.gov.uk/trac/lfric/browser/LFRic/trunk/gungho/source/kernel/weights_prolong_w2_kernel_mod.F90
Two proposed solutions were:
- PSyclone could allow GH_WRITE for continuous fields, writing to the shared DoFs more than once (essentially treating the continuous fields as discontinuous fields)
- Psyclone could prevent intergrid kernels from looping into the halo region
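
As a sketch of the first option, the kernel metadata for a W2 intergrid prolongation could then declare the fine field with GH_WRITE. The type name, argument ordering and `mesh_arg` usage below are illustrative only, not copied from the actual kernels:

```fortran
type, public, extends(kernel_type) :: prolong_w2_kernel_type
  private
  type(arg_type) :: meta_args(2) = (/                                 &
       ! Fine field: written only (GH_WRITE rather than GH_INC), so
       ! no halo exchanges would be required for it.
       arg_type(GH_FIELD, GH_REAL, GH_WRITE, W2, mesh_arg=GH_FINE),   &
       ! Coarse field: read only.
       arg_type(GH_FIELD, GH_REAL, GH_READ,  W2, mesh_arg=GH_COARSE) /)
  integer :: operates_on = CELL_COLUMN
end type prolong_w2_kernel_type
```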
The problem is that the current PSyclone support for intergrid kernels was motivated by the multi-grid solver. As such it assumes that each grid in a hierarchy increases (or decreases) in resolution by a factor of 2 in both the x and y directions. This assumption therefore applies to the halo depths too. It seems that the requirements of the multi-grid solver do not quite line up with what is being done when using the intergrid support to run physics at different resolutions.
This needs to be explored further.
Actually I'm not sure that there is necessarily a conflict between the uses for the intergrid kernels! The multi-grid solver in gungho currently only restricts and prolongs a W3 (fully-discontinuous) field, and for our physics-dynamics coupling we still use the same kernels for W3 fields. The two differences are that we have extended the infrastructure to also map W2 fields, and have relaxed the assumption of the 2x2 nesting. So I think that if the intergrid mappings for physics-dynamics coupling are supported then they will also be supported for the multi-grid solver.
It occurs to me that in future there may be a desire from LFRic developers to implement other multi-grid formulations, which might involve the same extensions that we're using for the physics-dynamics coupling (although I don't know too much about multi-grid techniques myself so I'm not sure!)
I have also begun to wonder whether our current workaround is actually fairly robust. PSyclone is generating the correct PSy-layer code; it's just that we're specifying something slightly unintuitive in the kernel metadata.
I'd argue that the workaround is just that - it means the metadata is no longer correct and, once we get around to implementing checks that supplied kernel arguments are on the correct spaces, those checks would fail. I'm sure there's a more general solution: I think the key issue is to understand why you can get away without halo exchanges in your use case (and thus whether that's applicable to the multi-grid use case as well) and how we should handle the fact that the nesting is now configurable. @christophermaynard has agreed to set up a meeting so we can get together to discuss this and hopefully work out what's going on :-)
I realise that in our meeting today we didn't discuss the problem with configurable nesting of meshes and the fact that the halos then become too deep. That feels like a general problem that needs fixing - avoiding doing the halo swaps will only delay the problem. To recap: if a halo exchange is required for a field on the 'fine' mesh (for an intergrid kernel) then PSyclone assumes that its depth must be double that on the 'coarse' mesh. This assumption needs revisiting.
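
The recapped assumption can be sketched as follows (the variable names are illustrative, not actual PSyclone code):

```fortran
! Current PSyclone assumption for 2x2 nesting: a halo access of depth
! d on the coarse mesh requires the fine-mesh halo to depth 2*d.
fine_halo_depth = 2 * coarse_halo_depth
! With configurable nesting, the depth would instead have to be
! derived from the actual refinement ratio of the mesh pair, e.g.:
! fine_halo_depth = refinement_ratio * coarse_halo_depth
```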
However, more directly related to this issue, we agreed that it is possible for a kernel that updates a field on a continuous function space to safely write to it in parallel. Essentially, this means that the same dof will be written to more than once but, since every thread writes the same value, this doesn't matter. Such a kernel cannot, however, read from that shared dof before it is written, as it may or may not have already been updated by a neighbouring thread. Reads after a write are fine because, even if another thread writes to the same dof, it will only be writing the value that the local thread has already put there.
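
A minimal sketch of that safe access pattern, assuming a column loop over `ndf` dofs (all names here are illustrative):

```fortran
! Safe: every shared dof is written before any read of it, and all
! columns write the same value, so concurrent updates are benign.
do df = 1, ndf
  field( map(df) ) = new_value(df)   ! write first (idempotent)
end do
do df = 1, ndf
  tmp = field( map(df) )             ! read-after-write is fine
end do
! NOT safe: reading field(map(df)) before the write above, since a
! neighbouring column may or may not have updated it yet.
```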
Created branch 1542_cont_write to alter the rules for fields with GH_WRITE access.
Please note that we will need GH_READWRITE, too. See e.g. LFRic tickets 3020 and 3032.
Update: GH_READWRITE is not safe with continuous function spaces, so we decided to allow only GH_WRITE.
Hi all! I am reviewing PR #1608 so I wanted to check what the correct behaviour would be when changing access for continuous writers from GH_INC to GH_WRITE:
1. Looping limits change from `1, mesh%get_last_halo_cell(1)` to `1, mesh%get_last_edge_cell()`.
2. No halo exchanges are generated for readers in the kernel.
3. No colouring is required in parallel as we are not writing into halos. Update: As discussed with the PSyclone team, it is because the same values are written to shared dofs by different columns.
See e.g. Ian Boutle's changes in LFRic: https://code.metoffice.gov.uk/trac/lfric/changeset/33872/LFRic/trunk
For safety, points 1)-3) can only hold when all continuous writers in a kernel have `GH_WRITE` access. As soon as we have at least one continuous writer with `GH_INC` access in a kernel, we would need to revert to the usual behaviour for safety: looping to `1, mesh%get_last_halo_cell(1)`, halo exchanges for readers and colouring.
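
Points 1) and 3) can be sketched in PSy-layer terms as follows (the kernel call bodies are elided; this is an illustration, not generated code):

```fortran
! Usual behaviour for a kernel with a GH_INC continuous writer:
! halo exchanges for readers first, then redundant computation
! (coloured for OpenMP) out to the level-1 halo.
do cell = 1, mesh%get_last_halo_cell(1)
  ! ... kernel call ...
end do

! When every continuous writer has GH_WRITE: no halo exchanges,
! no colouring, and the loop stops at the last edge cell.
do cell = 1, mesh%get_last_edge_cell()
  ! ... kernel call ...
end do
```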
@tommbendall and @atb1995, I would appreciate your thoughts 😄
Update: Ian had a look and agrees that this is fine with his requirements when using GH_WRITE for continuous FS in Physics kernels.
I think the reason we do not need colouring is not to do with accessing the halos, it is to do with the fact that the writes to shared dofs from different columns have the same value. Therefore we do not have a race condition - we just don't know which column updates the shared dof value first when using OpenMP. This will cause some cache thrashing but is not a correctness issue.
Having studied the Teams conversation, I've updated the Developer Guide as follows. Does it now accurately describe what needs to be implemented?

i.e. we don't need to do redundant computation in order to get the correct values for annexed dofs at the edge of a partition, so the loop limit becomes `last_edge_cell()`?
1. Looping limits change from `1, mesh%get_last_halo_cell(1)` to `1, mesh%get_last_edge_cell()`.
2. No halo exchanges are generated for readers in the kernel.
3. No colouring is required in parallel as we are not writing into halos. Update: As discussed with the PSyclone team, it is because the same values are written to shared dofs by different columns.

See e.g. Ian Boutle's changes in LFRic: https://code.metoffice.gov.uk/trac/lfric/changeset/33872/LFRic/trunk

For safety, points 1)-3) can only hold when all continuous writers in a kernel have `GH_WRITE` access. As soon as we have at least one continuous writer with `GH_INC` access in a kernel, we would need to revert to the usual behaviour for safety: looping to `1, mesh%get_last_halo_cell(1)`, halo exchanges for readers and colouring.
Hi all, this looks completely correct to me!
PS: @arporter mentioned above the other half of the problem, which is how to handle halo exchanges for intergrid kernels and the respective halo depths for the fine and coarse cells. We haven't forgotten about this; I have been having some thoughts about intergrid kernels that may use stencils but didn't want to say anything until I'd understood what this would need! I suggest that once we've decided what to do there, I open a separate issue to discuss it.
PR #1608 that allows GH_WRITE access for fields on continuous function spaces is now on master. However, this was only part of the requirements for this issue so I will leave it open for now.
@tommbendall and @TeranIvy, I'm just reviewing LFRic-related tickets and found this one. As far as I can tell, it's still open because we haven't addressed the issue of halo depths in fine and coarse meshes for intergrid kernels. Has there been any progress in determining what the requirements are?
Ah sorry. I have some intergrid kernels that use stencils working on a branch and there wasn't any problem with the different halo depths. With respect to the original issue, since this was solved by #1608, if no one objects then we could close this issue?
Always happy to close an issue @tommbendall :-) However, I'm just a bit worried about the current hard-coded assumption (in PSyclone) that any nested grids differ in resolution by a factor of 2x2, since you said earlier that you were thinking of relaxing (or had even already relaxed) this constraint?
Yes that's true, we have relaxed that constraint. I have to confess to not being able to remember where that assumption is made! It may be that we haven't noticed it as a problem because our intergrid kernels using stencils still use psykal-lite code.
It's made in the bit of PSyclone that determines what depth of halo is required when dealing with inter-grid kernels. Does your psykal-lite support varying nesting factors? If so, presumably we can extend PSyclone to do what it does (by querying the appropriate meshes??).
Yes, it does support varying nesting factors.
I've reminded myself of what exactly it is that we're doing. With this kernel, we are writing to fine cells that are nested within a non-halo coarse cell, but using a stencil to read information from the neighbouring coarse cells (which may be in the halos). So we only need to do the halo exchange on the coarse field that we are reading from. As with other stencil code, I specify the stencil_depth as an argument and perform the halo exchange up to that depth.
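
The corresponding PSy-layer logic could be sketched roughly as below (the proxy and mesh names are illustrative, not taken from the actual psykal-lite code): only the coarse field that is read through the stencil needs a halo exchange, and only to the stencil depth.

```fortran
! Exchange the coarse (read) field's halo to the stencil depth, only
! if it is dirty; the fine field is written, so needs no exchange.
if ( coarse_field_proxy%is_dirty(depth=stencil_depth) ) then
  call coarse_field_proxy%halo_exchange(depth=stencil_depth)
end if
! Loop over owned coarse cells only; the fine cells written are all
! nested within non-halo coarse cells.
do cell = 1, mesh_coarse%get_last_edge_cell()
  ! ... intergrid kernel call, writing the fine field ...
end do
```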