PSyclone (Closes #2661) Add support for kernels that operate on halos (for FFSL in LFRic)

Oct 14 '24 10:10 arporter

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.87%. Comparing base (d865146) to head (8272570). Report is 81 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #2745    +/-   ##
========================================
  Coverage   99.87%   99.87%            
========================================
  Files         355      356     +1     
  Lines       49324    49449   +125     
========================================
+ Hits        49260    49385   +125     
  Misses         64       64

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Oct 14 '24 14:10 codecov[bot]

The updates to the halo-exchange logic are complicated because, previously, redundant computation could only be performed to either a literal depth or to the maximum halo depth. If a kernel has a stencil access then that does not affect the loop bounds but does affect the depth of halo that needs to be clean on entry to the kernel.

Oct 17 '24 08:10 arporter

I've removing all of the 'literal_depth' related stuff - we have a single depth and it's always PSyIR. Unfortunately the multi-grid tests are failing (and probably lots of others).

Oct 17 '24 15:10 arporter

@tommbendall this branch is ready for you to try. @sergisiso I think this is probably one for you as it touches a lot of things :-( Essentially it adds support for a new type of LFRic kernel that operates on cells in the halos. As that obviously affects what depth of halo needs to be clean/marked dirty before/after such a kernel, I had to do quite a lot of work on the halo-exchange logic. This was complicated by the fact that that code was pre-PSyIR.

Nov 01 '24 14:11 arporter

I've made branches under lfric-core ticket #4452 and lfric-apps ticket #418 to pick up this change. I'll arrange for them to be tested.

However I do have one awkward thought (sorry!): does the halo_depth argument actually need passing into the kernel? I think we should be able to just pass it as an argument at the algorithm layer and kernels shouldn't need modification (other than to use the new operates_on metadata?

Finally, if I understand right, is the halo_depth argument the final argument at the algorithm layer, but the second argument (after nlayers) at the kernel layer?

Nov 06 '24 11:11 tommbendall

Also just to mention, I've tested this branch with the respective LFRic Core and Apps branches -- it compiled successfully and the psy-layer looked correct to me. I'm happy to do this again to pick up any changes

Nov 07 '24 15:11 tommbendall

Finally, if I understand right, is the halo_depth argument the final argument at the algorithm layer, but the second argument (after nlayers) at the kernel layer?

Yes that's right, but I will shortly remove the associated kernel argument :-)

Nov 08 '24 08:11 arporter

CI-happiness permitting, this is ready for review again.

Nov 11 '24 10:11 arporter

This is ready for another look now @sergisiso and @tommbendall. I'll fire off the integration tests again now too.

Nov 13 '24 14:11 arporter

Thanks so much for all of this, it's been such a mammoth task!

I've updated my LFRic Core and Apps branches and tested this branch against them. Everything compiled correctly but frustratingly there is an issue that I hadn't anticipated with removing two of the extended mesh PSyKAl-lite kernels.

Both of those kernels loop over halo cells, and also use stencils. Given the kernel metadata, PSyclone produces the correct psy-layer, specifying that fields to be read by the stencil need to have a halo exchange to a depth of halo_depth + stencil_depth, where halo_depth is the depth of cells to loop the kernel over.

However these two kernels seem to only use parts of the stencil that are parallel to the halos, and not perpendicular -- thus in the PSyKAl-lite implementations, the halo exchange is only to a depth of halo_depth. We therefore trigger errors when trying to perform a halo exchange up to a depth of halo_depth + stencil_depth as the . I couldn't find a record of this being one of the reasons for these kernels requiring a PSyKAl-lite implementation, so I had missed this -- sorry!

The good news is that once I returned those two kernels to use PSyKAl-lite implementations, all of the transport and gungho-model tests ran successfully. This PR therefore still allows us to remove the most important PSyKAl-lite implementation. As those two extended mesh kernels relate to an older science configuration, I suggest we open a new issue and for now use PSyKAl-lite code for them, and can decide later how important it is to address those.

From my point-of-view this is therefore ready to merge!

Nov 14 '24 11:11 tommbendall