PSyclone icon indicating copy to clipboard operation
PSyclone copied to clipboard

#2711 DoF Kernel Code Generation

Open oakleybrunt opened this issue 1 year ago • 22 comments

Closes #2711 which has a checklist of actions taken

oakleybrunt avatar Sep 16 '24 14:09 oakleybrunt

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.86%. Comparing base (aff7d5c) to head (ff578ee). Report is 63 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2712   +/-   ##
=======================================
  Coverage   99.86%   99.86%           
=======================================
  Files         354      354           
  Lines       48973    48990   +17     
=======================================
+ Hits        48909    48926   +17     
  Misses         64       64           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Sep 16 '24 14:09 codecov[bot]

@arporter Can you have a look at the failing build test please? It is repeatedly tripping over on "Run examples" complaining about ACCEnterData not being able to find any data to copyin. I haven't touched any examples so don't think this is related to my changes... but I'd like your assessment, please.

oakleybrunt avatar Sep 20 '24 15:09 oakleybrunt

@arporter Did I hear right this morning that it I should ignore stub generation for now?

oakleybrunt avatar Oct 07 '24 15:10 oakleybrunt

@arporter Did I hear right this morning that it I should ignore stub generation for now?

Yep, you should just make sure that you have a test that we refuse to generate a stub for such a kernel.

arporter avatar Oct 07 '24 15:10 arporter

@arporter, I can't get Codecov to recognize that I am checking the "uncovered" line (see in src/psyclone/tests/domain/lfric/dofkern_test.py::test_function_space_bare_undf). I'm stuck and asking for some help please.

oakleybrunt avatar Oct 09 '24 10:10 oakleybrunt

@arporter, I can't get Codecov to recognize that I am checking the "uncovered" line (see in src/psyclone/tests/domain/lfric/dofkern_test.py::test_function_space_bare_undf). I'm stuck and asking for some help please.

Unfortunately, you're a victim of the fact that DynFunctionSpaces is an old bit of the code and we don't have any unit tests for it directly. I think what you need to do is create a new test: use one of your new test examples and use get_invoke() to get hold of the invoke containing the dof kernel. You can then walk the Schedule to find the kernel. That will give you a valid kernel object which you can then give to the DynFunctionSpaces constructor.

arporter avatar Oct 09 '24 12:10 arporter

I forgot to say - the Dev Guide still needs updating (see previous comment).

arporter avatar Oct 11 '24 09:10 arporter

On master, there is no check that this isn't a Builtin - we always added a field argument. This may be because this method is never called for a Builtin (there's no actual Call generated in the PSy layer)? If that's the case, you shouldn't need the isinstance check. Please could you investigate? In fact, I'm wondering whether this method needs to be modified at all? What happens if you revert it to how it is on master?

This change adds the df reference to fields in dof kernels - e.g. CALL testkern_dofs(field_name(df), ...) which is how I thought fields were to be treated in dof kernel calls.

When I remove this it does not generate the df reference for dof kernels - e.g. CALL testkern_dofs(field_name, ...)

oakleybrunt avatar Oct 11 '24 09:10 oakleybrunt

On master, there is no check that this isn't a Builtin - we always added a field argument. This may be because this method is never called for a Builtin (there's no actual Call generated in the PSy layer)? If that's the case, you shouldn't need the isinstance check. Please could you investigate? In fact, I'm wondering whether this method needs to be modified at all? What happens if you revert it to how it is on master?

This change adds the df reference to fields in dof kernels - e.g. CALL testkern_dofs(field_name(df), ...) which is how I thought fields were to be treated in dof kernel calls.

When I remove this it does not generate the df reference for dof kernels - e.g. CALL testkern_dofs(field_name, ...)

Aha. Presumably then it isn't ever called for a BuiltIn?

arporter avatar Oct 11 '24 09:10 arporter

Also, what was the reason for having the 'bare' undf name - I'm sure there was one but I can't remember!

It was something that I had put in the user guide. In an earlier change on this branch I deemed it unnecessary since it would add a load of extra conditionals (see #2711 checklist) but you reviewed it and thought we should keep it (https://github.com/stfc/PSyclone/pull/2712#discussion_r1768245769)

oakleybrunt avatar Oct 11 '24 09:10 oakleybrunt

On master, there is no check that this isn't a Builtin - we always added a field argument. This may be because this method is never called for a Builtin (there's no actual Call generated in the PSy layer)? If that's the case, you shouldn't need the isinstance check. Please could you investigate? In fact, I'm wondering whether this method needs to be modified at all? What happens if you revert it to how it is on master?

This change adds the df reference to fields in dof kernels - e.g. CALL testkern_dofs(field_name(df), ...) which is how I thought fields were to be treated in dof kernel calls. When I remove this it does not generate the df reference for dof kernels - e.g. CALL testkern_dofs(field_name, ...)

Aha. Presumably then it isn't ever called for a BuiltIn?

I don't think it is, no. However, this is the change that fixed that problem with the examples failing (I don't know how, it just did).

oakleybrunt avatar Oct 11 '24 09:10 oakleybrunt

Also, what was the reason for having the 'bare' undf name - I'm sure there was one but I can't remember!

It was something that I had put in the user guide. In an earlier change on this branch I deemed it unnecessary since it would add a load of extra conditionals (see #2711 checklist) but you reviewed it and thought we should keep it (#2712 (comment))

Oh yes. I think I was wrong, sorry. It seems to be making life more complicated than it needs to be. If I were you, I'd start with a clean version of your branch and do some experiments with removing the 'bare' undf. You should quickly get a feel if it will work or if there's something we've forgotten about.

arporter avatar Oct 11 '24 09:10 arporter

I don't think it is, no. However, this is the change that fixed that problem with the examples failing (I don't know how, it just did).

I'm afraid that you have to understand the reason for the change. If this is never called by a BuiltIn then this check should not be necessary. I suggest you revert it and perhaps insert a judicious import pdb; pdb.set_trace() in the part of the code that fails. That should help you figure things out. We can do it together if you like.

arporter avatar Oct 11 '24 09:10 arporter

I don't think it is, no. However, this is the change that fixed that problem with the examples failing (I don't know how, it just did).

I'm afraid that you have to understand the reason for the change. If this is never called by a BuiltIn then this check should not be necessary. I suggest you revert it and perhaps insert a judicious import pdb; pdb.set_trace() in the part of the code that fails. That should help you figure things out. We can do it together if you like.

I'll give it a go but if I get stuck, that would be great, thank you! :)

oakleybrunt avatar Oct 11 '24 09:10 oakleybrunt

I don't think it is, no. However, this is the change that fixed that problem with the examples failing (I don't know how, it just did).

I'm afraid that you have to understand the reason for the change. If this is never called by a BuiltIn then this check should not be necessary. I suggest you revert it and perhaps insert a judicious import pdb; pdb.set_trace() in the part of the code that fails. That should help you figure things out. We can do it together if you like.

Apologies, Andy. I was wrong, the isinstance did not fix the Run Examples check, it was that I hadn't been appending the field (needed self.append(name, var_accesses, var_access_name=sym.name)

oakleybrunt avatar Oct 11 '24 10:10 oakleybrunt

Ah, I removed the isinstance check from KernCallArgList and it caused the src/psyclone/tests/domain/lfric/transformations/dynamo0p3_transformations_test.py::test_accenterdata_builtin test to fail. The test file it is calling has a builtin and a cell_column kernel in the invoke. Somehow the builtin is going through KernCallArgList. I'll investigate

oakleybrunt avatar Oct 11 '24 11:10 oakleybrunt

Is there a possibility that when ACC transformations happen, the KernCallArgList is remade with no checks for where the arguments originate from? I'm not entirely sure what's happening

oakleybrunt avatar Oct 11 '24 12:10 oakleybrunt

Is there a possibility that when ACC transformations happen, the KernCallArgList is remade with no checks for where the arguments originate from? I'm not entirely sure what's happening

I think the sub-class KernCallACCArgList is used to work out what data needs to be transferred to the GPU. Perhaps this method gets called by that class?

arporter avatar Oct 11 '24 14:10 arporter

I think the sub-class KernCallACCArgList is used to work out what data needs to be transferred to the GPU. Perhaps this method gets called by that class?

That is exactly what's happening. Does this mean that the isinstance check is justified, or that another change needs to be made?

oakleybrunt avatar Oct 16 '24 12:10 oakleybrunt

That is exactly what's happening. Does this mean that the isinstance check is justified, or that another change needs to be made?

It probably means that you need to override this method in KernCallACCArgList and make sure that it provides the correct list of arguments for an ACC enter data directive. For a field, this will just be its name, irrespective of whether or not the kernel operates on cell columns or dofs (which is different from the kernel argument that this method in this class is generating).

arporter avatar Oct 16 '24 12:10 arporter

Unfortunately the compilation checks failed: image Looks like you just need proper declarations for the arguments to the test kernel.

You can test this locally by doing pytest --compile ....

arporter avatar Oct 18 '24 16:10 arporter

As I am working on making the dof kernel (src/psyclone/tests/test_files/dynamo0p3/testkern_dofs_mod.f90) compile, I noticed again that the arguments passed to the subroutine are single dofs, real or integer. This makes me question why we are passing undf to the kernel call at all. Originally, I thought it would be used for creating an array (e.g dimension(undf)), but I don't see a use for it here, much like it isn't passed to LFRic built-ins.

oakleybrunt avatar Oct 21 '24 12:10 oakleybrunt

Pretty much done now. The only thing left is to add a bit more context in a couple of the tests so that we can be sure that the halo-exchanges/set-dirty/set-clean are being called in the right place. Once that's done, I suggest we merge this PR and then someone (you?) will need to alter LFRic so that it actually uses some of these new dof kernels and we can check that it works OK. I'll fire off the integration tests now.

Yes, I know that the PSyKAl-lite code in LFRic is being rewritten as true kernels and algs. This should be a good opportunity to try out a dof kernel :)

oakleybrunt avatar Oct 29 '24 13:10 oakleybrunt

Integration tests were fine (modulo the existing race-condition problem in threaded NEMOv4).

arporter avatar Oct 30 '24 09:10 arporter