Currently all processes (calcfunctions) have to be toplevel (importable) in modules, otherwise uncachable

Open broeder-j opened this issue 5 years ago • 1 comments

Since is_valid_cache assumes that calcfunctions (all process classes/functions) are top-level (importable) in modules, calcfunctions which are not, become uncacheable. (If I understood this right)

I had classes of the following design to keep the provenance:

XY.py

def class XY:
    
    def function_x(self, *args, **kwargs):
        do something
        call function_for_provenance(AiiDA_datatypes)
    
    @calcfunction
    def function_for_provenance(AiiDA_datatypes):
        do stuff

Caching does not seem to work in these cases and will throw errors of the type:

File "/aiida-core/aiida/orm/nodes/process/process.py", line 485, in is_valid_cache
    process_class = self.process_class
  File "/aiida-core/aiida/orm/nodes/process/process.py", line 126, in process_class
    process_class = getattr(module, class_name)
AttributeError: module '....XY' has no attribute 'function_for_provenance'

A solution is of course to put calcfunctions out of classes and make them top level and importable, which should be always possible I guess. If this enforcement is wanted, this should be documented in the developer guide and caching sections.

I have not thought about if is_valid_cache implementation can be changed to allow for this, do we even want to? @greschd what do you think?

Mar 02 '20 20:03 broeder-j

I'm not entirely sure if this is just a documentation issue, or if there is a fixable bug (or several) hidden here somewhere. A few observations:

The immediate reason this does not work is because the .process_class property can not load the calcfunction. This is simply a consequence of the node / class split, and the fact that the calcfunction is not importable. However:
- it seems wrong for the .process_class property to raise AttributeError (due to this line, I assume) -- all other ways in which getting the process class can fail result in ValueError.
- even if the process_class access works, we don't actually do anything with it in the case of process functions - because they don't have the is_valid_cache hook, it is just skipped.
It would actually be possible to import and access the functions nested in a class: This is one of the reasons for the __qualname__ (since 3.3) attribute. It gives "the name relative to the top-level of the module", e.g. XY.function_for_provenance. The relevant code in creating the process type string is here: https://github.com/aiidateam/aiida-core/blob/develop/aiida/engine/processes/process.py#L486

Possible things to do:

I would propose to catch the AttributeError in process_class and re-raise as ValueError. That makes is_valid_cache just return False, instead of straight up excepting.
Also, I think replacing __name__ with __qualname__ would be good, since we no longer need to run on Python versions that don't support it.
I'm torn on whether we should just move the "hooking into the process_class" part of the is_valid_cache implementation to the CalcJobNode. Right now this is the only place we actually use it -- but in principle you could just define an is_valid_cache (function) attribute on a calcfunction and it would work as expected.

Tagging @sphuber for comment.

Mar 02 '20 21:03 greschd