aiida-common-workflows
Extended way of providing overrides
Continuing from #206, I think we need to discuss a more general way of how to specify overrides for the codes.
This could be combined with the option to define code-agnostic overrides: e.g. if we want to change the smearing and k-points, should we do it via overrides (and define a common interface for such an override?).
E.g. the overrides could be a list of functions, and each implementation returns a generator of override functions, e.g.:

```python
override_kpoints = get_override_kpoints(k_density=0.1, shift=(0, 0, 0))
override_smearings = get_override_smearings(type='gaussian', broadening=0.1)
apply_overrides(builder, [override_kpoints, override_smearings, ...])
```
(We would decide that the interface of `get_override_kpoints` and similar functions is the same for all plugins. An override function would be a function that gets a builder and returns a modified builder.)
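To make this concrete, here is a minimal sketch of what such a factory and `apply_overrides` could look like (all names and signatures here are assumptions for illustration, not a proposed final API):

```python
def get_override_kpoints(k_density, shift):
    """Return an override function with the parameters baked in (hypothetical factory)."""
    def override(builder):
        # ... modify the k-points inputs on the builder using k_density and shift ...
        return builder
    return override


def apply_overrides(builder, overrides):
    """Apply a sequence of override functions in order; each takes and returns a builder."""
    for override in overrides:
        builder = override(builder)
    return builder
```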
Just an initial idea to start discussion!
To revitalize the discussion I would like to stress that my concerns expressed in #230 also apply here :-)
More general overrides
Here are some discussions originating from a meeting held today with @sphuber, @bosonie and me.
There are 2 goals we want to achieve:
- generalize the concept of overrides, so one does not only provide an input namespace (called `sub_process` now) which implicitly implements a single possible rule ("replace the nodes generated by the protocols with these nodes") but more general rules. This is particularly important, e.g. for common workflows (like EOS) that need to have a way to change programmatically all the common workflows that are run.
- implement as overrides a number of tasks that are common among codes (e.g.: change the k-points mesh, change the smearing type). The interpretation, here, is that the initial parameters of the calculation are decided by the input generator via protocols (actually it's not just the protocol string, but also the `spin_orbit` flag, the `electronic_type` flag, as well as the structure). Any other change should be interpreted as a "change" to this workflow via overrides that are applied afterwards, after the protocol has made its decisions.
Possible implementation
Override functions
Here are some practical comments on how these goals could be achieved (just brainstorming, not necessarily the final solution we want to implement), mostly to explain the concept.
For now, we define common workflows via Python entry points, e.g.:

```yaml
aiida.workflows:
- "common_workflows.relax.quantum_espresso = aiida_common_workflows.workflows.relax.quantum_espresso.workchain:QuantumEspressoCommonRelaxWorkChain"
- "common_workflows.relax.siesta = aiida_common_workflows.workflows.relax.siesta.workchain:SiestaCommonRelaxWorkChain"
```

etc.
In the same way, each implementation could provide a list of overrides, i.e. functions that perform a given action, e.g. as follows:
```yaml
aiida.workflows.common_workflow_overrides:
- "quantum_espresso.relax.common.kpoints = qe_kpt_override"
- "quantum_espresso.relax.pools = ..."
- "siesta.relax.common.kpoints = ..."
- "siesta.relax.basis_set = ..."
```
Notes:
- In general, overrides will have the following signature (TBD: should it change the builder in place, or return a new builder?):

  ```python
  def qe_kpt_override(builder, **kwargs):
      ...
      return modified_builder
  ```

- For "common" overrides (with `.common.` in the name, or some other way to signal them, see below), the function must follow a specific API that we discuss and document case by case. That is, the `kwargs` are defined to be the same for all codes, and implementations must respect them. As an example, for a function to change the k-points grid, we could agree to require the following API (a sketch of a possible implementation follows after this list):

  ```python
  def <CODE_NAME>_kpt_override(builder, kpoints_distance, kpoints_offset):
      """
      kpoints_distance: distance in 1/ang
      kpoints_offset: list of three floats between 0 and 1
      """
      ...
  ```
- the name of the entry-point group should probably be improved
- the organization of the entry-point names needs to be rediscussed (e.g.: `quantum_espresso.relax.pools` vs `relax.quantum_espresso.pools`; and whether the `.common.` part should be part of the name, or whether we should instead define two entry-point groups, e.g. `aiida.workflows.common_workflow_overrides` and `aiida.workflows.common_workflow_overrides_standardized`).
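For concreteness, a possible Quantum ESPRESSO implementation of the k-points API above could look like the following sketch (the `structure` and `kpoints` port names are assumptions about the underlying builder, not a fixed API):

```python
from aiida.orm import KpointsData


def quantum_espresso_kpt_override(builder, kpoints_distance, kpoints_offset):
    """
    kpoints_distance: distance in 1/ang
    kpoints_offset: list of three floats between 0 and 1
    """
    kpoints = KpointsData()
    # Assumes the builder exposes the structure under a 'structure' port.
    kpoints.set_cell_from_structure(builder.structure)
    kpoints.set_kpoints_mesh_from_density(kpoints_distance, offset=kpoints_offset)
    # Assumes the underlying workchain accepts an explicit 'kpoints' input.
    builder.kpoints = kpoints
    return builder
```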
Use of overrides in common workflows (like EOS)
The question now is how to pass the list of overrides, and their inputs. First of all, we want to specify an ordered list of overrides (and their inputs) to be applied (e.g. first change the pools, then change the k-points, then the smearing, ...).
There are various ways to do it, depending on whether we are OK with a single node containing the full spec of what to do, or we want to expose some of this as AiiDA nodes with links.
In the first case (which would actually make the overrides fully procedurally defined, quite a nice feature, but possibly limited to JSON-serializable data and not AiiDA nodes), a possibility is to use the following as the content of a `Dict` node, passed as input to the EOS, e.g. with link name `override_specs`:

```yaml
overrides:
- override: common.kpoints
  kwargs:
    kpoints_distance: 0.1
    kpoints_offset: [0, 0, 0]
- override: pools
  kwargs:
    num_pools: 4
```
(Note that we can skip the first part `<code>.relax` of the entry-point name, because it can be inferred from the code being actually run, so you pass only `common.kpoints`, `pools` or similar).
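A common workflow could then consume such a spec roughly as follows (a sketch reusing the hypothetical `load_override` helper from above; the name-reconstruction logic is an assumption):

```python
def apply_override_specs(builder, code_name, override_specs):
    """Apply the ordered override specs (the dict content of `override_specs`) to a builder."""
    for spec in override_specs['overrides']:
        # Reconstruct the full entry-point name, e.g. 'quantum_espresso.relax.common.kpoints'.
        full_name = f"{code_name}.relax.{spec['override']}"
        override = load_override(full_name)
        builder = override(builder, **spec.get('kwargs', {}))
    return builder
```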
Also, we need to decide if this dictionary gets stored in the DB and if we need to query this info. In the latter case (TBD), we should think of a good data layout that makes QueryBuilder queries easy (it remains to be checked whether the format suggested above works or should be different).
If we want to pass nodes instead, one could e.g. have an input of the EOS workchain with label `overrides_list` taking an `orm.List` with an ordered list of entry-point strings, determining which sequence of overrides to apply. And then (with a syntax/link name TBD) a set of `Dict` nodes, containing the kwargs for each of them, as different inputs (or even some more generalized approach if we need to pass AiiDA nodes). A possible shape of such inputs is sketched below.
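For illustration, those inputs might look like this (link names are placeholders, since the syntax is still TBD):

```python
from aiida import orm

inputs = {
    'overrides_list': orm.List(list=['common.kpoints', 'pools']),
    # One Dict of kwargs per override; the sub-namespace keys are placeholders.
    'override_kwargs': {
        'common_kpoints': orm.Dict(dict={'kpoints_distance': 0.1, 'kpoints_offset': [0, 0, 0]}),
        'pools': orm.Dict(dict={'num_pools': 4}),
    },
}
```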
Storing override and generator inputs of the common WorkChain
Currently the `generator_inputs` namespace of EOS (see here) is manually replicated from the specs of the input generator, and is `non_db`. Similarly, also the `sub_process` input (the current way of specifying "overrides") is `non_db`.
Recently, thanks to #228, generator inputs are defined as specs; therefore @sphuber will investigate whether we can enable the possibility to just expose the inputs and store them in the DB (the missing piece in AiiDA is allowing `Enum`s as valid types). This would be a great addition, enabling all inputs to be stored and allowing someone else to run the same workflow in a reproducible way, including all common and override inputs.
Note that this is better than the current status, even if it might be limiting in the future (so some discussion might be required, but we can decide to implement it anyway as it's an improvement over the current status).
Custom overrides
How to enable users to implement custom override functions?
This is unfortunately very tricky, because the function needs to be known to the AiiDA daemon. So, we can't allow passing any generic function implemented in the submission script.
One way is that a user can define a small Python package adding a new entry point, if really needed; this works, but is a bit cumbersome for the average user.
However: already now, we emphasize that (with the `sub_process` namespace) we only allow acting on inputs in a very specific way (namely, replacing nodes from the input generator with those in `sub_process`).
We can provide the same feature by defining ourselves some override functions that do these tasks, and providing them once for all codes. (It actually then becomes easy to extend the behavior in a few different and still general ways: e.g. instead of fully replacing nodes, update/merge a `Dict` node, or specify which keys to remove, etc.)
Example (if we focus on `Dict` nodes only, i.e. serializable JSON and no AiiDA nodes):

```python
import copy

from aiida.orm import Dict


def merge_dict_override(builder, dict_input_link, merge_dict):
    """Merge `merge_dict` into the `Dict` node attached to the builder under `dict_input_link`."""
    new_dict = copy.deepcopy(getattr(builder, dict_input_link).get_dict())
    new_dict.update(merge_dict)
    setattr(builder, dict_input_link, Dict(dict=new_dict))
    return builder
```
(Note: this is actually changing the `builder` in place; probably to be fixed to return a copy.)
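Usage could look like this (the `parameters` link name and key are purely illustrative; note also that `dict.update` is a shallow merge, so nested namespaces would be replaced wholesale and a recursive merge might be needed in practice):

```python
# Hypothetical usage: override one key in the parameters Dict produced by the input generator.
builder = merge_dict_override(builder, 'parameters', {'some_key': 'new_value'})
```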
TBD: what if we want to allow nodes? E.g. pass a new pseudopotential to replace one that was automatically picked? Is this a relevant use case, or maybe in this case we pass the group name and let the override function select the right nodes and set them in the builder? (In any case, cutoffs might need to be changed in this case.)
Defining a common workflow for relaxation as a wrapper for the common workflow interface
In the discussion, we realised that we could implement all the logic above not only for common workflows like the EOS, but actually also for the relaxation, which now only provides a common-workflow interface for each code: i.e., implement a common workflow for relaxation. The code should be the same as, and even simpler than, the EOS (just remove the scale-factor inputs, use `expose_inputs` for the input generator specs now that it is possible, ...). This way the logic for applying the overrides is implemented only once, in the common relax workflow, and the common EOS becomes even simpler: it just forwards inputs and overrides to the common relax workflow.
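In other words, the layering would look roughly like this (class names and responsibilities are hypothetical, shown only to illustrate the structure):

```python
from aiida.engine import WorkChain


class CommonRelaxWorkChain(WorkChain):
    """Hypothetical code-agnostic relax workflow: call the input generator with
    the protocol inputs, apply the ordered overrides to the resulting builder,
    then submit the code-specific relax workchain selected via its entry point."""


class CommonEosWorkChain(WorkChain):
    """Hypothetical common EOS workflow: for each scale factor, rescale the
    structure and forward inputs and overrides to CommonRelaxWorkChain."""
```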
Therefore, the plan is to see if we can implement this or if there are technical challenges associated with it.
Next meeting
We'll meet Fri 12th Nov afternoon - interested people can contact us to participate.
(BTW this should be OK with @d-wortmann's concern, as this will prevent extending the number of common inputs; these remain at a minimum, and if a code wants to provide additional "flags" (change k-points, enable DFT+U, ...), these can just be implemented as overrides.)