aiida-common-workflows
Extended way of providing overrides
Continuing from #206, I think we need to discuss a more general way of how to specify overrides for the codes.
This could be combined with the option to define code-agnostic overrides: e.g. if we want to change the smearing and k-points, should we do it via overrides (and define a common interface for such an override?).
E.g. the overrides could be a list of functions, and each implementation returns a generator of override functions, e.g.:

```python
override_kpoints = get_override_kpoints(k_density=0.1, shift=(0, 0, 0))
override_smearings = get_override_smearings(type='gaussian', broadening=0.1)
apply_overrides(builder, [override_kpoints, override_smearings, ...])
```
(We would decide that the interface of `get_override_kpoints` and similar functions is the same for all plugins. An override function would be a function that gets a builder and returns a modified builder.)
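To make this concrete, here is a minimal sketch of what such a factory and `apply_overrides` could look like (all names and signatures here are assumptions for illustration, not a proposed final API):

```python
def get_override_kpoints(k_density, shift):
    """Return an override function with the parameters baked in (hypothetical factory)."""
    def override(builder):
        # ... modify the k-points inputs on the builder using k_density and shift ...
        return builder
    return override


def apply_overrides(builder, overrides):
    """Apply a sequence of override functions in order; each takes and returns a builder."""
    for override in overrides:
        builder = override(builder)
    return builder
```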
Just an initial idea to start discussion!
To revitalize the discussion I would like to stress that my concerns expressed in #230 also apply here :-)
More general overrides
Here are some discussions originating from a meeting held today with @sphuber, @bosonie and me.
There are 2 goals we want to achieve:
- generalize the concept of overrides, so one does not only provide an input namespace (called `sub_process` now) which implicitly implements a single possible rule ("replace the nodes generated by the protocols with these nodes") but more general rules. This is particularly important, e.g. for common workflows (like EOS) that need to have a way to change programmatically all the common workflows that are run.
- implement as overrides a number of tasks that are common among codes (e.g.: change the k-points mesh, change the smearing type). The interpretation, here, is that the initial parameters of the calculation are decided by the input generator via protocols (actually it's not just the protocol string, but also the `spin_orbit` flag, the `electronic_type` flag, as well as the structure). Any other change should be interpreted as a "change" to this workflow via overrides that are applied afterwards, after the protocol has made its decisions.
Possible implementation
Override functions
Here are some practical comments on how these goals could be achieved (just brainstorming, not necessarily the final solution we want to implement), mostly to explain the concept.
For now, we define common workflows via Python entry points, e.g.:

```yaml
aiida.workflows:
- "common_workflows.relax.quantum_espresso = aiida_common_workflows.workflows.relax.quantum_espresso.workchain:QuantumEspressoCommonRelaxWorkChain"
- "common_workflows.relax.siesta = aiida_common_workflows.workflows.relax.siesta.workchain:SiestaCommonRelaxWorkChain"
```

etc.
In the same way, each implementation could provide a list of overrides, i.e. functions that perform a given action, e.g. as follows:
```yaml
aiida.workflows.common_workflow_overrides:
- "quantum_espresso.relax.common.kpoints = qe_kpt_override"
- "quantum_espresso.relax.pools = ..."
- "siesta.relax.common.kpoints = ..."
- "siesta.relax.basis_set = ..."
```
Notes:
- In general, overrides will have the following signature (TBD: should it change the builder in place, or return a new builder?):

  ```python
  def qe_kpt_override(builder, **kwargs):
      ...
      return modified_builder
  ```

- For "common" overrides (with `.common.` in the name, or some other way to signal them, see below), the function must follow a specific API that we discuss and document case by case. That is, the `kwargs` are defined to be the same for all codes, and implementations must respect them. As an example, for a function to change the k-points grid, we could agree to require the following API (a sketch of a possible implementation follows after this list):

  ```python
  def <CODE_NAME>_kpt_override(builder, kpoints_distance, kpoints_offset):
      """
      kpoints_distance: distance in 1/ang
      kpoints_offset: list of three floats between 0 and 1
      """
      ...
  ```
- the name of the entry-point group should probably be improved
- the organization of the entry-point names needs to be rediscussed (e.g.: `quantum_espresso.relax.pools` vs `relax.quantum_espresso.pools`; and whether the `.common.` part should be part of the name, or whether we should instead define two entry-point groups, e.g. `aiida.workflows.common_workflow_overrides` and `aiida.workflows.common_workflow_overrides_standardized`).
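For concreteness, a possible Quantum ESPRESSO implementation of the k-points API above could look like the following sketch (the `structure` and `kpoints` port names are assumptions about the underlying builder, not a fixed API):

```python
from aiida.orm import KpointsData


def quantum_espresso_kpt_override(builder, kpoints_distance, kpoints_offset):
    """
    kpoints_distance: distance in 1/ang
    kpoints_offset: list of three floats between 0 and 1
    """
    kpoints = KpointsData()
    # Assumes the builder exposes the structure under a 'structure' port.
    kpoints.set_cell_from_structure(builder.structure)
    kpoints.set_kpoints_mesh_from_density(kpoints_distance, offset=kpoints_offset)
    # Assumes the underlying workchain accepts an explicit 'kpoints' input.
    builder.kpoints = kpoints
    return builder
```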
Use of overrides in common workflows (like EOS)
The question now is how to pass the list of overrides, and their inputs. First of all, we want to specify an ordered list of overrides (and their inputs) to be applied (e.g. first change the pools, then change the k-points, then the smearing, ...).
There are various ways to do it, depending on whether we are OK with a single node containing the full spec of what to do, or we want to expose some of this as AiiDA nodes with links.
In the first case (which would actually make the overrides fully procedurally defined, quite a nice feature, but possibly limited to JSON-serializable data and not AiiDA nodes), a possibility is to use the following as the content of a `Dict` node, passed as input to the EOS, e.g. with link name `override_specs`:

```yaml
overrides:
- override: common.kpoints
  kwargs:
    kpoints_distance: 0.1
    kpoints_offset: [0, 0, 0]
- override: pools
  kwargs:
    num_pools: 4
```
(Note that we can skip the first part `<code>.relax` of the entry-point name, because it can be inferred from the code being actually run, so you pass only `common.kpoints`, `pools` or similar).
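A common workflow could then consume such a spec roughly as follows (a sketch reusing the hypothetical `load_override` helper from above; the name-reconstruction logic is an assumption):

```python
def apply_override_specs(builder, code_name, override_specs):
    """Apply the ordered override specs (the dict content of `override_specs`) to a builder."""
    for spec in override_specs['overrides']:
        # Reconstruct the full entry-point name, e.g. 'quantum_espresso.relax.common.kpoints'.
        full_name = f"{code_name}.relax.{spec['override']}"
        override = load_override(full_name)
        builder = override(builder, **spec.get('kwargs', {}))
    return builder
```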
Also, we need to decide if this dictionary gets stored in the DB and if we need to query this info. In the latter case (TBD), we should think of a good data layout that makes QueryBuilder queries easy (it remains to be checked whether the format suggested above works or should be different).
If we want to pass nodes instead, one could e.g. have an input of the EOS workchain with label `overrides_list` taking an `orm.List` with an ordered list of entry-point strings, determining which sequence of overrides to apply. And then (with a syntax/link name TBD) a set of `Dict` nodes, containing the kwargs for each of them, as different inputs (or even some more generalized approach if we need to pass AiiDA nodes). A possible shape of such inputs is sketched below.
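For illustration, those inputs might look like this (link names are placeholders, since the syntax is still TBD):

```python
from aiida import orm

inputs = {
    'overrides_list': orm.List(list=['common.kpoints', 'pools']),
    # One Dict of kwargs per override; the sub-namespace keys are placeholders.
    'override_kwargs': {
        'common_kpoints': orm.Dict(dict={'kpoints_distance': 0.1, 'kpoints_offset': [0, 0, 0]}),
        'pools': orm.Dict(dict={'num_pools': 4}),
    },
}
```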
Storing override and generator inputs of the common WorkChain
Currently the `generator_inputs` namespace of EOS (see here) is manually replicated from the specs of the input generator, and is `non_db`. Similarly, also the `sub_process` input (the current way of specifying "overrides") is `non_db`.
Recently, thanks to #228, generator inputs are defined as specs; therefore @sphuber will investigate whether we can enable the possibility to just expose the inputs and store them in the DB (the missing piece in AiiDA is allowing `Enum`s as valid types). This would be a great addition, enabling all inputs to be stored and allowing someone else to run the same workflow in a reproducible way, including all common and override inputs.
Note that this is better than the current status, even if it might be limiting in the future (so some discussion might be required, but we can decide to implement it anyway as it's an improvement over the current status).
Custom overrides
How to enable users to implement custom override functions?
This is unfortunately very tricky, because the function needs to be known to the AiiDA daemon. So, we can't allow passing any generic function implemented in the submission script.
One way is that a user can define a small Python package adding a new entry point, if really needed; this works, but is a bit cumbersome for the average user.
However: already now, we emphasize that (with the `sub_process` namespace) we only allow acting on inputs in a very specific way (namely, replacing nodes from the input generator with those in `sub_process`).
We can provide the same feature by defining ourselves some override functions that do these tasks, and providing them once for all codes. (It actually then becomes easy to extend the behavior in a few different and still general ways: e.g. instead of fully replacing nodes, update/merge a `Dict` node, or specify which keys to remove, etc.)
Example (if we focus on `Dict` nodes only, i.e. serializable JSON and no AiiDA nodes):

```python
import copy

from aiida.orm import Dict


def merge_dict_override(builder, dict_input_link, merge_dict):
    """Merge `merge_dict` into the `Dict` node attached to the builder under `dict_input_link`."""
    new_dict = copy.deepcopy(getattr(builder, dict_input_link).get_dict())
    new_dict.update(merge_dict)
    setattr(builder, dict_input_link, Dict(dict=new_dict))
    return builder
```
(Note: this is actually changing the `builder` in place; probably to be fixed to return a copy.)
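Usage could look like this (the `parameters` link name and key are purely illustrative; note also that `dict.update` is a shallow merge, so nested namespaces would be replaced wholesale and a recursive merge might be needed in practice):

```python
# Hypothetical usage: override one key in the parameters Dict produced by the input generator.
builder = merge_dict_override(builder, 'parameters', {'some_key': 'new_value'})
```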
TBD: what if we want to allow nodes? E.g. pass a new pseudopotential to replace one that was automatically picked? Is this a relevant use case, or maybe in this case we pass the group name and let the override function select the right nodes and set them in the builder? (In any case, cutoffs might need to be changed in this case.)
Defining a common workflow for relaxation as a wrapper for the common workflow interface
In the discussion, we realised that we could implement all the logic above not only for common workflows like the EOS, but actually also for the relaxation, which now only provides a common-workflow interface for each code: i.e., implement a common workflow for relaxation. The code should be the same as, and even simpler than, the EOS (just remove the scale-factor inputs, use `expose_inputs` for the input generator specs now that it is possible, ...). This way the logic for applying the overrides is implemented only once, in the common relax workflow, and the common EOS becomes even simpler: it just forwards inputs and overrides to the common relax workflow.
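In other words, the layering would look roughly like this (class names and responsibilities are hypothetical, shown only to illustrate the structure):

```python
from aiida.engine import WorkChain


class CommonRelaxWorkChain(WorkChain):
    """Hypothetical code-agnostic relax workflow: call the input generator with
    the protocol inputs, apply the ordered overrides to the resulting builder,
    then submit the code-specific relax workchain selected via its entry point."""


class CommonEosWorkChain(WorkChain):
    """Hypothetical common EOS workflow: for each scale factor, rescale the
    structure and forward inputs and overrides to CommonRelaxWorkChain."""
```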
Therefore, the plan is to see if we can implement this or if there are technical challenges associated with it.
Next meeting
We'll meet Fri 12th Nov afternoon - interested people can contact us to participate.
(BTW this should be OK with @d-wortmann's concern, as this will prevent extending the number of common inputs; these remain at a minimum, and if a code wants to provide additional "flags" (change k-points, enable DFT+U, ...), these can just be implemented as overrides.)