
Workflow entry type

Open rartino opened this issue 7 years ago • 7 comments

It would be nice to have a workflow entry type in OPTIMaDe to describe a workflow as steps.

It may be nice if calculations can reference this to explain the workflow taken in the calculation.

We could have abstract standardized names for steps, e.g., 'structure relaxation', and also allow database-specific prefixed ones, e.g., _exmpl_calculate_color.
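A minimal sketch of how such step names might be told apart, assuming the `_<database>_<name>` prefix convention seen in the example above. The set of standardized names is purely hypothetical (and written with underscores rather than spaces); nothing here is defined by the specification.

```python
import re

# Hypothetical set of standardized step names; not defined by any specification.
STANDARD_STEPS = {"structure_relaxation", "static_calculation"}

# Database-specific names: underscore, database prefix, underscore, then the step name.
PREFIXED = re.compile(r"^_([a-z0-9]+)_([a-z0-9_]+)$")


def classify_step(name):
    """Return ("standard", None), ("database", prefix) or ("unknown", None)."""
    if name in STANDARD_STEPS:
        return ("standard", None)
    match = PREFIXED.match(name)
    if match:
        return ("database", match.group(1))
    return ("unknown", None)
```

With these assumptions, `classify_step("_exmpl_calculate_color")` identifies `exmpl` as the database prefix, while a name that is neither standardized nor prefixed is reported as unknown.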

It may be worth thinking carefully about what goes into workflow and what goes into calculation (parameters?).

Jun 14 '18 10:06 rartino

I believe standardizing workflows goes beyond the scope of OPTIMADE at this stage. When asked, Donny was actually referring to providing, e.g., with a structure, a unique identifier of the workflow that was run to produce it. This seems like a much more manageable task.

Jun 14 '18 16:06 ltalirz

Agreed. But in case we eventually want to go in the direction of more explicitly encoded workflows, perhaps the best name for that identifier is workflow_id? This connects closely to #24; I'll add that suggested field there.

Jun 14 '18 22:06 rartino

Some ideas might be borrowed from Common Workflow Language, which is stable (v1.0.2) already.

Jun 14 '18 22:06 merkys

A proposal is to, for now, handle this like we do with calculations: i.e., workflows is an "empty" entry type, that databases can populate with their own database-specific-prefix identifiers.

With the recent introduction of queries on relationships, both this proposal and the existing calculations are now actually quite useful. That mechanism alone would directly give a standardized way to filter on, e.g., "what other calculations were produced using the same workflow as this one".
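As a rough illustration of that kind of query, here is a minimal sketch that builds a request URL. It assumes a hypothetical `workflows` relationship on `calculations` entries and reuses the `<entry type>.id HAS "<id>"` pattern that relationship filters already use for, e.g., `references`; the base URL and the workflow id are placeholders.

```python
from urllib.parse import urlencode


def same_workflow_query(base_url, workflow_id):
    """Build a URL asking for all calculations related to one workflow entry.

    Assumption: calculations expose a relationship to a "workflows" entry type,
    which is what this thread proposes and not an existing standard field.
    """
    filter_expr = f'workflows.id HAS "{workflow_id}"'
    return f"{base_url}/calculations?{urlencode({'filter': filter_expr})}"


# Placeholder base URL and workflow id, for illustration only.
url = same_workflow_query("https://example.org/optimade/v1", "wf-1")
```

A client would first read the workflow id from the relationships of the calculation it starts from, then issue this query to find the sibling calculations.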

Jul 03 '19 13:07 rartino

As per discussion with @gmrigna during the 2022 workshop, standardization and exchange of workflows between the different engines (AiiDA, FireWorks, etc.) seems to become more relevant and more in demand over time. The recent publication, along with the aiida-common-workflows repository, shows that work on this is ongoing.

Jun 07 '22 18:06 blokhin

At the 2023 workshop a few of us (@gmrigna, @utf, @giovannipizzi, me and others) were discussing workflow standardization. This led to some form of design idea. No claim here of consensus, just that the discussions spawned the ideas below; take this as fairly loose thoughts at this point.

Essentially, the idea is:

  • Mimic our property definitions, but for abstract workflow declarations that just declare inputs (as OPTIMADE properties), outputs (also OPTIMADE properties), and a definition (think: a text description, but it can be amended with other fields, e.g., links), and give them a static, versioned URI.
  • Somewhere, in these declarations, or external to them, specify how some workflows can be implemented using others, e.g., how a "phonon" workflow can be broken down into static calculations of "energy + forces". (Sometimes this is automatically derivable because everything operates on the same space of inputs/outputs, but sometimes we need, e.g., flow control and branching?)
  • Codes and workflow engines can now publish a set of "I know how to do that" declarations that specify a URI for an abstract workflow declaration and describe, e.g., using CWL, exactly what to execute.
  • If a user now points to an abstract workflow and says "I want to do this", or "I want to do this with atomate and any code", or "I want to do this with httk and VASP", a 'resolver' can dig through all workflows + all "I know how to do this" declarations it is aware of and find one or more implementation strategies for the workflow.
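To make the resolver idea a bit more concrete, here is a minimal sketch under loose assumptions. All class names, field names, and URIs are hypothetical and only mirror the bullets above: an abstract declaration with inputs, outputs, and a versioned URI; an "I know how to do that" declaration pointing at that URI; and a resolver that filters by engine and code. Breaking a workflow down into other workflows is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowDeclaration:
    """Abstract workflow: what goes in and what comes out, no execution details."""
    uri: str          # static, versioned URI (hypothetical)
    inputs: tuple     # names of OPTIMADE properties used as inputs
    outputs: tuple    # names of OPTIMADE properties produced
    description: str = ""


@dataclass(frozen=True)
class Implementation:
    """An "I know how to do that" declaration published by an engine or code."""
    workflow_uri: str  # URI of the abstract workflow this implements
    engine: str        # e.g., "atomate" or "httk"
    code: str          # e.g., "VASP"
    recipe: str        # pointer to what to execute, e.g., a CWL document


def resolve(workflow_uri, implementations, engine=None, code=None):
    """Return every known implementation of the workflow matching the constraints.

    engine=None or code=None means "any", as in "with atomate and any code".
    """
    return [
        impl
        for impl in implementations
        if impl.workflow_uri == workflow_uri
        and (engine is None or impl.engine == engine)
        and (code is None or impl.code == code)
    ]


# Illustrative data only; the URI and recipe names are made up.
phonon = WorkflowDeclaration(
    uri="https://example.org/workflows/phonon/v1",
    inputs=("structure",),
    outputs=("phonon_frequencies",),
    description="Phonon calculation",
)
known = [
    Implementation(phonon.uri, "atomate", "VASP", "atomate-phonon.cwl"),
    Implementation(phonon.uri, "httk", "VASP", "httk-phonon.cwl"),
]
strategies = resolve(phonon.uri, known, engine="httk", code="VASP")
```

Here "I want to do this" corresponds to `resolve(phonon.uri, known)`, and "I want to do this with httk and VASP" to the constrained call on the last line. A fuller resolver would also need to follow the decomposition rules from the second bullet, which this sketch does not attempt.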

This also fits with how one can start to build a provenance structure for OPTIMADE. A calculations entry can now be categorized based on the high-level workflow that was requested + the description of the workflows that actually ended up being executed + the first inputs + final outputs (described as OPTIMADE entries).

I then imagine a very similar design can be set up for experiments, where a workflow is now an experimental procedure with inputs and outputs, and an /experiments endpoint is used to categorize experiments based on the abstract experimental procedure executed, how it was broken down into substeps, the original inputs and final outputs (described as OPTIMADE entries).

Jun 11 '23 20:06 rartino

First of all, thanks @rartino for summarizing and formalizing our discussions. I also think that it might be nice to involve @gpetretto and @davidwaroquiers in subsequent discussions.

Jun 12 '23 08:06 gmrigna