Improving AiiDA terminology

Open chrisjsewell opened this issue 2 years ago • 2 comments

I feel the documentation is lacking a top-level "primer" on what aiida-core is and does, and the terminology of that primer should then also permeate the API code.

Something along the lines of:

AiiDA is a workflow engine framework, which provides five core capabilities:

  1. Storage: AiiDA automates the storage of generated calculation/workflow inputs, outputs, and the provenance between them. AiiDA also provides functionality for introspecting (querying) this data.
  2. Communication: AiiDA provides built-in functionality to communicate with external compute services (such as HPCs and cloud), automating data transfer and job scheduling.
  3. Processing: AiiDA provides an API for building and running complex workflows, composed of one or more computations, that can be run locally or externally. AiiDA also provides process persistence (check-pointing), meaning that running workflows persist in the event of lost connections or system reboots.
  4. Developer interface: AiiDA provides a plugin system, for developers to extend aiida-core with their own workflows, data types, HPC interfaces, etc.
  5. User interface: AiiDA provides both command-line and web-based APIs for starting, monitoring and introspecting workflows.
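
To make the check-pointing idea in point 3 concrete, here is a toy sketch in plain Python. It is not how aiida-core implements persistence: `run_workflow`, the `fail_at` hook and the JSON checkpoint file are all hypothetical names chosen for illustration. It only shows the principle that state is saved after each step, so an interrupted run can resume where it stopped.

```python
import json
import tempfile
from pathlib import Path


def run_workflow(steps, checkpoint_path, fail_at=None):
    """Run named steps in order, saving a checkpoint after each one.

    `fail_at` is a hypothetical hook that simulates an interruption
    (e.g. a reboot) just before the step with that name.
    """
    path = Path(checkpoint_path)
    # Resume from the checkpoint if one exists, otherwise start fresh.
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"done": [], "value": 0}
    for name, func in steps:
        if name in state["done"]:
            continue  # completed before the interruption, so skip it
        if name == fail_at:
            raise RuntimeError(f"interrupted before step {name!r}")
        state["value"] = func(state["value"])
        state["done"].append(name)
        path.write_text(json.dumps(state))  # persist progress
    return state


steps = [
    ("add_one", lambda x: x + 1),
    ("double", lambda x: x * 2),
    ("add_ten", lambda x: x + 10),
]

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.json"
    try:
        run_workflow(steps, checkpoint, fail_at="double")
    except RuntimeError:
        pass  # the simulated reboot: only "add_one" was checkpointed
    saved = json.loads(checkpoint.read_text())
    final = run_workflow(steps, checkpoint)  # resumes at "double"
```

The second call does not repeat `add_one`; it picks up from the saved state and finishes the remaining two steps.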

Storage

I have recently been streamlining the storage API (e.g. #5154, #5172, #5320, #5145, #5228). It is of note that, although we currently split storage into the PostgreSQL database and the disk-objectstore repository, this should not be a primary concern for "standard" users. I would explain the storage as something like the following:

  • A Profile is intended for a single project, configured for the processing and storage of a single provenance graph.
  • Entities are subsections of a profile's storage: user, authinfo, computer, group, node, log, comment
  • Fields are (string) keys on an entity that point towards a (JSONable) value (such as its unique identifier)
    • By default, fields are deemed immutable once they are stored
  • Node entities, together with the edges between them, are the primary components of a provenance graph
    • A node has a "special" field, attributes, which allows it to be extended to different data types
      • The attributes value is a dictionary that can contain keys specific to that data type
      • Process node attributes contain some special keys, which are deemed mutable until the process has completed (and is sealed). This includes the checkpoint key, which stores the process checkpoint in YAML serialized format.
    • A node also has a special field, extras, which allows users to store mutable data on any data type.
    • Node entities can also store objects, which are (string) POSIX paths that point towards bytes data.
      • It is the data type's responsibility to interpret the encoding of an object's bytes
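
The rules above can be sketched as a toy model in plain Python. This is not the aiida-core API: `ToyNode`, `ImmutableFieldError` and `MUTABLE_PROCESS_KEYS` are hypothetical names, and the set of mutable process keys is an assumption (only `checkpoint` is named above). It only illustrates which parts of a node are frozen on storing, which stay mutable, and how sealing closes a process node.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Assumption: only "checkpoint" is named in the text; other keys are omitted.
MUTABLE_PROCESS_KEYS = frozenset({"checkpoint"})


class ImmutableFieldError(Exception):
    """Raised when a stored or sealed value is modified."""


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node entity, not the aiida-core class."""

    node_type: str  # "data" or "process" in this sketch
    attributes: dict = field(default_factory=dict)
    extras: dict = field(default_factory=dict)
    objects: dict = field(default_factory=dict)  # POSIX path -> bytes
    is_stored: bool = False
    is_sealed: bool = False

    def store(self):
        self.is_stored = True
        return self

    def seal(self):
        self.is_sealed = True

    def set_attribute(self, key, value):
        if self.is_stored:
            # Stored attributes are frozen, except special process keys
            # on a process node that has not been sealed yet.
            mutable = (
                self.node_type == "process"
                and key in MUTABLE_PROCESS_KEYS
                and not self.is_sealed
            )
            if not mutable:
                raise ImmutableFieldError(f"attribute {key!r} is immutable")
        self.attributes[key] = value

    def set_extra(self, key, value):
        self.extras[key] = value  # extras stay mutable, even after storing

    def put_object(self, path, content: bytes):
        if self.is_stored:
            raise ImmutableFieldError("objects are immutable once stored")
        self.objects[PurePosixPath(path).as_posix()] = content


data = ToyNode("data", attributes={"cell": [1.0, 2.0, 3.0]})
data.put_object("raw/input.txt", b"hello")
data.store()
data.set_extra("tag", "test")  # allowed on a stored node

proc = ToyNode("process").store()
proc.set_attribute("checkpoint", "step: 1")  # allowed until sealed
proc.seal()
```

After `data.store()`, calling `data.set_attribute(...)` raises `ImmutableFieldError`, and the same happens for `proc.set_attribute("checkpoint", ...)` once `proc.seal()` has been called.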

Processing

The AiiDA daemon and the RabbitMQ broker both fall under this processing umbrella. I feel the terminology around this (daemons, workers, etc.) is probably a bit confusing to "non-technical" users, and could be improved. Also, as per https://github.com/aiidateam/AEP/pull/30, it is quite possible that we will replace RabbitMQ in the future, so there should not be any terminology specific to it (e.g. broker). But more to come on this later...

chrisjsewell, Jan 24 '22 12:01

@chrisjsewell Please assign this to me. I would like to work on it.

soma2000-lang, Jan 29 '22 15:01

Hi @soma2000-lang, sorry for the late reply, what did you have in mind? I primarily wrote this down whilst doing #5330, to collate some of my thoughts and circle back around to them later. But happy to collaborate.

chrisjsewell, Feb 09 '22 18:02