Usability: Improve `ProcessBuilder` API

Open GeigerJ2 opened this issue 3 months ago • 0 comments

Interacting with a ProcessBuilder is not straightforward (as brought up, e.g., by @npaulish), which makes it hard for new users to find where to edit what. What is currently possible (see the session below):

Printing the builder, which shows only minimal information.
Tab completion two levels deep, e.g., b.pw.structure or b.pw.parameters, but not any further down.
b._port_namespace.get_description() (thanks, @mikibonacci), which prints a full list and description of all ports. The output is very overwhelming, and the call is not public API (get_description is a method of plumpy's PortNamespace class, see here).
There is a _repr_pretty_ method (IPython's pretty-printing hook) on the ProcessBuilder class here, but the builder is created through a dynamic class creation pattern, and the resulting object is not a plain ProcessBuilder (but an abc.ProcessBuilder-<uuid>), so it does not implement _repr_pretty_ (not sure why not...).
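
To make the dynamic class creation pattern concrete, here is a minimal stand-alone sketch. `ToyBuilder` and its port names are hypothetical and this is not aiida-core's implementation; it only illustrates how attaching per-instance properties can lead to a type named `<ClassName>-<uuid>`.

```python
from uuid import uuid4


class ToyBuilder:
    """Hypothetical stand-in for a builder; not aiida-core's ProcessBuilder."""

    def __init__(self, port_names):
        self._data = {}
        properties = {}
        for name in port_names:
            # Bind `name` as a default argument so each property keeps its own port name.
            def getter(self, name=name):
                return self._data.get(name)

            def setter(self, value, name=name):
                self._data[name] = value

            properties[name] = property(getter, setter)

        # Properties can only live on a class, so build a per-instance subclass
        # on the fly and swap it in as this instance's class.
        child = type(f"{type(self).__name__}-{uuid4()}", (type(self),), properties)
        self.__class__ = child

    def _repr_pretty_(self, p, cycle):
        # IPython's pretty-printing hook, defined on the base class only.
        p.text(f"ToyBuilder inputs: {self._data}")


b = ToyBuilder(["structure", "parameters"])
b.structure = "Si"
print(type(b).__name__)            # ToyBuilder-<some uuid>
print(isinstance(b, ToyBuilder))   # the dynamic type is still a subclass
print(hasattr(b, "_repr_pretty_")) # hook is looked up through inheritance
```

In this toy version, methods defined on the base class stay reachable through normal inheritance, which may be a useful baseline when checking what happens with the real builder.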

The idea is to make the information in this dynamically generated but explicit entity (the abc.ProcessBuilder-<uuid> contains the workflow spec) more accessible by adding public methods and easier ways to explore the structure (and possibly set values).
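
One possible shape for "easier ways to set values" is a pair of dotted-path helpers. The sketch below is only an illustration: `get_by_path` and `set_by_path` are hypothetical names, not existing aiida-core API, and the builder is imitated with `SimpleNamespace` objects so the example runs without AiiDA. It assumes nested namespaces are reachable by attribute access, as in `b.pw.parameters`.

```python
from functools import reduce
from types import SimpleNamespace


def get_by_path(builder, path):
    """Follow a dotted path such as 'pw.parameters' via attribute access."""
    return reduce(getattr, path.split("."), builder)


def set_by_path(builder, path, value):
    """Set the attribute named by the last path component on its parent namespace."""
    *parents, leaf = path.split(".")
    target = reduce(getattr, parents, builder)
    setattr(target, leaf, value)


# Toy object imitating the nesting of a builder (placeholder values only).
toy = SimpleNamespace(
    pw=SimpleNamespace(parameters=None, metadata=SimpleNamespace(options=SimpleNamespace()))
)

set_by_path(toy, "pw.parameters", {"example_key": 1})
set_by_path(toy, "pw.metadata.options.label", "demo")
print(get_by_path(toy, "pw.parameters"))
print(get_by_path(toy, "pw.metadata.options.label"))
```

A path-based interface like this would let users copy a port's full name from a printed overview and set it in one call, instead of navigating level by level.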

The main builder-related methods (and possible ways they could be replaced) are:

get_builder -> Direct instance creation and assignment
get_builder_restart -> ?
get_builder_from_protocol -> from_protocol alternative constructor of process class
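
As a rough illustration of the last replacement, an alternative constructor could be written as a classmethod. Everything in this sketch is hypothetical: `ToyWorkChain`, its `PROTOCOLS` table, and the numeric values are placeholders, not the API or defaults of aiida-core or any plugin.

```python
class ToyWorkChain:
    """Hypothetical process class used only to show the constructor pattern."""

    # Placeholder protocol definitions; the values are made up for the example.
    PROTOCOLS = {
        "fast": {"kpoints_distance": 0.5},
        "moderate": {"kpoints_distance": 0.15},
    }

    def __init__(self, **inputs):
        # Direct instance creation: inputs are assigned at construction time.
        self.inputs = inputs

    @classmethod
    def from_protocol(cls, code, structure, protocol="moderate", **overrides):
        """Build an instance from protocol defaults, with explicit overrides winning."""
        inputs = {"code": code, "structure": structure}
        inputs.update(cls.PROTOCOLS[protocol])
        inputs.update(overrides)
        return cls(**inputs)


wc = ToyWorkChain.from_protocol("pw@localhost", "Si", protocol="fast", max_iterations=3)
print(wc.inputs)
```

The result is a plain instance of the class itself rather than a separate builder object, which is the main difference from the current `get_builder_from_protocol` flow.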

In [9]: b = PwBaseWorkChain.get_builder()

In [32]: type(b)
Out[32]: abc.ProcessBuilder-bc7828aa-6610-4133-aa97-1e90129785e3

In [10]: b
Out[10]:
Process class: PwBaseWorkChain
Inputs:
metadata: {}
pw:
  metadata:
    options:
      stash: {}
  monitors: {}
  pseudos: {}

In [12]: b.  # tab completion
            clean_workdir()      kpoints_distance     metadata
            handler_overrides    kpoints_force_parity pw
            kpoints              max_iterations()

In [12]: b.pw.  # tab completion
               code            monitors        parent_folder   settings
               hubbard_file    parallelization pseudos         structure
               metadata        parameters      remote_folder   vdw_table

In [38]: pprint(b._port_namespace.get_description().keys())
dict_keys(['_attrs', 'metadata', 'max_iterations', 'clean_workdir', 'handler_overrides', 'pw', 'kpoints', 'kpoints_distance', 'kpoints_force_parity'])

In [39]: pprint(b._port_namespace.get_description()['pw'].keys())
dict_keys(['_attrs', 'metadata', 'code', 'monitors', 'remote_folder', 'structure', 'parameters', 'settings', 'parent_folder', 'vdw_table', 'pseudos', 'parallelization', 'hubbard_file'])

In [11]: pprint(b._port_namespace.get_description())
{'_attrs': {'default': (),
            'dynamic': False,
            'help': None,
            'required': 'True',
            'valid_type': "<class 'aiida.orm.nodes.data.data.Data'>"},
 'clean_workdir': {'default': '<function '
                              'BaseRestartWorkChain.define.<locals>.<lambda> '
                              'at 0x7f9e04986b00>',
                   'help': 'If `True`, work directories of all called '
                           'calculation jobs will be cleaned at the end of '
                           'execution.',
                   'is_metadata': 'False',
                   'name': 'clean_workdir',
                   'non_db': 'False',
                   'required': 'False',
                   'valid_type': "<class 'aiida.orm.nodes.data.bool.Bool'>"},
 'handler_overrides': {'help': 'Mapping where keys are process handler names '
                               'and the values are a dictionary, where each '
                               'dictionary can define the ``enabled`` and '
                               '``priority`` key, which can be used to toggle '
                               'the values set on the original process handler '
                               'declaration.',
                       'is_metadata': 'False',
                       'name': 'handler_overrides',
                       'non_db': 'False',
                       'required': 'False',
                       'valid_type': '(<class '
                                     "'aiida.orm.nodes.data.dict.Dict'>, "
                                     "<class 'NoneType'>)"},
 'kpoints': {'help': 'An explicit k-points list or mesh. Either this or '
                     '`kpoints_distance` has to be provided.',
             'is_metadata': 'False',
             'name': 'kpoints',
             'non_db': 'False',
             'required': 'False',
             'valid_type': '(<class '
                           "'aiida.orm.nodes.data.array.kpoints.KpointsData'>, "
                           "<class 'NoneType'>)"},
 'kpoints_distance': {'help': 'The minimum desired distance in 1/Å between '
                              'k-points in reciprocal space. The explicit '
                              'k-points will be generated automatically by a '
                              'calculation function based on the input '
                              'structure.',
                      'is_metadata': 'False',
                      'name': 'kpoints_distance',
                      'non_db': 'False',
                      'required': 'False',
                      'valid_type': '(<class '
                                    "'aiida.orm.nodes.data.float.Float'>, "
                                    "<class 'NoneType'>)"},
 'kpoints_force_parity': {'help': 'Optional input when constructing the '
                                  'k-points based on a desired '
                                  '`kpoints_distance`. Setting this to `True` '
                                  'will force the k-point mesh to have an even '
                                  'number of points along each lattice vector '
                                  'except for any non-periodic directions.',
                          'is_metadata': 'False',
                          'name': 'kpoints_force_parity',
                          'non_db': 'False',
                          'required': 'False',
                          'valid_type': '(<class '
                                        "'aiida.orm.nodes.data.bool.Bool'>, "
                                        "<class 'NoneType'>)"},
 'max_iterations': {'default': '<function '
                               'BaseRestartWorkChain.define.<locals>.<lambda> '
                               'at 0x7f9e049871c0>',
                    'help': 'Maximum number of iterations the work chain will '
                            'restart the process to finish successfully.',
                    'is_metadata': 'False',
                    'name': 'max_iterations',
                    'non_db': 'False',
                    'required': 'False',
                    'valid_type': "<class 'aiida.orm.nodes.data.int.Int'>"},
 'metadata': {'_attrs': {'default': (),
                         'dynamic': False,
                         'help': None,
                         'required': 'False',
                         'valid_type': 'None'},
              'call_link_label': {'default': 'CALL',
                                  'help': 'The label to use for the `CALL` '
                                          'link if the process is called by '
                                          'another process.',
                                  'is_metadata': 'True',
                                  'name': 'call_link_label',
                                  'non_db': 'False',
                                  'required': 'False',
                                  'valid_type': "<class 'str'>"},
              'description': {'help': 'Description to set on the process node.',
                              'is_metadata': 'True',
                              'name': 'description',
                              'non_db': 'False',
                              'required': 'False',
                              'valid_type': "(<class 'str'>, <class "
                                            "'NoneType'>)"},
              'disable_cache': {'help': 'Do not consider the cache for this '
                                        'process, ignoring all other caching '
                                        'configuration rules.',
                                'is_metadata': 'True',
                                'name': 'disable_cache',
                                'non_db': 'False',
                                'required': 'False',
                                'valid_type': "(<class 'bool'>, <class "
                                              "'NoneType'>)"},
              'label': {'help': 'Label to set on the process node.',
                        'is_metadata': 'True',
                        'name': 'label',
                        'non_db': 'False',
                        'required': 'False',
                        'valid_type': "(<class 'str'>, <class 'NoneType'>)"},
              'store_provenance': {'default': 'True',
                                   'help': 'If set to `False` provenance will '
                                           'not be stored in the database.',
                                   'is_metadata': 'True',
                                   'name': 'store_provenance',
                                   'non_db': 'False',
                                   'required': 'False',
                                   'valid_type': "<class 'bool'>"}},
 'pw': {'_attrs': {'default': (),
                   'dynamic': True,
                   'help': None,
                   'required': 'True',
                   'valid_type': "<class 'aiida.orm.nodes.data.data.Data'>"},
        'code': {'help': 'The `Code` to use for this job. This input is '
                         'required, unless the `remote_folder` input is '
                         'specified, which means an existing job is being '
                         'imported and no code will actually be run.',
                 'is_metadata': 'False',
                 'name': 'code',
                 'non_db': 'False',
                 'required': 'False',
                 'valid_type': '(<class '
                               "'aiida.orm.nodes.data.code.abstract.AbstractCode'>, "
                               "<class 'NoneType'>)"},
        'hubbard_file': {'help': 'SinglefileData node containing the output '
                                 'Hubbard parameters from a HpCalculation',
                         'is_metadata': 'False',
                         'name': 'hubbard_file',
                         'non_db': 'False',
                         'required': 'False',
                         'valid_type': '(<class '
                                       "'aiida.orm.nodes.data.singlefile.SinglefileData'>, "
                                       "<class 'NoneType'>)"},
      ...
        'vdw_table': {'help': 'Optional van der Waals table contained in a '
                              '`SinglefileData`.',
                      'is_metadata': 'False',
                      'name': 'vdw_table',
                      'non_db': 'False',
                      'required': 'False',
                      'valid_type': '(<class '
                                    "'aiida.orm.nodes.data.singlefile.SinglefileData'>, "
                                    "<class 'NoneType'>)"}}}

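The full dump above could be condensed into one line per port. The sketch below is a hedged illustration, not existing API: `iter_ports` and `summarize` are hypothetical helpers, and `sample` is a hand-abridged dictionary in the shape shown above so the example runs without AiiDA. It assumes that namespaces can be recognised by their `_attrs` key and that `required` is stored as the string `'True'` or `'False'`, as in the output above.

```python
def iter_ports(description, prefix=""):
    """Yield (dotted_path, port_info) for every leaf port in a nested description."""
    for key, value in description.items():
        if key == "_attrs":
            continue  # attributes of the namespace itself, not a port
        path = f"{prefix}{key}"
        if isinstance(value, dict) and "_attrs" in value:
            # Assumption: only nested namespaces carry an '_attrs' entry.
            yield from iter_ports(value, prefix=f"{path}.")
        else:
            yield path, value


def summarize(description):
    """Return one 'path [required|optional]: valid_type' line per leaf port."""
    lines = []
    for path, info in iter_ports(description):
        # 'required' appears as a string in the dump, so compare against 'True'.
        flag = "required" if info.get("required") == "True" else "optional"
        lines.append(f"{path} [{flag}]: {info.get('valid_type')}")
    return lines


# Hand-abridged sample in the shape of the get_description() output.
sample = {
    "_attrs": {"dynamic": False, "required": "True"},
    "clean_workdir": {
        "name": "clean_workdir",
        "required": "False",
        "valid_type": "<class 'aiida.orm.nodes.data.bool.Bool'>",
    },
    "pw": {
        "_attrs": {"dynamic": True, "required": "True"},
        "vdw_table": {
            "name": "vdw_table",
            "required": "False",
            "valid_type": "(<class 'aiida.orm.nodes.data.singlefile.SinglefileData'>, "
            "<class 'NoneType'>)",
        },
    },
}

for line in summarize(sample):
    print(line)
```

With a real builder, the same functions could presumably be fed `b._port_namespace.get_description()`, though a public method would avoid reaching into the private attribute.
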
I played around a bit with this part of the code in a branch of my fork, and it should be doable: https://github.com/GeigerJ2/aiida-core/tree/process-builder-api-improvements

Sep 09 '25 10:09 GeigerJ2