
Try and make `Computer` a mutable object by removing attributes that influence provenance

Open sphuber opened this issue 3 years ago • 4 comments

Currently, a `Computer` is fully immutable once stored. This causes usability problems when people configure it incorrectly, most notably the `mpirun_command`: once they notice the mistake after running a first calculation, they can no longer change it and are forced either to delete the calculations, the code and then the computer, or to create a new computer. It could be argued that the computer should have minimal impact on provenance, at least as far as the attributes AiiDA defines for it are concerned. If we could move this attribute (and potentially others) to, for example, the `Code` instead, the `Computer` instance could be made mutable and mistakes could be corrected easily.
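A minimal sketch of the proposed split, assuming plain Python dataclasses rather than AiiDA's ORM (the `Computer` and `Code` classes and the `command_line` helper below are hypothetical stand-ins): the Computer stays editable, while `mpirun_command` lives on a frozen Code so it is fixed together with the executable. The `{tot_num_mpiprocs}` placeholder mirrors the one AiiDA uses in `mpirun_command`.

```python
from dataclasses import dataclass


# Hypothetical stand-in, not the aiida-core class: only connection details,
# and it remains editable after being "stored".
@dataclass
class Computer:
    label: str
    hostname: str


# Hypothetical stand-in: the MPI launcher moves here, onto an immutable Code,
# so it is recorded in provenance next to the executable that needs it.
@dataclass(frozen=True)
class Code:
    label: str
    computer: Computer
    executable: str
    mpirun_command: tuple[str, ...] = ("mpirun", "-np", "{tot_num_mpiprocs}")

    def command_line(self, num_procs: int) -> list[str]:
        """Build the full command line for a run on ``num_procs`` processes."""
        launcher = [arg.format(tot_num_mpiprocs=num_procs) for arg in self.mpirun_command]
        return launcher + [self.executable]
```

Editing `computer.hostname` after the fact is allowed here, whereas assigning to `code.mpirun_command` raises `dataclasses.FrozenInstanceError`.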

Of course, this introduces the possibility that a user configures a `Computer` for machine A, runs a bunch of calculations, then updates the hostname to match machine B and runs more calculations there. The provenance would then contain calculations that seem to have been run with the same `Computer` but were actually run on different machines. Note that the `Code` would have to work on both machine A and B. This is an edge case; I think it is an acceptable risk, and we should simply instruct users that this is clearly not what one should do. I discussed this with @louisponet and he agreed.

sphuber, Jun 10 '21

I would second some flexibility here as well.

In terms of usability, there is a precursor to this: allowing `verdi computer delete` (with a user prompt) to simply delete all nodes associated with the computer as well, just as `verdi node delete` deletes nodes connected via the provenance. To me, this is a no-brainer; I've opened a separate issue for it.

ltalirz, Feb 25 '22

Here's another very simple idea for approaching the mutability of Computers and Codes that would solve the main use cases I have in mind: copy-on-edit.

Let's start with Codes: say, you realize you need to update something in the configuration of mycode@mycomputer (but of course you don't want to change all your scripts). We provide an interface to "edit"/"update" the code (details to be discussed [1]). When you save your changes, two things happen:

  • the old Code is automatically relabeled to something like mycode-legacy-01 [2]
  • a new Code is created with label mycode@mycomputer
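The two steps above could look roughly like this, sketched over a hypothetical in-memory registry (`CodeRegistry`, its `edit` method and the two-digit `-legacy-NN` counter are illustrative assumptions, not AiiDA API). Codes are keyed by UUID, the way nodes are in the database, so relabeling the original leaves untouched the UUID that past calculations point to.

```python
from dataclasses import dataclass, field, replace
from uuid import uuid4


# Hypothetical stand-in for a stored Code node; not the aiida-core API.
@dataclass(frozen=True)
class Code:
    label: str
    computer: str
    executable: str
    uuid: str = field(default_factory=lambda: str(uuid4()))


class CodeRegistry:
    """Hypothetical store that keys codes by UUID, like nodes in a database."""

    def __init__(self) -> None:
        self.codes: dict[str, Code] = {}

    def add(self, code: Code) -> Code:
        self.codes[code.uuid] = code
        return code

    def load(self, full_label: str) -> Code:
        """Look a code up by its ``label@computer`` string."""
        for code in self.codes.values():
            if f"{code.label}@{code.computer}" == full_label:
                return code
        raise KeyError(full_label)

    def edit(self, full_label: str, **changes: str) -> Code:
        """Copy-on-edit: relabel the original as legacy, store the edited copy under the old label."""
        old = self.load(full_label)
        prefix = f"{old.label}-legacy-"
        count = sum(
            1 for c in self.codes.values()
            if c.computer == old.computer and c.label.startswith(prefix)
        )
        # 1. Relabel the original; its UUID, and hence its provenance links, stay the same.
        self.codes[old.uuid] = replace(old, label=f"{prefix}{count + 1:02d}")
        # 2. Store the edited copy under a fresh UUID but with the original label.
        return self.add(replace(old, uuid=str(uuid4()), **changes))
```

Scripts that call `load("mycode@mycomputer")` keep working and pick up the edited copy, while the original remains reachable as `mycode-legacy-01@mycomputer`.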

The same applies to Computers, except that after creating the new Computer we will also want to create clones of the Codes that were installed on the old Computer.
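For Computers, the extra step is cloning the installed Codes. A sketch under similar assumptions: the dataclasses and the `edit_computer` helper are hypothetical, and the legacy label is passed in explicitly to keep the example short.

```python
from dataclasses import dataclass, field, replace
from uuid import uuid4


# Hypothetical stand-ins for stored nodes; not the aiida-core API.
@dataclass(frozen=True)
class Computer:
    label: str
    hostname: str
    uuid: str = field(default_factory=lambda: str(uuid4()))


@dataclass(frozen=True)
class Code:
    label: str
    computer_uuid: str
    executable: str
    uuid: str = field(default_factory=lambda: str(uuid4()))


def edit_computer(
    computers: dict[str, Computer],
    codes: dict[str, Code],
    label: str,
    legacy_label: str,
    **changes: str,
) -> Computer:
    """Copy-on-edit for a Computer, cloning every Code installed on it."""
    old = next(c for c in computers.values() if c.label == label)
    # Keep the original, and every Code/calculation pointing at it, under a legacy label.
    computers[old.uuid] = replace(old, label=legacy_label)
    new = replace(old, uuid=str(uuid4()), **changes)
    computers[new.uuid] = new
    # Clone the Codes so that "<code>@<label>" now resolves to the new Computer.
    for code in [c for c in codes.values() if c.computer_uuid == old.uuid]:
        clone = replace(code, uuid=str(uuid4()), computer_uuid=new.uuid)
        codes[clone.uuid] = clone
    return new
```

The original Codes keep pointing at the original Computer, so their provenance is unchanged; only the label lookup moves to the clones.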

If I'm thinking this through correctly, this solves the user's problem (no need to update scripts, no need to re-setup things from scratch), while also fully preserving provenance (previous Code/Computer objects still exist in the database).

Thoughts @sphuber ?

[1] There could e.g. be a `verdi code update` interface that opens a YAML file; in Python, we could simply do this if a user modifies a loaded Code object and then calls `store()` on it.
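The Python variant could hook copy-on-edit into `store()`. This is a sketch only and not a description of AiiDA's current `store()`: the `Code` class and the module-level `DATABASE` dict are hypothetical stand-ins for a node and the database.

```python
from uuid import uuid4

# Hypothetical database: node UUID -> stored attributes.
DATABASE: dict[str, dict[str, str]] = {}


class Code:
    """Hypothetical node-like class; not the aiida-core ``Code``."""

    def __init__(self, label: str, executable: str) -> None:
        self.uuid = str(uuid4())
        self.label = label
        self.executable = executable

    def _attributes(self) -> dict[str, str]:
        return {"label": self.label, "executable": self.executable}

    def store(self) -> "Code":
        stored = DATABASE.get(self.uuid)
        if stored is not None and stored != self._attributes():
            # Stored node was modified in memory: keep the original under its
            # UUID but with a legacy label ...
            prefix = f"{stored['label']}-legacy-"
            count = sum(1 for attrs in DATABASE.values() if attrs["label"].startswith(prefix))
            DATABASE[self.uuid] = dict(stored, label=f"{prefix}{count + 1:02d}")
            # ... and turn this object into a new node carrying the edits.
            self.uuid = str(uuid4())
        DATABASE[self.uuid] = self._attributes()
        return self
```

Calling `store()` again without further changes is a no-op in effect, since the in-memory attributes match the stored ones.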

[2] If we feel it is necessary, rather than modifying the label, we could add an attribute or extra that points to the "successor Code", thus marking the Code as "legacy" so that it can be automatically excluded by default from interfaces like `verdi code list`, etc.
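Flagging instead of relabeling might look like the sketch below. `superseded_by` is a hypothetical extra name and `list_codes` only mimics the default filtering such a listing could apply; both are illustrative, not existing AiiDA behavior.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a Code node with extras; not the aiida-core API.
@dataclass
class Code:
    uuid: str
    full_label: str
    extras: dict[str, str] = field(default_factory=dict)


def mark_superseded(old: Code, successor: Code) -> None:
    """Flag ``old`` as legacy by pointing it at its successor; its label is left alone."""
    old.extras["superseded_by"] = successor.uuid


def list_codes(codes: list[Code], include_legacy: bool = False) -> list[str]:
    """Hide legacy codes by default, show them only when explicitly requested."""
    return [
        f"{c.full_label} ({c.uuid})"
        for c in codes
        if include_legacy or "superseded_by" not in c.extras
    ]
```

Because both versions keep the label `mycode@mycomputer`, a label-based lookup would also need to skip flagged codes for this variant to work.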

P.S. Of course, this would not apply to scripts that load codes/computers by PK/UUID rather than by label. I have personally not come across people using this in their daily workflow, though.

ltalirz, Mar 23 '23

> Here's another very simple idea for approaching the mutability of Computers and Codes that would solve the main use cases I have in mind: copy-on-edit

You beat me to it @ltalirz lol 👍. I also had this thought in https://github.com/aiidateam/team-compass/issues/12, but hadn't gotten round to writing it down.

The only issue to resolve, I think, would be querying: you may want to query for all calculations using a certain computer plus all of its old versions. You may also want functionality to get the newest computer/code given any of its "ancestors"; for example, for the `get_builder_restart` method you would want to fetch the newest computer/code rather than the one actually used.

chrisjsewell, Mar 30 '23

> The only issue to resolve I think would be for querying, you may perhaps want to query for all calculations using a certain computer plus all of its old versions.
>
> Also maybe you want to have functionality to get the newest computer/code, given any of its "ancestors", and for the `get_builder_restart` method, you would want to fetch the newest computer/code, rather than the one actually used

As long as the "copy" contains a pointer to the original (e.g. an attribute `copied_from: <uuid>` [1]), all of these use cases seem straightforward to implement (this can be done at a later time, as needed).
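With such a pointer on each copy, the queries mentioned above reduce to walking a chain. A sketch, assuming the pointers are available as a plain dict from UUID to parent UUID (`None` for an original) and that copy-on-edit always copies the current version, so each lineage is a simple chain; all function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical data: UUID -> UUID it was copied from (None for an original).
CopiedFrom = Dict[str, Optional[str]]


def _successors(copied_from: CopiedFrom) -> Dict[str, str]:
    """Invert the pointers; assumes each version is copied at most once."""
    return {parent: child for child, parent in copied_from.items() if parent is not None}


def newest_version(uuid: str, copied_from: CopiedFrom) -> str:
    """Follow the chain forward from any version to the current one."""
    successors = _successors(copied_from)
    while uuid in successors:
        uuid = successors[uuid]
    return uuid


def all_versions(uuid: str, copied_from: CopiedFrom) -> List[str]:
    """Every version in the chain containing ``uuid``, oldest first."""
    while copied_from.get(uuid) is not None:
        uuid = copied_from[uuid]
    chain = [uuid]
    successors = _successors(copied_from)
    while chain[-1] in successors:
        chain.append(successors[chain[-1]])
    return chain


def calculations_using(uuid: str, copied_from: CopiedFrom, calc_to_code: Dict[str, str]) -> List[str]:
    """Calculations run with ``uuid`` or with any other version of it."""
    versions = set(all_versions(uuid, copied_from))
    return sorted(calc for calc, code in calc_to_code.items() if code in versions)
```

A restart helper that preferred the newest code would call `newest_version` on the code actually used; wiring this into `get_builder_restart` is hypothetical here.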

To avoid confusing users, my feeling is that we probably want both to have the pointer attribute and to relabel the original code.

[1] In principle this would be a relational table between codes/computers; I don't know whether this would be overkill here.
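If the pointer were a relational table instead, recursive SQL queries could recover the lineage. A sketch using SQLite from the Python standard library; the `code`/`code_copy` schema is hypothetical and not AiiDA's database layout, and `NEWEST` assumes the copy links form an acyclic chain.

```python
import sqlite3

# Hypothetical schema, not AiiDA's actual database layout.
SCHEMA = """
CREATE TABLE code (uuid TEXT PRIMARY KEY, label TEXT NOT NULL);
-- One row per copy-on-edit: which code was copied into which.
CREATE TABLE code_copy (
    source_uuid TEXT NOT NULL REFERENCES code(uuid),
    copy_uuid   TEXT NOT NULL UNIQUE REFERENCES code(uuid)
);
"""

# All codes connected to the starting one by copy links, in either direction.
LINEAGE = """
WITH RECURSIVE lineage(uuid) AS (
    SELECT ?
    UNION
    SELECT CASE WHEN c.copy_uuid = l.uuid THEN c.source_uuid ELSE c.copy_uuid END
    FROM code_copy AS c JOIN lineage AS l ON l.uuid IN (c.source_uuid, c.copy_uuid)
)
SELECT uuid FROM lineage
"""

# Follow copy links forward only; the deepest row is the newest version.
NEWEST = """
WITH RECURSIVE forward(uuid, depth) AS (
    SELECT ?, 0
    UNION
    SELECT c.copy_uuid, f.depth + 1
    FROM code_copy AS c JOIN forward AS f ON c.source_uuid = f.uuid
)
SELECT uuid FROM forward ORDER BY depth DESC LIMIT 1
"""


def all_versions(conn: sqlite3.Connection, uuid: str) -> set[str]:
    return {row[0] for row in conn.execute(LINEAGE, (uuid,))}


def newest_version(conn: sqlite3.Connection, uuid: str) -> str:
    return conn.execute(NEWEST, (uuid,)).fetchone()[0]
```

The `UNION` (rather than `UNION ALL`) in `LINEAGE` discards already-visited codes, which is what stops the recursion once the whole lineage has been found.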

ltalirz, Apr 02 '23