aiida-core icon indicating copy to clipboard operation
aiida-core copied to clipboard

👌 IMPROVE: Allow SqliteTempBackend to read/write archives

Open chrisjsewell opened this issue 1 year ago • 9 comments

The key reason I want this, is to allow temporary profiles to be exported, which I think is pretty nice!

$ ipython                                                                                    
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:47) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from aiida import load_profile, engine, orm, plugins
   ...: from aiida.storage.sqlite_temp import SqliteTempBackend
   ...: 
   ...: %load_ext aiida
   ...: 
   ...: profile = load_profile(
   ...:     SqliteTempBackend.create_profile(
   ...:         'myprofile',
   ...:         options={
   ...:             'warnings.development_version': False,
   ...:             'runner.poll.interval': 1
   ...:         },
   ...:     ),
   ...: )

In [2]: orm.SinglefileData("/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/Dockerfile").store()
Out[2]: <SinglefileData: uuid: 1d2921fb-17bc-47e6-8db0-dbb98d5b6bd0 (pk: 1)>

In [3]: %verdi archive create --all --force test.aiida
Report: 
Archive Parameters
--------------------  ----------
Path                  test.aiida
Version               main_0001
Compression           6

Inclusion rules
----------------------------  -----
Computers/Nodes/Groups/Users  All
Computer Authinfos            False
Node Comments                 True
Node Logs                     True

Report: Validating Nodes                                                                                                                                                                                 
Report: Creating archive with:
-----  -
users  1
nodes  1
-----  -
Report: Finalizing archive creation...                                                                                                                                                                   
Report: Archive created successfully
Success: wrote the export archive file to test.aiida

chrisjsewell avatar Sep 23 '22 01:09 chrisjsewell

We would need a very good reason to slow this operation down.

@sphuber I've created a separate SandboxShaRepositoryBackend subclass, for use with SqliteTempBackend, so this will no longer affect general operation 👍

Can you explain why this is necessary? Why can't the archive not deal with UUID4 keys when exporting the repository content?

Because of here: https://github.com/aiidateam/aiida-core/blob/1c4df022752d3d58628d75643da5e43c35c1414c/aiida/tools/archive/create.py#L577-L581

as it mentions, it would be difficult to convert e.g. uuid4 -> sha256 during the export process, because you would have to also somehow change every node's repository_metadata

chrisjsewell avatar Sep 23 '22 09:09 chrisjsewell

as it mentions, it would be difficult to convert e.g. uuid4 -> sha256 during the export process, because you would have to also somehow change every node's repository_metadata

So there is no technical limitation, it simply isn't implemented. There is no reason that we cannot update the nodes attributes when exporting them, is there?

I think that it should actually be implemented because otherwise you are forcing all storage backends to use sha256 for the keys, or they cannot be exported. Especially with storage backends now being pluginnable, this seems like an unreasonable limitation. We might just want to document that when implementing one, choose sha256 whenever possible, so that exporting is cheap. But in some implementations, you might not have control over this.

sphuber avatar Sep 23 '22 09:09 sphuber

There is no reason that we cannot update the nodes attributes when exporting them, is there?

In theory yes, but it's certainly not trivial; because now you would have to add

  1. a step in the process (before node export) to generate a dict of all current -> sha256 file keys, which I guess involves eading every file in the repository, which could be super slow (and have this in memory, which could be large for large profiles)
  2. a step during the node export to (recursively) regenerate the repository_metadata and replace all keys, before exporting, which yeh would be pretty slow

Not something I have time for at the moment I'm afraid 😅

chrisjsewell avatar Sep 23 '22 09:09 chrisjsewell

Agree that it wouldn't be trivial, but certainly possible and at some point nice to have. I think we could simply first export all the database content of the nodes to the archive, and then iterate over the nodes to copy their repository content to the archive. This could be streamed so not everything needs to be read into memory and when going through the Repository interface, you get the new key hierarchy back which can then be inserted into the repository_metadata column of the respective node. This can be done in batches as to not have too many inserts, but also not keep to many repository_metadata hierarchies in memory. I will open an issue with this feature request and providing some context to this PR. If you could just add a small test that shows that you can export from a temp backend, I will accept this PR.

sphuber avatar Sep 23 '22 10:09 sphuber

@sphuber I also just added the requisite methods necessary to import an archive into a temporary profile, and added a test for the round-trip (profile1 -> archive -> profile2)

chrisjsewell avatar Sep 23 '22 10:09 chrisjsewell

@sphuber in 69d4ea9 I fixed a few places where I found that the backend was not being properly set.

This is why it run fine in isolation, but failed when running after any tests that had already loaded an aiida profile

chrisjsewell avatar Sep 23 '22 14:09 chrisjsewell

@sphuber tests are now parsing, so this is good to go

chrisjsewell avatar Sep 23 '22 15:09 chrisjsewell

Thanks for the review @sphuber, I've addressed your comments, and I think your proposed commits could only be implemented, once the Collection.__init__ code has been changed no?

chrisjsewell avatar Sep 26 '22 07:09 chrisjsewell

Thanks for the review @sphuber, I've addressed your comments, and I think your proposed commits could only be implemented, once the Collection.__init__ code has been changed no?

We can ignore the Collection.__init__ question for now yeah. I have resolved all conversations that are done, but there remain three open ones.

sphuber avatar Sep 26 '22 11:09 sphuber