aiida-core
aiida-core copied to clipboard
👌 IMPROVE: Allow SqliteTempBackend to read/write archives
The key reason I want this, is to allow temporary profiles to be exported, which I think is pretty nice!
$ ipython
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:05:47)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.31.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from aiida import load_profile, engine, orm, plugins
...: from aiida.storage.sqlite_temp import SqliteTempBackend
...:
...: %load_ext aiida
...:
...: profile = load_profile(
...: SqliteTempBackend.create_profile(
...: 'myprofile',
...: options={
...: 'warnings.development_version': False,
...: 'runner.poll.interval': 1
...: },
...: ),
...: )
In [2]: orm.SinglefileData("/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/Dockerfile").store()
Out[2]: <SinglefileData: uuid: 1d2921fb-17bc-47e6-8db0-dbb98d5b6bd0 (pk: 1)>
In [3]: %verdi archive create --all --force test.aiida
Report:
Archive Parameters
-------------------- ----------
Path test.aiida
Version main_0001
Compression 6
Inclusion rules
---------------------------- -----
Computers/Nodes/Groups/Users All
Computer Authinfos False
Node Comments True
Node Logs True
Report: Validating Nodes
Report: Creating archive with:
----- -
users 1
nodes 1
----- -
Report: Finalizing archive creation...
Report: Archive created successfully
Success: wrote the export archive file to test.aiida
We would need a very good reason to slow this operation down.
@sphuber I've created a separate SandboxShaRepositoryBackend
subclass, for use with SqliteTempBackend
, so this will no longer affect general operation 👍
Can you explain why this is necessary? Why can't the archive not deal with UUID4 keys when exporting the repository content?
Because of here: https://github.com/aiidateam/aiida-core/blob/1c4df022752d3d58628d75643da5e43c35c1414c/aiida/tools/archive/create.py#L577-L581
as it mentions, it would be difficult to convert e.g. uuid4 -> sha256
during the export process, because you would have to also somehow change every node's repository_metadata
as it mentions, it would be difficult to convert e.g.
uuid4 -> sha256
during the export process, because you would have to also somehow change every node'srepository_metadata
So there is no technical limitation, it simply isn't implemented. There is no reason that we cannot update the nodes attributes when exporting them, is there?
I think that it should actually be implemented because otherwise you are forcing all storage backends to use sha256 for the keys, or they cannot be exported. Especially with storage backends now being pluginnable, this seems like an unreasonable limitation. We might just want to document that when implementing one, choose sha256 whenever possible, so that exporting is cheap. But in some implementations, you might not have control over this.
There is no reason that we cannot update the nodes attributes when exporting them, is there?
In theory yes, but it's certainly not trivial; because now you would have to add
- a step in the process (before node export) to generate a dict of all
current -> sha256
file keys, which I guess involves eading every file in the repository, which could be super slow (and have this in memory, which could be large for large profiles) - a step during the node export to (recursively) regenerate the
repository_metadata
and replace all keys, before exporting, which yeh would be pretty slow
Not something I have time for at the moment I'm afraid 😅
Agree that it wouldn't be trivial, but certainly possible and at some point nice to have. I think we could simply first export all the database content of the nodes to the archive, and then iterate over the nodes to copy their repository content to the archive. This could be streamed so not everything needs to be read into memory and when going through the Repository
interface, you get the new key hierarchy back which can then be inserted into the repository_metadata
column of the respective node. This can be done in batches as to not have too many inserts, but also not keep to many repository_metadata
hierarchies in memory. I will open an issue with this feature request and providing some context to this PR. If you could just add a small test that shows that you can export from a temp backend, I will accept this PR.
@sphuber I also just added the requisite methods necessary to import an archive into a temporary profile, and added a test for the round-trip (profile1 -> archive -> profile2)
@sphuber in 69d4ea9 I fixed a few places where I found that the backend was not being properly set.
This is why it run fine in isolation, but failed when running after any tests that had already loaded an aiida profile
@sphuber tests are now parsing, so this is good to go
Thanks for the review @sphuber, I've addressed your comments, and I think your proposed commits could only be implemented, once the Collection.__init__
code has been changed no?
Thanks for the review @sphuber, I've addressed your comments, and I think your proposed commits could only be implemented, once the
Collection.__init__
code has been changed no?
We can ignore the Collection.__init__
question for now yeah. I have resolved all conversations that are done, but there remain three open ones.