aiida-core
aiida-core copied to clipboard
CLI: Add the `verdi init` command
This is an alternative solution to #6305
The verdi init
command is slightly different in that it doesn't just create a new profile, but it creates the profile in a new AiiDA instance, i.e., a new .aiida
folder is created (in the current working directory by default). This makes the behavior similar to git init
that initializes a new repo. Once the user is done, the instance folder can simply be removed and there will be no trace left. By default the command will create a profile with the core.sqlite_dos
which won't require any services, but alternatively a profile with core.sqlite_zip
can be created instead from an existing archive. This version will be read-only, but will allow a user to easily start browsing an archive with a single command verdi init --from-archive some_archive.aiida
.
Note, this currently would rely on #6303 and some other internal changes that are part of #6305 but the general concept is here. We should discuss exactly what we want the interface and behavior of verdi init
to be.
There are still some questions about desired behavior of the command:
- What happens when
verdi init
is run in a directory that already contains an.aiida
folder. Currently it just aborts with an error. - If we decide that in the above case, instead the existing folder is used, what happens when the
init
profile already exists? Just error and stop? Allow to create another profile automatically choosing another profile name? Even prompt the user for that name?
There are still some questions about desired behavior of the command
I think the current behaviour is fine, after all the user can simple change a directory, right? Are there any use cases where this would potentially be undesirable?
I fear anything we'd do otherwise would be confusing / unintuitive. E.g. if the init profile is created next to an existing profile, should it still be made default?
I think the current behaviour is fine, after all the user can simple change a directory, right? Are there any use cases where this would potentially be undesirable?
I agree. However, @giovannipizzi suggested that the current syntax for creating a new profile in an existing instance, e.g. verdi profile setup core.sqlite_dos
, is still too complicated. Users might be tempted to try and run verdi init
to create another profile in the same instance. I think this becomes too complicated though and there is little benefit. If a user does in fact want to start using multiple profiles in the same instance, they can learn how to use verdi profile setup
. Still I wanted to open this up for discussion in case others have different intuitions on what is intuitive UX.
I have addressed the comments. Does anyone still have major comments? Otherwise I propose we merge this so we can start dogfooding this. We will wait with the release for a while so we can still further fine tune it. I will also start working on improving the docs to include this new method of getting started with AiiDA in an easy way. But will leave that for a separate PR.
Hi @sphuber, I am not clear how to use this feature. Suppose I create a folder and run verdi init
. Then, how do I use this newly created aiida folder?
- how does the verdi command know that I want to work on this folder? What if I change to another aiida folder?
- how do I load the profile in python script?
- what's the relation between a local aiida folder and the global aiida configuration?
Maybe this will be clear in the docs PR?
how does the verdi command know that I want to work on this folder? What if I change to another aiida folder?
With #6322 AiiDA will now check the current working directory for an .aiida
folder. It now works essentially as git
. So you can switch an AiiDA instance by changing the working directory. When you run verdi init
it will create a new .aiida
folder in the current working directory (just like git init
) and so it will immediately use that. Give it a try:
verdi init
verdi status
verdi profile list
verdi profile show
You should see that it now uses the local .aiida
folder and it created a new profile called init
.
how do I load the profile in python script?
Exactly as normal with load_profile
or use verdi run
or the runaiida
executable. Nothing changes. If you open a shell, it will pick up the new instance and profile
(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi init
Success: Initialized new AiiDA instance in `/home/sph/code/aiida/env/dev/aiida-core/.aiida`.
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `init`.
Success: Configured the localhost as a computer.
(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi status
✔ version: AiiDA v2.5.1.post0
✔ config: /home/sph/code/aiida/env/dev/aiida-core/.aiida
✔ profile: init
✔ storage: SqliteDosStorage[/home/sph/.aiida/repository/sqlite_dos_2057f645d8a544109563f725831dc589]: open,
⏺ broker: No broker defined for this profile: certain functionality not available.
⏺ daemon: No broker defined for this profile: daemon is not available.
(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi shell
Python 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from aiida import load_profile
In [2]: load_profile()
Out[2]: Profile<uuid='5afa7003e9af44ca955df40c5dc56c66' name='init'>
what's the relation between a local aiida folder and the global aiida configuration?
They define separate AiiDA instances that are completely isolated. See the docs: https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/installation.html#isolating-multiple-instances
I got a problem
(aiida) xing@thinkpad:~/tests/aiida/test-6315$ verdi init
verdi status
verdi profile list
verdi profile show
Success: Initialized new AiiDA instance in `/home/xing/tests/aiida/test-6315/.aiida`.
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `init`.
Success: Configured the localhost as a computer.
✔ version: AiiDA v2.5.1.post0
✔ config: /home/xing/.aiida/envs/aiida/.aiida
✔ profile: default
the config still in ✔ config: /home/xing/.aiida/envs/aiida/.aiida
. This probably because I using a conda environment, and I set $AiiDA_PATH
$ echo $AIIDA_PATH
/home/xing/.aiida/envs/aiida
thus, the AiiDA_PATH
from #6322 is overrided.
thus, the
AiiDA_PATH
from #6322 is overrided.
You are right. The verdi init
command takes an argument [DIRECTORY]
(which defaults to current working directory) where it will create the .aiida
folder. This currently does not respect the AIIDA_PATH
.
The question now is what the behavior should be. This command is intended for beginning users of AiiDA, so they are unlikely to have defined AIIDA_PATH
. Users who specify AIIDA_PATH
will most likely also already have .aiida
folders and won't really need verdi init
. I guess we can simply have verdi init
respect the AIIDA_PATH
variable. If the user specifies an explicit directory in the command, e.g., verdi init some/path
and AIIDA_PATH
is defined, I guess we should error in that case?
Actually, should we even allow to specify the path? git
allows this, which is why I added it, but it doesn't seem to make much sense. If you do
verdi init some/path
verdi status
The second command will not actually work against the just created instance. I think I will remove it and just get the path from AIIDA_PATH
defaulting to current working directory if not defined.
I guess we can simply have verdi init respect the AIIDA_PATH variable.
We need to think about how to handle the global verdi and local verdi. As a user, even though I set the AIIDA_PATH
, I still want to use verdi init
to create a local AiiDA folder. In this case, the local folder should be higher priority than the global AIIDA_PATH
.
btw, this is not a problem for git
, since it does not have a global config.
We need to think about how to handle the global verdi and local verdi. As a user, even though I set the
AIIDA_PATH
, I still want to useverdi init
to create a local AiiDA folder. In this case, the local folder should be higher priority than the globalAIIDA_PATH
.
Fair point. That means the logic of #6322 should be inverted. That the current working directory takes precendence over AIIDA_PATH
. Does that have unexpected consequences though? If a user relies on AIIDA_PATH
but then happens to change directories to a path that contains an .aiida
folder in the hierarchy, then all of a sudden, the AIIDA_PATH
is ignored and the instance is switched.
If a user relies on AIIDA_PATH but then happens to change directories to a path that contains an .aiida folder in the hierarchy, then all of a sudden, the AIIDA_PATH is ignored and the instance is switched.
yes, but I think it's not [Edit] a problem, but an expected behavior. The user want to change the AiiDA config when going to a local AiiDA folder and its child folders.
I am also leaning towards local .aiida having preference, seems more intuitive.
Another perhaps more general question, how does this interact with verdi daemon? Can there be more isolated daemons running? Or is there a one global instance that can handle different .aiida folders?
I am also leaning towards local .aiida having preference, seems more intuitive.
It is fine by me to change to this behavior, I am just a bit worried about breaking behavior. People that have been relying on AIIDA_PATH
could all of a sudden accidentally target the wrong instance if they change to the "wrong" directory. If more people this think risk is acceptable, I am happy to change the precedence rules.
Another perhaps more general question, how does this interact with verdi daemon? Can there be more isolated daemons running? Or is there a one global instance that can handle different .aiida folders?
Currently, you can have multiple instances running alongside one another just fine. Each daemon is perfectly isolated. The AIIDA_PROFILE
variable is inherited by the process daemon as you start it. We should just check that the same goes for the current working directory, such as to ensure the started daemon process also has this working directory. Otherwise, we would have to manually forward it by setting the AIIDA_PATH
to the CWD. Should be easy to test.
It is not so trivial to just reverse the precedence of CWD and AIIDA_PATH to determining the config dir. As soon as a config dir is created in the users home folder, that will almost always take precedence over AIIDA_PATH
since one is almost always in a directory whose parent is the user's home folder. Since currently the config directory is created by default in ~/.aiida
I am afraid that AIIDA_PATH
will essentially become useless. Not quite sure how to fix this.
Removing the automatic creation of ~/.aiida
would require quite some changes. Also not sure what to replace it with. All verdi
commands and Python API rely on the directory automatically being created. verdi init
is really the first exception.
Leaves perhaps to ignore ~/.aiida
when checking the CWD. So check CWD with parent folders, but excluding the home directory, then check AIIDA_PATH
and only when still not found default to ~/.aiida
. This makes the logic even more complicated.
Hypothetically, even if we disregard backwards compatibility just for the time being, what would the ideal most intuitive behavior be? Seems that having the "global" config in the home directory may be complicating things, but if we get rid of it, it will force users to always manually run something like verdi init
to initialize a config somewhere. Also explaining to users that not doing so in the home folder may be beneficial if they ever plan on running multiple instances, is going to be way too complicated.
So, thoughts?
My though:
- If both the
home/.aiida
directory andAIIDA_PATH
exit, it raises an error for the user. -
AIIDA_PATH
is most probably used by advanced users. In the docs, we can suggest users remove thehome/.aiida
if they really want to define their ownAIIDA_PATH
, e.g., in the conda env.
Hmm, tricky. I don't think erroring out is very user-friendly, rather we should have a predictable behaviour.
Leaves perhaps to ignore ~/.aiida when checking the CWD. So check CWD with parent folders, but excluding the home directory, then check AIIDA_PATH and only when still not found default to ~/.aiida. This makes the logic even more complicated.
Tbh although this feels hacky, it seems like a best option for now and ia well defined.
I assume that most (beginner?) users don't use AIIDA_PATH, so this will not affect the behaviour for most, and at the same time we will not break advanced users using AIIDA_PATH. If they are already customizing their aiida path, it would seem they will have less incentive to use the new init command anyway.
Thinking in abstract, Essentially we have similar situation as in python, i.e. using venv versus global install. So we could potentially have some kind activation script in the future. But that seems like an extra complication for now. Let's dogfood it for a while and see?
Hi all, very interesting feature and discussion! For the use case of quickly exploring an archive with a temporary profile without polluting the main one, this works perfectly, and I love the idea!
To me:
Leaves perhaps to ignore ~/.aiida when checking the CWD. So check CWD with parent folders, but excluding the home directory, then check AIIDA_PATH and only when still not found default to ~/.aiida. This makes the logic even more complicated.
also seems like the best solution to this conundrum so far. I learned about AIIDA_PATH
rather late into my AiiDA journey, so I also wouldn't consider it a beginner feature, though, that's of course only anecdotal.
Some other points:
- If we have the
--from-archive
option, should we also allow passing a URL, similar toverdi archive import
? - One could also provide the option to directly import it, e.g. something like:
verdi init --from-archive <local archive/URL> --import
(assuming #6316 is fixed)?
But:
verdi init
verdi archive import <local archive/URL>
should be simple enough, really...
Though, there are a few more thoughts that I'd like to bring up here, which I also mentioned briefly in the last aiidateam
meeting:
As verdi init
fundamentally creates a profile with one of the SQLite backends and without RabbitMQ, one cannot create a full production environment with the command. I know that here, this is basically the point, but if we introduce it, people might start using it over the other options like verdi quicksetup
and verdi profile setup
for convenience, especially if they are familiar with git init
. However, they might then hit a roadblock if they want to actually continue with this profile (similar to what @superstar54 asked about being able to switch the database later on).
In the end, in this PR, I think we are trying to achieve two separate goals:
- Allowing for quick and easy setup of a profile without the services (using SQLite and no RabbitMQ)
- Having a local, self-contained AiiDA instance similar to when running
git init
(which leaves no trace when being deleted), that consists of two parts:- Using the local
.aiida
config, and - Using a local database (again, SQLite)
- Using the local
(And maybe 3: Having a profile for just exploring an archive without importing - though, that seems to tie in with point 2, just using zip
rather than disk-objectstore
)
One could add more customization options to verdi init
, however, this seems like we'd just gradually convert it back into the other setup commands (as would be the case if we would add more options to verdi quicksetup
, e.g. to allow other database backends). Also, many of the command line arguments are the same between the different commands.
So I'm wondering if, rather than now adding verdi init
on top, we could just provide one (or maybe two) verdi
commands for setting things up (quicksetup
or init
- I'd vouch for the latter), but then provide different turn-key/convenience setup options via double-dash command line flags. In this scenario, the initialization command by default automatically tries to create the "best" setup available on the system, with as little required user input as possible (at least that would be my personal preference), while users should still be able to overwrite individual options. For each of the turn-key options, we can provide a short and concise overview in the docs with the intended use case and strengths and weaknesses (similar to the one for the storage backends).
For example:
-
verdi init
by default could automatically try to use PostgreSQL and RabbitMQ, resorting to SQLite and/or no broker if these fail, and write to the global config. -
verdi init --production
could explicitly only work if both services are set up properly. -
verdi init --local
(orverdi init --contained
) could use SQLite and RabbitMQ or no broker, again depending on availability, with a local config (similar to whatverdi init
of this PR is doing). -
verdi init --no-services
could explicitly be used to set up a profile with SQLite and without RabbitMQ (writing to the global or local config being arbitrary in that case). -
verdi init --full
could ask for all configuration options (likeverdi setup
right now does), not just the essential ones.
The --non-interactive
flag would still function as it does now in all cases. Writing to the local and global config could also generally be added as command line options via --global
/--local
again mirroring how git
behaves.
Effectively, it all boils down to where we put the logic/complexity for the different possible ways to set up an AiiDA instance: As separate verdi
commands as it is done right now, or as one verdi
command, but then with the different command line options, as I'm proposing here. As removing stuff will be backwards incompatible, any clean-up would go into v3 release, I suppose, but then I would only introduce verdi init
now, if it is here to stay.
I think all of this will be good to discuss as part of the upcoming aiida-core
coding days.
Thanks a lot @sphuber , verdi init
is great.
two minor things:
I would update this message: --appears if verdi status
without a profile--
verdi status
Report: Configure a profile by running `verdi quicksetup` or `verdi setup`.
I would have expected that after verdi delete
, the local .aiida
directory get deleted -the one in local path not the ~/.aiida
-. Is there a reason for the current design?
I would update this message: --appears if
verdi status
without a profile--
Totally agree. Was just awaiting the discussion on what we will do with the verdi quicksetup/setup
commands. But most likely it will have to be updated.
I would have expected that after verdi delete , the local .aiida directory get deleted -the one in local path not the ~/.aiida-. Is there a reason for the current design?
There is no verdi delete
command. Do you mean verdi profile delete
? The whole point of verdi init
is that you don't need any verdi commands to clean up, you just delete the .aiida
folder with rm
and that's it. Or do I misunderstand your point?
Before starting with proper dogfooding, I already have some conceptual comments I wanted to make, mostly on https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe.
Let's start with some semantics: What exactly is an AiiDA instance? Interestingly, the definition of this concept was sublty changed in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe. Before, the documentation said:
An AiiDA instance is defined as the installed source code plus the configuration folder that stores the configuration files with all the configured profiles.
This is changed in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe:
An AiiDA instance is defined by its configuration directory, which is always named
.aiida
.
So, the source code is no longer part of the definition of an AiiDA "instance". Indeed, the changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe allow for multiple AiiDA configuration folders ("instances" in the new definition) for one Python environment and AIIDA installation.
I understand the problem the commit is trying to solve. Once a user starts having multiple AiiDA installations, they needed to use AIIDA_PATH
to properly separate them, else they'd run into all sorts of trouble. This is exactly why I started working on aiida-project
(Still very much a work in progress; I wish I could dedicate more time to it). It respects the previous definition of an AiiDA instance: the lines to set the AIIDA_PATH
environment variable are automatically added to the activate script of the virtual environment. Having to write our own environment management tool did seem a bit much, admittedly[^1]. I wondered if somehow the configuration couldn't be automatically tied to the environment, e.g. by having a separate subfolder in the .aiida
folder based on the path to the Python binary or something. I'm curious how other Python packages that have configurations deal with this.
However, I'm not sure if the new definition of an AiiDA instance makes sense to me. Does it make sense to have multiple AiiDA instances for one Python environment/AiiDA installation? What advantages do multiple separate instances have over multiple profiles in the same instance? If not, perhaps we should look for an alternative approach to properly separate the "old" version of AiiDA instances (code and configuration) instead. I'm already worried about a few things:
- Having another layer of "instances" might confuse things. One can already have multiple AiiDA installs in separate environments, and separate profiles for each install. Adding a layer in between for multiple AiiDA configurations only makes sense if it adds something.
- There have already been quite some discussions on which configuration should take precedence. I would argue that
AIIDA_PATH
should remain king (git
also givesGIT_DIR
precedence, by the way). If the issue is complex to resolve for us, it's bound to lead to confusion for users. One AiiDA install has one configuration is simple to understand.
So, I'm mainly wondering if we should not revert https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe in favor of a different approach to separating AiiDA instances that respects the original definition: AiiDA source code/install and configuration.
[^1]: Although it could have additional use cases. One thing that has been on my TODO list for a while is providing conda
support for aiida-project
, and then automatically installing the services and activating them inside the environment when using the cda
command, which aiida-project
uses to switch AiiDA "projects" (instances in the "old" definition). It could then probably also automatically configure the RabbitMQ timeout etc. We could also use it as a sort of aiidalab-launch
for AiiDA, i.e. also support "Docker" projects.
@mbercx some really insightful comments! The fact the this PR changes the semantics (not only behaviour) is a really great observation.
But I would like to turn this around back at you. :-) The fact that you needed to start aiida-project
is a good indicator that the status quo is not fully satisfactory. For such a critical thing as project management, I'd much rather for it to be solved in aiida-core rather than an external package.
I wondered if somehow the configuration couldn't be automatically tied to the environment, e.g. by having a separate subfolder in the .aiida folder based on the path to the Python binary or something
What advantages do multiple separate instances have over multiple profiles in the same instance?
I think this is a key question to answer. I wonder if/how people use different profiles? I don't have too much experience with them, but to me the concept is very opaque. At least from the command line, working with different profiles is kind of awkward, since you need to always provide an extra argument for non-default profiles, right?
I would argue that AIIDA_PATH should remain king (git also gives GIT_DIR precedence, by the way).
I guess I agree, with the caveat that with should discourage users to muck around with AIIDA_PATH and prefer .aiida path-based git-like discovery if we go forward with this.
To me the git-like model makes things quite a bit easier to wrap my head around. Different projects are simply different directories. If you for some reason need different aiida installation for different projects, you could put Python venv (.venv
) alongside .aiida
folder.
Besides the .git
analogy, I think it might be useful to think about this in terms of managing Python environments. Right now, by default aiida behaves like pip, having an ~/.aiida folder that's global to the user. What this PR proposes is more akin to the venv model.
But I would like to turn this around back at you. :-) The fact that you needed to start aiida-project is a good indicator that the status quo is not fully satisfactory. For such a critical thing as project management, I'd much rather for it to be solved in aiida-core rather than an external package.
aiida-project
also does a bunch of other stuff in a way that I like (happy to discuss some time ^^), but I agree it would be better to deal with the proper separation of instances in aiida-core
. I hope I didn't give the impression that I was arguing to change nothing. 😱
I wonder if/how people use different profiles? I don't have too much experience with them, but to me the concept is very opaque.
To me, different profiles allow me to have multiple storages in (what I call) the same project. A typical use case would be a dev
profile where I mess around with things, and a nice and clean prod
profile for the real production runs.
At least from the command line, working with different profiles is kind of awkward, since you need to always provide an extra argument for non-default profiles, right?
You can use verdi profile setdefault
to switch profiles. Typically I don't have to switch that often that it's tedious.
To me the git-like model makes things quite a bit easier to wrap my head around. Different projects are simply different directories. If you for some reason need different aiida installation for different projects, you could put Python venv (.venv) alongside .aiida folder.
It's interesting to see the different perspectives here. ^^ Due to AiiDA's heavy reliance on plugins, I can't imagine having the same Python environment for two different projects. If for some reason you need to have multiple datasets in the same AiiDA installation, you can make a new profile. I kind of get the feeling you want to use the new isolated AiiDA folder instances in lieu of AiiDA profiles, in a sense?
I suppose it then boils down to what you prefer. I like being able to have a Jupyter notebook anywhere and select the kernel to be able to interact with different AiiDA instances (old skool definition), and then load the profile whose dataset I'm interested in. Having to copy or move the notebook to interact with a different instance seems tedious.
Besides the .git analogy, I think it might be useful to think about this in terms of managing Python environments. Right now, by default aiida behaves like pip, having an ~/.aiida folder that's global to the user. What this PR proposes is more akin to the venv model.
Yes, I see, interesting point. Though I think most Python environment tools allow you to execute commands with that environment independently of which folder you are in. I suppose what I was loosely suggesting:
I wondered if somehow the configuration couldn't be automatically tied to the environment, e.g. by having a separate subfolder in the .aiida folder based on the path to the Python binary or something.
Is sort of similar to virtualenvwrapper
or conda
, that keep the environment files in subdirectories in one global directory. Alternatively, we can store the .aiida
folder in the virtual environment directory by default. I'll think about making this all more concrete.
I'm still more in favor of not separating the AiiDA config from the AiiDA installation/Python environment though. Another argument is that the config format is actually dependent on the aiida-core
version, and automatically migrated. I can already imagine I loaded a certain Python environment and accidentally run verdi status
in the wrong folder, automatically migrating the config in the .aiida
folder it finds and leading to much chagrin.
PS: we're actually meeting on this topic at 9:30 CEST today (so in less than 7 hours ^^). I'll send you the Zoom details in case you want to join.
There is no
verdi delete
command. Do you meanverdi profile delete
? The whole point ofverdi init
is that you don't need any verdi commands to clean up, you just delete the.aiida
folder withrm
and that's it. Or do I misunderstand your point?
Yes yes I understand, but still, even if I want to verdi profile delete
, it should behave as the same, no?
Right now, the folder remains undeleted, then if I do verdi init
again, I get the error that the configuration directory .aiida
already exists. So then, one has to delete manually.
What I mean, is that I expected it behave the same both ways. Either deleting the folder manually or verdi profile delete init
Would also delete configuration files of "init"
Yes yes I understand, but still, even if I want to verdi profile delete, it should behave as the same, no?
But that is the thing, verdi profile delete
deletes a profile not the config directory. There never was a command to even think of deleting the config directory.
From @khsrali's comment above you can already see that this new approach is causing confusion.
The problem is that git init
doesn't just create a new profile. It creates a new AiiDA "instance" (new definition, let's call it configuration from now on) that itself can have multiple profiles. AiiDA can't delete the configuration files just because one profile that is in that configuration is deleted.
I'm writing a small essay to continue from our discussion last Friday, but already wanted to note this before I forget:
$ verdi init Sun Apr 14 09:49:28 2024
Success: Initialized new AiiDA instance in `/Users/mbercx/project/tmp/proj/.aiida`.
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `init`.
Success: Configured the localhost as a computer.
$ verdi status 845ms Sun Apr 14 09:49:31 2024
✔ version: AiiDA v2.5.1.post0
✔ config: /Users/mbercx/project/tmp/proj/.aiida
✔ profile: init
✔ storage: SqliteDosStorage[/Users/mbercx/.aiida/repository/sqlite_dos_ca244afbb1164449907ffda242122f12]: open,
⏺ broker: No broker defined for this profile: certain functionality not available.
⏺ daemon: No broker defined for this profile: daemon is not available.
It seems the SqliteDosStorage
is still using the .aiida
folder in $HOME
?
After our meeting on Friday, I took a bit of time to write down the discussion and supplement it with some further thoughts.
Buckle up and grab a fresh cup of ☕, because this is going to be a long one.
Semantics
I'll start with some semantics again, to limit confusion in the presentation and the discussion that follows. I will not define an AiiDA "instance", since this definition is exactly what is up for debate. For now, I will define[^1]:
[^1]: These are my definitions based on my current knowledge and understanding, and probably can be made more precise/complete. Suggestions welcome, we might even use it to start a glossary.
-
AiiDA installation: This is the source code, installed by a package manager so the
verdi
entry point is available from the command line. -
The
.aiida
directory/folder: I've called this the "configuration" before, but it actually contains more than just configuration at this point:-
config.json
: The actual configuration, both of the AiiDA installation and the AiiDA profiles (see below). - daemon-related configuration files and logs for each profile.
- The default location for repositories.
- The default location for SQLite databases of the
sqlite_dos
storage.
-
-
AiiDA profile: each AiiDA profile is an isolated combination of an AiiDA storage and daemon.
The image below tries to visualise and connect these concepts:
data:image/s3,"s3://crabby-images/ed63e/ed63e21ce93d894d388f887317a1a75aad66ba59" alt="drawing"
Introduction
As @GeigerJ2 mentioned, the verdi init
command (and commit https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe which it relies on) tries to tackle several issues:
- Quick setup of a profile that requires no services.
- Having a contained AiiDA "instance" that is easy to remove completely.
- Better separation of AiiDA "instances".
(1) would be fully satisfied by "verdi blitz
" (https://github.com/aiidateam/aiida-core/pull/6305), and (2) mostly as well. Doing verdi profile delete
will remove the storage associated with the profile, but not the daemon-related files. This could be changed though, but let's not go into that here.
(3) is the main issue I want to discuss, i.e. the changes made in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe. I could also post my essay there, but I think verdi init
and its usage advocated here helps to illustrate the consequences of the change, and why I'm against it.
The Status Quo
The current definition of an AiiDA "instance" is straight from the documentation:
An AiiDA instance is defined as the installed source code plus the configuration folder that stores the configuration files with all the configured profiles.
It continues:
It is possible to run multiple AiiDA instances on a single machine, simply by isolating the code and configuration in a virtual environment.
Note how it mentions to store the configuration in a virtual environment. Next it mentions using Python environments, which I'm sure we all agree with:
To isolate the code, make sure to install AiiDA into a virtual environment, e.g., with conda or venv, as described here. Whenever you activate this particular environment, you will be running the particular version of AiiDA (and all the plugins) that you installed specifically for it.
Then comes the tricky bit:
This is separate from the configuration of AiiDA, which is stored in the configuration directory which is always named
.aiida
and by default is stored in the home directory. Therefore, the default path of the configuration directory is~/.aiida
. By default, each AiiDA instance (each installation) will store associated profiles in this folder.
This is the current problem with separating AiiDA instances. By default, the .aiida
folder is not isolated like the Python environment is. The documentation does indicate best practises on how to deal with this:
A best practice is to always separate the profiles together with the code to which they belong. The typical approach is to place the configuration folder in the virtual environment itself and have it automatically selected whenever the environment is activated.
But this requires setting the AIIDA_PATH
environment variable, which is not trivial, environment-manager dependent and frankly: tedious.
git
-like .aiida
discovery
The changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe proposed a solution to this problem. It inserts an extra step in the resolution to the .aiida
folder:
- The
AIIDA_PATH
variable, if set. -
git
-like.aiida
discovery - The
$HOME/.aiida
folder
Basically, if AIIDA_PATH
is not set, AiiDA will now go up the hierarchy of the current working directory and look for the first instance of .aiida
and use that.
At first glance the change seems innocuous and reasonable. I liked it too when I first saw the PR. But it has some pretty profound implications, especially when we combine it with verdi init
and how it will be used.
As I've mentioned, it radically redefines what an AiiDA instance is:
An AiiDA instance is defined by its configuration directory, which is always named
.aiida
.
And allows for multiple .aiida
directories per AiiDA installation[^2]. Next, its usage in verdi init
encourages multiple .aiida
directories:
[^2]: Note that in principle it is also possible to have multiple .aiida
folders for one AiiDA installation in the status quo. The user could change the AIIDA_PATH
variable if they wanted to. But it's a use case we clearly don't recommend, and so advanced that I'd be surprised if anyone has done it in practise.
- any beginning user that installs AiiDA for the first time in a fresh Python environment will have two
.aiida
folders after runningverdi init
, and will interact with one of them depending on where they executeverdi
or the location of the Python script they are running. - there are already multiple instances in the comments above where users would want to create another profile, but this would fail with
verdi init
because the.aiida
folder already exists (in the current working directory or a parent) /AIIDA_PATH
is set. It seems we'd be recommending them to create a new directory and runverdi init
again.
So, after this is released, we can fully expect that users will have multiple .aiida
directories per AiiDA installation, that each can have multiple AiiDA profiles.
The Problem
Great, so what's the problem? I've tried to gather my thoughts here as best as I could, and also added some more examples of problems a user could run into because of this change at the end.
It makes AiiDA more complex
The fact that we now can have multiple .aiida
folders per install that each have multiple profiles adds another layer of complexity to the AiiDA setup, which is exactly what we want to avoid.
Let's say we adopt the new definition of an AiiDA instance as a .aiida
directory. What's the difference between an AiiDA instance and profile? Why do we need both? I wouldn't be able to answer this question to be honest. In fact it seems like this new approach of separating AiiDA instances is usurping the concept of an AiiDA profile, and it's already clear from this thread that they are being used interchangeably (incorrectly so!). This is problematic for several reasons:
- New users will probably quite quickly try using
verdi init
to create a second profile (e.g. if they realise they want/need RabbitMQ). This will fail or not, depending on the behaviour we choose here. If it does, they won't understand why, if it doesn't and create a new profile, whatverdi init
does will change fundamentally, and they'll have to understand the differences in how to interact with instances and profiles (see below). - Existing users will most likely be surprised by the behaviour of
verdi init
. They will assume that they've just created another profile they can access from anywhere with verdi, and be wondering "where is my profile" if they aren't in the correct folder. You could say thatverdi init
isn't for them, but then we don't solve the issue of isolating AiiDA instances. - All users will have to deal with both AiiDA "instances" and profiles, which have different modi operandi. Instances have git-like discovery, whereas profiles are configured in the
.aiida
folder and "loaded" through the Python API. Moreover, to get to the correct profile, you'll have to both (1) be in the correct directory and (2) still load the profile somehow. Users will rightfully ask why. Deleting a profile usesverdi profile delete
, but for an instance you have to remove the folder.
To me it seems clear that having both multiple .aiida
instances and profiles will cause quite a bit of confusion. One might argue that we can fully move to git-like discovery "instances" in lieu of profiles. But I see other issues with this:
- It's a massive backwards-incompatible change in behaviour which would definitely require an AEP and probably a major release.
- The AiiDA installation is not aware of the various
.aiida
instances associated with it, which is true for profiles. - Having a directory-based approach for interacting with certain instances is natural for CLI tools, but not so much for Python code. The location of the Python script/Jupyter notebook does not influence which Python instance I'm interacting with, rather it's the Python binary I'm calling or kernel I've loaded in my Jupyter notebook.
- It's more challenging to interact with that instance because you need to be in a specific folder hierarchy. Like interacting with Python instances, the current approach allows you to execute
/path/to/verdi -p profile
to e.g. back up a certain profile in a cronjob. With git-like discovery you'll have to make sure any script command that interacts with a certain instance does so in that folder (or somehow tweaks the$PWD
).
It doesn't fully deal with the issues it's trying to solve
Quick setup of a profile that requires no services.
As @superstar54 rightfully mentions, an advanced user might still want to be able to set up a profile quickly without having to deal with the complexity of verdi profile setup
(I for one have rediscovered a new-found love for verdi quicksetup
). However, anyone that actually uses AIIDA_PATH
, can't use verdi init
. So I guess we'd still need another command, which ropes back into @GeigerJ2's point that we shouldn't have too many commands for creating a profile.
Having a contained AiiDA "instance" that easy to remove completely.
The idea of having a fully contained AiiDA instance within a folder only works for the SQLite database. So if we would extend verdi init
to Postgres-type databases the user can't simply remove the folder to purge an AiiDA "instance". And if we start promoting verdi init
, you can be sure that users will expect instances/profiles to be fully contained in that folder.
Profiles on the other hand can be fully removed with verdi profile delete
(or they should be).
Better separation of AiiDA "instances".
As mentioned above, anyone that doesn't use git init
to separate AiiDA instances, still has to deal with the same problem. Now, you could argue they can just make a .aiida
folder, or we provide some kind of command for that. But really, we are making things harder for the user than they have to be to solve something that AiiDA should take care of automatically.
Some more examples of potential issues
- Separating the AiiDA installation from its configuration can have other downsides. The AiiDA installation determines the version of the configuration file. If AiiDA finds that the configuration file is outdated, it will automatically upgrade it (note that this doesn't work the other way!). This is desired behaviour, since we want configuration file format updates to be seamless for the user. However, with the changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe, if I run the
verdi
command of the wrong Python environment in the scope of another.aiida
directory, it will either give an error or (worse) automatically migrate my configuration. - If you run a
git
command outside the directory scope of a.git
directory,git
will raise an error. This is not true for AiiDA. It's why Sebastiaan decided to not allowverdi init some/path
as described here. This is problematic because it can be easy to forget you have to be in the correct folder scope (and Python environment) to interact with the AiiDA instance, and it won't be immediately clear that you are not. - A user downloads a Jupyter notebook and opens it. Normally they'd just select their Python kernel (VScode makes this quite easy) and then load the profile. If they are working with
.aiida
instances though, they have to move the notebook in the right folder first. And then they still have to load the kernel and profile.
Alternative solutions
Tie the AiiDA install to the Python instance
However, as @danielhollas mentioned, the status quo is not acceptable either. I'm currently working on AEP + PR that ties the .aiida
directory to the corresponding Python instance. Effectively, each an "AiiDA instance" will once again be the combination of the AiiDA installation + .aiida
directory, but the latter will be automatically isolated for each Python instance.
Folder scope for profiles
I do like the concept of git-like folder-based separation in general, and I understand some users might prefer it. What we could do is create a "git-folder-like" profile. Basically, instead of changing the default .aiida
configuration folder, you'd change the default profile depending on where the code is executed.
Some questions that came up
- Why does
AIIDA_PATH
allow for multiple semicolon-separated directories? What's the use case here?
TL; DR
- While I understand the appeal of having
git
-like.aiida
directory discovery, I think it's a sub-optimal solution to the problem of separating AiiDA instances. - The changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe should be reverted in favor of a solution that doesn't decouple the
.aiida
folder from the AiiDA installation. - In any case, any solution to the "separation of instances" problem is so impactful that it should require an AEP to properly motivate the chosen design.
Thanks for the very detailed writeup @mbercx . I have read through it and agree with most things and as I mentioned during our last discussion I am leaning to agreeing with you that we should probably revert https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe . The idea was nice, but with the current status quo it introduces too many situations where the behavior of AiiDA or the "best" approach to things become unclear.
That being said, here are some short comments I wanted to make:
- Even though the AiiDA documentation might currently define an "instance" as the union of a Python environment + the
.aiida
directory, I am not sure that it should be. I agree that in most cases it would probably be useful to tie them together if multiple Python environments are available or even needed, but it should not be a requirement. For many simple use cases, having a single Python environment would be just fine and so there would be no need for isolating. This is a more advanced use case and for that we have a simple way of accomplishing that (I don't think requiring a user to know how to set an environment variable is too much too ask). I would strongly caution about adding logic to AiiDA to start handling the Python environments as this famously ridden with complicacies. There is no standard and as a result there are a million things of doing things depending on the OS and preferences of the user. For example, think even of a docker container where you wouldn't even need a virtual environment to install things, and I wouldn't AiiDA to start forcing users to use it. -
"Why does AIIDA_PATH allow for multiple semicolon-separated directories? What's the use case here?" This was a design choice made @greschd but I don't remember why either. I also don't really see the point and it does make the behavior and implementation more complicated. If it weren't for backwards compatibility, I would have removed it and only accept a single path that actually includes the
.aiida
in the path. - Finally as to the real goal of
verdi init
: For me the main goal is to make it dead-easy to get a working Python environment for users that just want to try things out. This should require no special options, no prompts, no nothing. As soon as a user wants something more specific, it is perfectly reasonable to ask them to understand a bit more about what it is exactly what they want and to specifiy this through more specific commands with prompts/flags. I also want to guarantee that for power users there is an interface that gives full control and not have the interface limit anything. I think that havingverdi init
that only creates acore.sqlite_dos
no-broker profile, combined withverdi profile setup
for full control is perfectly fine. The only question for me is what to do with the special case ofverdi quicksetup
that provides custom logic just for thecore.psql_dos
backend.
@giovannipizzi @danielhollas the ball is now in your court I believe. The other people that were present during the discussion seem to be largely onboard with the analysis presented above and to remove the directory-based discovery of the config directory. Could you please let us know if you agree or if you have counter arguments that you think deserve to be discussed first. Ideally I would like to take a discussion soon because we would really like a verdi init
-like command to easily create a no-services profile and release it with the upcoming v2.6.
Thanks to @giovannipizzi, @sphuber, @danielhollas, @GeigerJ2 and @khsrali for the discussion! I'm going to summarize some points here:
- We all agree that the change in
.aiida
folder discovery behavior in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe is substantial enough that it requires more discussion (preferably through an AEP) and the changes should be reverted for now. - For the next release we want to still provide a command for new users to get up and running as quickly as possible. By default it should need no options, which will set up a new profile with the
sqlite_dos
backend and no RabbitMQ. This effectively means we will in the end go back to something likeverdi blitz
(https://github.com/aiidateam/aiida-core/pull/6305). The one decision we still have to agree on is the name of the command (init
,presto
,quickstart
,light
, ...), but we can discuss and vote on this in the PR. One important point is that the profile should only be set as the default if there isn't already a default, to match the behavior ofverdi profile setup
and avoid messing up running scripts that might rely on the default not changing (although best practise should be to explicitly set the profile in such scripts). - This does create the problem that we'd not have four commands to set up a profile. Next to the one in the previous point, we'd have:
-
verdi profile setup
: This command offers sub-commands for the various backends and gives full control. We keep this all cases. -
verdi setup
: This command is effectively replaced byverdi profile setup
, and should be deprecated and removed in the next major release. -
verdi quicksetup
: This offers a much quicker way for users to set up a new profile, but only apsql_dos
one with the RabbitMQ broker. Changing the behavior to set up the barebones one would be backwards incompatible. Still, the plan would be to deprecate it in lieu of:
-
- To make easy profile setup still possible for all profile types, we can extend the
verdi init
command with optional flags as per @GeigerJ2's suggestion. This would require some discussion on what the flags should be, but that discussion can be had at a later date.
Action points:
- @sphuber will work on reverting https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe and switching back to the
verdi blitz
option. - I will work on an AEP on my preferred definition of an AiiDA instance and how to separate them. (hopefully later this week, potentially the next due to scientific duties).
- We should still consider alternative approaches to how the default profile is specified:
- An environment variable (e.g.
AIIDA_PROFILE
) -
git
-like directory discovery, i.e. in case you are in a folder that contains a certain file in the hierarchy (e.g.aiida_profile.cfg
.
- An environment variable (e.g.
Also thanks @sphuber for your comments here. I'll definitely consider/integrate the first in my AEP.