aiida-core CLI: Add the `verdi init` command

This is an alternative solution to #6305 The verdi init command is slightly different in that it doesn't just create a new profile, but it creates the profile in a new AiiDA instance, i.e., a new .aiida folder is created (in the current working directory by default). This makes the behavior similar to git init that initializes a new repo. Once the user is done, the instance folder can simply be removed and there will be no trace left. By default the command will create a profile with the core.sqlite_dos which won't require any services, but alternatively a profile with core.sqlite_zip can be created instead from an existing archive. This version will be read-only, but will allow a user to easily start browsing an archive with a single command verdi init --from-archive some_archive.aiida.

Mar 14 '24 23:03 sphuber

Note, this currently would rely on #6303 and some other internal changes that are part of #6305 but the general concept is here. We should discuss exactly what we want the interface and behavior of verdi init to be.

Mar 14 '24 23:03 sphuber

There are still some questions about desired behavior of the command:

What happens when verdi init is run in a directory that already contains an .aiida folder. Currently it just aborts with an error.
If we decide that in the above case, instead the existing folder is used, what happens when the init profile already exists? Just error and stop? Allow to create another profile automatically choosing another profile name? Even prompt the user for that name?

Mar 21 '24 12:03 sphuber

There are still some questions about desired behavior of the command

I think the current behaviour is fine, after all the user can simple change a directory, right? Are there any use cases where this would potentially be undesirable?

I fear anything we'd do otherwise would be confusing / unintuitive. E.g. if the init profile is created next to an existing profile, should it still be made default?

Mar 21 '24 14:03 danielhollas

I think the current behaviour is fine, after all the user can simple change a directory, right? Are there any use cases where this would potentially be undesirable?

I agree. However, @giovannipizzi suggested that the current syntax for creating a new profile in an existing instance, e.g. verdi profile setup core.sqlite_dos, is still too complicated. Users might be tempted to try and run verdi init to create another profile in the same instance. I think this becomes too complicated though and there is little benefit. If a user does in fact want to start using multiple profiles in the same instance, they can learn how to use verdi profile setup. Still I wanted to open this up for discussion in case others have different intuitions on what is intuitive UX.

Mar 21 '24 18:03 sphuber

I have addressed the comments. Does anyone still have major comments? Otherwise I propose we merge this so we can start dogfooding this. We will wait with the release for a while so we can still further fine tune it. I will also start working on improving the docs to include this new method of getting started with AiiDA in an easy way. But will leave that for a separate PR.

Mar 22 '24 13:03 sphuber

Hi @sphuber, I am not clear how to use this feature. Suppose I create a folder and run verdi init. Then, how do I use this newly created aiida folder?

how does the verdi command know that I want to work on this folder? What if I change to another aiida folder?
how do I load the profile in python script?
what's the relation between a local aiida folder and the global aiida configuration?

Maybe this will be clear in the docs PR?

Mar 22 '24 13:03 superstar54

how does the verdi command know that I want to work on this folder? What if I change to another aiida folder?

With #6322 AiiDA will now check the current working directory for an .aiida folder. It now works essentially as git. So you can switch an AiiDA instance by changing the working directory. When you run verdi init it will create a new .aiida folder in the current working directory (just like git init) and so it will immediately use that. Give it a try:

verdi init
verdi status
verdi profile list
verdi profile show

You should see that it now uses the local .aiida folder and it created a new profile called init.

how do I load the profile in python script?

Exactly as normal with load_profile or use verdi run or the runaiida executable. Nothing changes. If you open a shell, it will pick up the new instance and profile

(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi init
Success: Initialized new AiiDA instance in `/home/sph/code/aiida/env/dev/aiida-core/.aiida`.
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `init`.
Success: Configured the localhost as a computer.
(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi status
 ✔ version:     AiiDA v2.5.1.post0
 ✔ config:      /home/sph/code/aiida/env/dev/aiida-core/.aiida
 ✔ profile:     init
 ✔ storage:     SqliteDosStorage[/home/sph/.aiida/repository/sqlite_dos_2057f645d8a544109563f725831dc589]: open,
 ⏺ broker:      No broker defined for this profile: certain functionality not available.
 ⏺ daemon:      No broker defined for this profile: daemon is not available.
(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ verdi shell
Python 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from aiida import load_profile

In [2]: load_profile()
Out[2]: Profile<uuid='5afa7003e9af44ca955df40c5dc56c66' name='init'>

what's the relation between a local aiida folder and the global aiida configuration?

They define separate AiiDA instances that are completely isolated. See the docs: https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/installation.html#isolating-multiple-instances

Mar 22 '24 13:03 sphuber

I got a problem

(aiida) xing@thinkpad:~/tests/aiida/test-6315$ verdi init
verdi status
verdi profile list
verdi profile show
Success: Initialized new AiiDA instance in `/home/xing/tests/aiida/test-6315/.aiida`.
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `init`.
Success: Configured the localhost as a computer.
 ✔ version:     AiiDA v2.5.1.post0
 ✔ config:      /home/xing/.aiida/envs/aiida/.aiida
 ✔ profile:     default

the config still in ✔ config: /home/xing/.aiida/envs/aiida/.aiida. This probably because I using a conda environment, and I set $AiiDA_PATH

$ echo $AIIDA_PATH
/home/xing/.aiida/envs/aiida

thus, the AiiDA_PATH from #6322 is overrided.

Mar 22 '24 13:03 superstar54

thus, the AiiDA_PATH from #6322 is overrided.

You are right. The verdi init command takes an argument [DIRECTORY] (which defaults to current working directory) where it will create the .aiida folder. This currently does not respect the AIIDA_PATH.

The question now is what the behavior should be. This command is intended for beginning users of AiiDA, so they are unlikely to have defined AIIDA_PATH. Users who specify AIIDA_PATH will most likely also already have .aiida folders and won't really need verdi init. I guess we can simply have verdi init respect the AIIDA_PATH variable. If the user specifies an explicit directory in the command, e.g., verdi init some/path and AIIDA_PATH is defined, I guess we should error in that case?

Actually, should we even allow to specify the path? git allows this, which is why I added it, but it doesn't seem to make much sense. If you do

verdi init some/path
verdi status

The second command will not actually work against the just created instance. I think I will remove it and just get the path from AIIDA_PATH defaulting to current working directory if not defined.

Mar 22 '24 14:03 sphuber

I guess we can simply have verdi init respect the AIIDA_PATH variable.

We need to think about how to handle the global verdi and local verdi. As a user, even though I set the AIIDA_PATH, I still want to use verdi init to create a local AiiDA folder. In this case, the local folder should be higher priority than the global AIIDA_PATH.

btw, this is not a problem for git, since it does not have a global config.

Mar 22 '24 14:03 superstar54

We need to think about how to handle the global verdi and local verdi. As a user, even though I set the AIIDA_PATH, I still want to use verdi init to create a local AiiDA folder. In this case, the local folder should be higher priority than the global AIIDA_PATH.

Fair point. That means the logic of #6322 should be inverted. That the current working directory takes precendence over AIIDA_PATH. Does that have unexpected consequences though? If a user relies on AIIDA_PATH but then happens to change directories to a path that contains an .aiida folder in the hierarchy, then all of a sudden, the AIIDA_PATH is ignored and the instance is switched.

Mar 22 '24 14:03 sphuber

If a user relies on AIIDA_PATH but then happens to change directories to a path that contains an .aiida folder in the hierarchy, then all of a sudden, the AIIDA_PATH is ignored and the instance is switched.

yes, but I think it's not [Edit] a problem, but an expected behavior. The user want to change the AiiDA config when going to a local AiiDA folder and its child folders.

Mar 22 '24 14:03 superstar54

I am also leaning towards local .aiida having preference, seems more intuitive.

Another perhaps more general question, how does this interact with verdi daemon? Can there be more isolated daemons running? Or is there a one global instance that can handle different .aiida folders?

Mar 23 '24 09:03 danielhollas

I am also leaning towards local .aiida having preference, seems more intuitive.

It is fine by me to change to this behavior, I am just a bit worried about breaking behavior. People that have been relying on AIIDA_PATH could all of a sudden accidentally target the wrong instance if they change to the "wrong" directory. If more people this think risk is acceptable, I am happy to change the precedence rules.

Another perhaps more general question, how does this interact with verdi daemon? Can there be more isolated daemons running? Or is there a one global instance that can handle different .aiida folders?

Currently, you can have multiple instances running alongside one another just fine. Each daemon is perfectly isolated. The AIIDA_PROFILE variable is inherited by the process daemon as you start it. We should just check that the same goes for the current working directory, such as to ensure the started daemon process also has this working directory. Otherwise, we would have to manually forward it by setting the AIIDA_PATH to the CWD. Should be easy to test.

Mar 24 '24 07:03 sphuber

It is not so trivial to just reverse the precedence of CWD and AIIDA_PATH to determining the config dir. As soon as a config dir is created in the users home folder, that will almost always take precedence over AIIDA_PATH since one is almost always in a directory whose parent is the user's home folder. Since currently the config directory is created by default in ~/.aiida I am afraid that AIIDA_PATH will essentially become useless. Not quite sure how to fix this.

Removing the automatic creation of ~/.aiida would require quite some changes. Also not sure what to replace it with. All verdi commands and Python API rely on the directory automatically being created. verdi init is really the first exception.

Leaves perhaps to ignore ~/.aiida when checking the CWD. So check CWD with parent folders, but excluding the home directory, then check AIIDA_PATH and only when still not found default to ~/.aiida. This makes the logic even more complicated.

Hypothetically, even if we disregard backwards compatibility just for the time being, what would the ideal most intuitive behavior be? Seems that having the "global" config in the home directory may be complicating things, but if we get rid of it, it will force users to always manually run something like verdi init to initialize a config somewhere. Also explaining to users that not doing so in the home folder may be beneficial if they ever plan on running multiple instances, is going to be way too complicated.

So, thoughts?

Mar 25 '24 09:03 sphuber

My though:

If both the home/.aiida directory and AIIDA_PATH exit, it raises an error for the user.
AIIDA_PATH is most probably used by advanced users. In the docs, we can suggest users remove the home/.aiida if they really want to define their own AIIDA_PATH, e.g., in the conda env.

Mar 25 '24 17:03 superstar54

Hmm, tricky. I don't think erroring out is very user-friendly, rather we should have a predictable behaviour.

Leaves perhaps to ignore ~/.aiida when checking the CWD. So check CWD with parent folders, but excluding the home directory, then check AIIDA_PATH and only when still not found default to ~/.aiida. This makes the logic even more complicated.

Tbh although this feels hacky, it seems like a best option for now and ia well defined.

I assume that most (beginner?) users don't use AIIDA_PATH, so this will not affect the behaviour for most, and at the same time we will not break advanced users using AIIDA_PATH. If they are already customizing their aiida path, it would seem they will have less incentive to use the new init command anyway.

Thinking in abstract, Essentially we have similar situation as in python, i.e. using venv versus global install. So we could potentially have some kind activation script in the future. But that seems like an extra complication for now. Let's dogfood it for a while and see?

Mar 25 '24 18:03 danielhollas

Hi all, very interesting feature and discussion! For the use case of quickly exploring an archive with a temporary profile without polluting the main one, this works perfectly, and I love the idea!

To me:

Leaves perhaps to ignore ~/.aiida when checking the CWD. So check CWD with parent folders, but excluding the home directory, then check AIIDA_PATH and only when still not found default to ~/.aiida. This makes the logic even more complicated.

also seems like the best solution to this conundrum so far. I learned about AIIDA_PATH rather late into my AiiDA journey, so I also wouldn't consider it a beginner feature, though, that's of course only anecdotal.

Some other points:

If we have the --from-archive option, should we also allow passing a URL, similar to verdi archive import?
One could also provide the option to directly import it, e.g. something like: verdi init --from-archive <local archive/URL> --import (assuming #6316 is fixed)?

But:

verdi init
verdi archive import <local archive/URL>

should be simple enough, really...

Though, there are a few more thoughts that I'd like to bring up here, which I also mentioned briefly in the last aiidateam meeting:

As verdi init fundamentally creates a profile with one of the SQLite backends and without RabbitMQ, one cannot create a full production environment with the command. I know that here, this is basically the point, but if we introduce it, people might start using it over the other options like verdi quicksetup and verdi profile setup for convenience, especially if they are familiar with git init. However, they might then hit a roadblock if they want to actually continue with this profile (similar to what @superstar54 asked about being able to switch the database later on).

In the end, in this PR, I think we are trying to achieve two separate goals:

Allowing for quick and easy setup of a profile without the services (using SQLite and no RabbitMQ)
Having a local, self-contained AiiDA instance similar to when running git init (which leaves no trace when being deleted), that consists of two parts:
1. Using the local .aiida config, and
2. Using a local database (again, SQLite)

(And maybe 3: Having a profile for just exploring an archive without importing - though, that seems to tie in with point 2, just using zip rather than disk-objectstore)

One could add more customization options to verdi init, however, this seems like we'd just gradually convert it back into the other setup commands (as would be the case if we would add more options to verdi quicksetup, e.g. to allow other database backends). Also, many of the command line arguments are the same between the different commands.

So I'm wondering if, rather than now adding verdi init on top, we could just provide one (or maybe two) verdi commands for setting things up (quicksetup or init - I'd vouch for the latter), but then provide different turn-key/convenience setup options via double-dash command line flags. In this scenario, the initialization command by default automatically tries to create the "best" setup available on the system, with as little required user input as possible (at least that would be my personal preference), while users should still be able to overwrite individual options. For each of the turn-key options, we can provide a short and concise overview in the docs with the intended use case and strengths and weaknesses (similar to the one for the storage backends).

For example:

verdi init by default could automatically try to use PostgreSQL and RabbitMQ, resorting to SQLite and/or no broker if these fail, and write to the global config.
verdi init --production could explicitly only work if both services are set up properly.
verdi init --local (or verdi init --contained) could use SQLite and RabbitMQ or no broker, again depending on availability, with a local config (similar to what verdi init of this PR is doing).
verdi init --no-services could explicitly be used to set up a profile with SQLite and without RabbitMQ (writing to the global or local config being arbitrary in that case).
verdi init --full could ask for all configuration options (like verdi setup right now does), not just the essential ones.

The --non-interactive flag would still function as it does now in all cases. Writing to the local and global config could also generally be added as command line options via --global/--local again mirroring how git behaves.

Effectively, it all boils down to where we put the logic/complexity for the different possible ways to set up an AiiDA instance: As separate verdi commands as it is done right now, or as one verdi command, but then with the different command line options, as I'm proposing here. As removing stuff will be backwards incompatible, any clean-up would go into v3 release, I suppose, but then I would only introduce verdi init now, if it is here to stay.

I think all of this will be good to discuss as part of the upcoming aiida-core coding days.

Apr 09 '24 09:04 GeigerJ2

Thanks a lot @sphuber , verdi init is great. two minor things: I would update this message: --appears if verdi status without a profile--

verdi status
Report: Configure a profile by running `verdi quicksetup` or `verdi setup`.

I would have expected that after verdi delete , the local .aiida directory get deleted -the one in local path not the ~/.aiida-. Is there a reason for the current design?

Apr 11 '24 16:04 khsrali

I would update this message: --appears if verdi status without a profile--

Totally agree. Was just awaiting the discussion on what we will do with the verdi quicksetup/setup commands. But most likely it will have to be updated.

I would have expected that after verdi delete , the local .aiida directory get deleted -the one in local path not the ~/.aiida-. Is there a reason for the current design?

There is no verdi delete command. Do you mean verdi profile delete? The whole point of verdi init is that you don't need any verdi commands to clean up, you just delete the .aiida folder with rm and that's it. Or do I misunderstand your point?

Apr 11 '24 20:04 sphuber

Before starting with proper dogfooding, I already have some conceptual comments I wanted to make, mostly on https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe.

Let's start with some semantics: What exactly is an AiiDA instance? Interestingly, the definition of this concept was sublty changed in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe. Before, the documentation said:

An AiiDA instance is defined as the installed source code plus the configuration folder that stores the configuration files with all the configured profiles.

This is changed in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe:

An AiiDA instance is defined by its configuration directory, which is always named .aiida.

So, the source code is no longer part of the definition of an AiiDA "instance". Indeed, the changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe allow for multiple AiiDA configuration folders ("instances" in the new definition) for one Python environment and AIIDA installation.

I understand the problem the commit is trying to solve. Once a user starts having multiple AiiDA installations, they needed to use AIIDA_PATH to properly separate them, else they'd run into all sorts of trouble. This is exactly why I started working on aiida-project (Still very much a work in progress; I wish I could dedicate more time to it). It respects the previous definition of an AiiDA instance: the lines to set the AIIDA_PATH environment variable are automatically added to the activate script of the virtual environment. Having to write our own environment management tool did seem a bit much, admittedly[^1]. I wondered if somehow the configuration couldn't be automatically tied to the environment, e.g. by having a separate subfolder in the .aiida folder based on the path to the Python binary or something. I'm curious how other Python packages that have configurations deal with this.

However, I'm not sure if the new definition of an AiiDA instance makes sense to me. Does it make sense to have multiple AiiDA instances for one Python environment/AiiDA installation? What advantages do multiple separate instances have over multiple profiles in the same instance? If not, perhaps we should look for an alternative approach to properly separate the "old" version of AiiDA instances (code and configuration) instead. I'm already worried about a few things:

Having another layer of "instances" might confuse things. One can already have multiple AiiDA installs in separate environments, and separate profiles for each install. Adding a layer in between for multiple AiiDA configurations only makes sense if it adds something.
There have already been quite some discussions on which configuration should take precedence. I would argue that AIIDA_PATH should remain king (git also gives GIT_DIR precedence, by the way). If the issue is complex to resolve for us, it's bound to lead to confusion for users. One AiiDA install has one configuration is simple to understand.

So, I'm mainly wondering if we should not revert https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe in favor of a different approach to separating AiiDA instances that respects the original definition: AiiDA source code/install and configuration.

[^1]: Although it could have additional use cases. One thing that has been on my TODO list for a while is providing conda support for aiida-project, and then automatically installing the services and activating them inside the environment when using the cda command, which aiida-project uses to switch AiiDA "projects" (instances in the "old" definition). It could then probably also automatically configure the RabbitMQ timeout etc. We could also use it as a sort of aiidalab-launch for AiiDA, i.e. also support "Docker" projects.

Apr 11 '24 21:04 mbercx

@mbercx some really insightful comments! The fact the this PR changes the semantics (not only behaviour) is a really great observation.

But I would like to turn this around back at you. :-) The fact that you needed to start aiida-project is a good indicator that the status quo is not fully satisfactory. For such a critical thing as project management, I'd much rather for it to be solved in aiida-core rather than an external package.

I wondered if somehow the configuration couldn't be automatically tied to the environment, e.g. by having a separate subfolder in the .aiida folder based on the path to the Python binary or something

What advantages do multiple separate instances have over multiple profiles in the same instance?

I think this is a key question to answer. I wonder if/how people use different profiles? I don't have too much experience with them, but to me the concept is very opaque. At least from the command line, working with different profiles is kind of awkward, since you need to always provide an extra argument for non-default profiles, right?

I would argue that AIIDA_PATH should remain king (git also gives GIT_DIR precedence, by the way).

I guess I agree, with the caveat that with should discourage users to muck around with AIIDA_PATH and prefer .aiida path-based git-like discovery if we go forward with this.

To me the git-like model makes things quite a bit easier to wrap my head around. Different projects are simply different directories. If you for some reason need different aiida installation for different projects, you could put Python venv (.venv) alongside .aiida folder.

Besides the .git analogy, I think it might be useful to think about this in terms of managing Python environments. Right now, by default aiida behaves like pip, having an ~/.aiida folder that's global to the user. What this PR proposes is more akin to the venv model.

Apr 11 '24 23:04 danielhollas

But I would like to turn this around back at you. :-) The fact that you needed to start aiida-project is a good indicator that the status quo is not fully satisfactory. For such a critical thing as project management, I'd much rather for it to be solved in aiida-core rather than an external package.

aiida-project also does a bunch of other stuff in a way that I like (happy to discuss some time ^^), but I agree it would be better to deal with the proper separation of instances in aiida-core. I hope I didn't give the impression that I was arguing to change nothing. 😱

I wonder if/how people use different profiles? I don't have too much experience with them, but to me the concept is very opaque.

To me, different profiles allow me to have multiple storages in (what I call) the same project. A typical use case would be a dev profile where I mess around with things, and a nice and clean prod profile for the real production runs.

At least from the command line, working with different profiles is kind of awkward, since you need to always provide an extra argument for non-default profiles, right?

You can use verdi profile setdefault to switch profiles. Typically I don't have to switch that often that it's tedious.

To me the git-like model makes things quite a bit easier to wrap my head around. Different projects are simply different directories. If you for some reason need different aiida installation for different projects, you could put Python venv (.venv) alongside .aiida folder.

It's interesting to see the different perspectives here. ^^ Due to AiiDA's heavy reliance on plugins, I can't imagine having the same Python environment for two different projects. If for some reason you need to have multiple datasets in the same AiiDA installation, you can make a new profile. I kind of get the feeling you want to use the new isolated AiiDA folder instances in lieu of AiiDA profiles, in a sense?

I suppose it then boils down to what you prefer. I like being able to have a Jupyter notebook anywhere and select the kernel to be able to interact with different AiiDA instances (old skool definition), and then load the profile whose dataset I'm interested in. Having to copy or move the notebook to interact with a different instance seems tedious.

Besides the .git analogy, I think it might be useful to think about this in terms of managing Python environments. Right now, by default aiida behaves like pip, having an ~/.aiida folder that's global to the user. What this PR proposes is more akin to the venv model.

Yes, I see, interesting point. Though I think most Python environment tools allow you to execute commands with that environment independently of which folder you are in. I suppose what I was loosely suggesting:

I wondered if somehow the configuration couldn't be automatically tied to the environment, e.g. by having a separate subfolder in the .aiida folder based on the path to the Python binary or something.

Is sort of similar to virtualenvwrapper or conda, that keep the environment files in subdirectories in one global directory. Alternatively, we can store the .aiida folder in the virtual environment directory by default. I'll think about making this all more concrete.

I'm still more in favor of not separating the AiiDA config from the AiiDA installation/Python environment though. Another argument is that the config format is actually dependent on the aiida-core version, and automatically migrated. I can already imagine I loaded a certain Python environment and accidentally run verdi status in the wrong folder, automatically migrating the config in the .aiida folder it finds and leading to much chagrin.

PS: we're actually meeting on this topic at 9:30 CEST today (so in less than 7 hours ^^). I'll send you the Zoom details in case you want to join.

Apr 12 '24 00:04 mbercx

There is no verdi delete command. Do you mean verdi profile delete? The whole point of verdi init is that you don't need any verdi commands to clean up, you just delete the .aiida folder with rm and that's it. Or do I misunderstand your point?

Yes yes I understand, but still, even if I want to verdi profile delete, it should behave as the same, no? Right now, the folder remains undeleted, then if I do verdi init again, I get the error that the configuration directory .aiida already exists. So then, one has to delete manually.

What I mean, is that I expected it behave the same both ways. Either deleting the folder manually or verdi profile delete init Would also delete configuration files of "init"

Apr 12 '24 07:04 khsrali

Yes yes I understand, but still, even if I want to verdi profile delete, it should behave as the same, no?

But that is the thing, verdi profile delete deletes a profile not the config directory. There never was a command to even think of deleting the config directory.

Apr 12 '24 07:04 sphuber

From @khsrali's comment above you can already see that this new approach is causing confusion.

The problem is that git init doesn't just create a new profile. It creates a new AiiDA "instance" (new definition, let's call it configuration from now on) that itself can have multiple profiles. AiiDA can't delete the configuration files just because one profile that is in that configuration is deleted.

Apr 12 '24 07:04 mbercx

I'm writing a small essay to continue from our discussion last Friday, but already wanted to note this before I forget:

$ verdi init                                                                                                      Sun Apr 14 09:49:28 2024
Success: Initialized new AiiDA instance in `/Users/mbercx/project/tmp/proj/.aiida`.
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `init`.
Success: Configured the localhost as a computer.
$ verdi status                                                                                            845ms  Sun Apr 14 09:49:31 2024
 ✔ version:     AiiDA v2.5.1.post0
 ✔ config:      /Users/mbercx/project/tmp/proj/.aiida
 ✔ profile:     init
 ✔ storage:     SqliteDosStorage[/Users/mbercx/.aiida/repository/sqlite_dos_ca244afbb1164449907ffda242122f12]: open,
 ⏺ broker:      No broker defined for this profile: certain functionality not available.
 ⏺ daemon:      No broker defined for this profile: daemon is not available.

It seems the SqliteDosStorage is still using the .aiida folder in $HOME?

Apr 14 '24 07:04 mbercx

After our meeting on Friday, I took a bit of time to write down the discussion and supplement it with some further thoughts.

Buckle up and grab a fresh cup of ☕, because this is going to be a long one.

Semantics

I'll start with some semantics again, to limit confusion in the presentation and the discussion that follows. I will not define an AiiDA "instance", since this definition is exactly what is up for debate. For now, I will define[^1]:

[^1]: These are my definitions based on my current knowledge and understanding, and probably can be made more precise/complete. Suggestions welcome, we might even use it to start a glossary.

AiiDA installation: This is the source code, installed by a package manager so the verdi entry point is available from the command line.
The .aiida directory/folder: I've called this the "configuration" before, but it actually contains more than just configuration at this point:
- config.json: The actual configuration, both of the AiiDA installation and the AiiDA profiles (see below).
- daemon-related configuration files and logs for each profile.
- The default location for repositories.
- The default location for SQLite databases of the sqlite_dos storage.
AiiDA profile: each AiiDA profile is an isolated combination of an AiiDA storage and daemon.

The image below tries to visualise and connect these concepts:

Introduction

As @GeigerJ2 mentioned, the verdi init command (and commit https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe which it relies on) tries to tackle several issues:

Quick setup of a profile that requires no services.
Having a contained AiiDA "instance" that is easy to remove completely.
Better separation of AiiDA "instances".

(1) would be fully satisfied by "verdi blitz" (https://github.com/aiidateam/aiida-core/pull/6305), and (2) mostly as well. Doing verdi profile delete will remove the storage associated with the profile, but not the daemon-related files. This could be changed though, but let's not go into that here.

(3) is the main issue I want to discuss, i.e. the changes made in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe. I could also post my essay there, but I think verdi init and its usage advocated here helps to illustrate the consequences of the change, and why I'm against it.

The Status Quo

The current definition of an AiiDA "instance" is straight from the documentation:

An AiiDA instance is defined as the installed source code plus the configuration folder that stores the configuration files with all the configured profiles.

It continues:

It is possible to run multiple AiiDA instances on a single machine, simply by isolating the code and configuration in a virtual environment.

Note how it mentions to store the configuration in a virtual environment. Next it mentions using Python environments, which I'm sure we all agree with:

To isolate the code, make sure to install AiiDA into a virtual environment, e.g., with conda or venv, as described here. Whenever you activate this particular environment, you will be running the particular version of AiiDA (and all the plugins) that you installed specifically for it.

Then comes the tricky bit:

This is separate from the configuration of AiiDA, which is stored in the configuration directory which is always named .aiida and by default is stored in the home directory. Therefore, the default path of the configuration directory is ~/.aiida. By default, each AiiDA instance (each installation) will store associated profiles in this folder.

This is the current problem with separating AiiDA instances. By default, the .aiida folder is not isolated like the Python environment is. The documentation does indicate best practises on how to deal with this:

A best practice is to always separate the profiles together with the code to which they belong. The typical approach is to place the configuration folder in the virtual environment itself and have it automatically selected whenever the environment is activated.

But this requires setting the AIIDA_PATH environment variable, which is not trivial, environment-manager dependent and frankly: tedious.

`git`-like `.aiida` discovery

The changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe proposed a solution to this problem. It inserts an extra step in the resolution to the .aiida folder:

The AIIDA_PATH variable, if set.
git-like .aiida discovery
The $HOME/.aiida folder

Basically, if AIIDA_PATH is not set, AiiDA will now go up the hierarchy of the current working directory and look for the first instance of .aiida and use that.

At first glance the change seems innocuous and reasonable. I liked it too when I first saw the PR. But it has some pretty profound implications, especially when we combine it with verdi init and how it will be used.

As I've mentioned, it radically redefines what an AiiDA instance is:

An AiiDA instance is defined by its configuration directory, which is always named .aiida.

And allows for multiple .aiida directories per AiiDA installation[^2]. Next, its usage in verdi init encourages multiple .aiida directories:

[^2]: Note that in principle it is also possible to have multiple .aiida folders for one AiiDA installation in the status quo. The user could change the AIIDA_PATH variable if they wanted to. But it's a use case we clearly don't recommend, and so advanced that I'd be surprised if anyone has done it in practise.

any beginning user that installs AiiDA for the first time in a fresh Python environment will have two .aiida folders after running verdi init, and will interact with one of them depending on where they execute verdi or the location of the Python script they are running.
there are already multiple instances in the comments above where users would want to create another profile, but this would fail with verdi init because the .aiida folder already exists (in the current working directory or a parent) / AIIDA_PATH is set. It seems we'd be recommending them to create a new directory and run verdi init again.

So, after this is released, we can fully expect that users will have multiple .aiida directories per AiiDA installation, that each can have multiple AiiDA profiles.

The Problem

Great, so what's the problem? I've tried to gather my thoughts here as best as I could, and also added some more examples of problems a user could run into because of this change at the end.

It makes AiiDA more complex

The fact that we now can have multiple .aiida folders per install that each have multiple profiles adds another layer of complexity to the AiiDA setup, which is exactly what we want to avoid.

Let's say we adopt the new definition of an AiiDA instance as a .aiida directory. What's the difference between an AiiDA instance and profile? Why do we need both? I wouldn't be able to answer this question to be honest. In fact it seems like this new approach of separating AiiDA instances is usurping the concept of an AiiDA profile, and it's already clear from this thread that they are being used interchangeably (incorrectly so!). This is problematic for several reasons:

New users will probably quite quickly try using verdi init to create a second profile (e.g. if they realise they want/need RabbitMQ). This will fail or not, depending on the behaviour we choose here. If it does, they won't understand why, if it doesn't and create a new profile, what verdi init does will change fundamentally, and they'll have to understand the differences in how to interact with instances and profiles (see below).
Existing users will most likely be surprised by the behaviour of verdi init. They will assume that they've just created another profile they can access from anywhere with verdi, and be wondering "where is my profile" if they aren't in the correct folder. You could say that verdi init isn't for them, but then we don't solve the issue of isolating AiiDA instances.
All users will have to deal with both AiiDA "instances" and profiles, which have different modi operandi. Instances have git-like discovery, whereas profiles are configured in the .aiida folder and "loaded" through the Python API. Moreover, to get to the correct profile, you'll have to both (1) be in the correct directory and (2) still load the profile somehow. Users will rightfully ask why. Deleting a profile uses verdi profile delete, but for an instance you have to remove the folder.

To me it seems clear that having both multiple .aiida instances and profiles will cause quite a bit of confusion. One might argue that we can fully move to git-like discovery "instances" in lieu of profiles. But I see other issues with this:

It's a massive backwards-incompatible change in behaviour which would definitely require an AEP and probably a major release.
The AiiDA installation is not aware of the various .aiida instances associated with it, which is true for profiles.
Having a directory-based approach for interacting with certain instances is natural for CLI tools, but not so much for Python code. The location of the Python script/Jupyter notebook does not influence which Python instance I'm interacting with, rather it's the Python binary I'm calling or kernel I've loaded in my Jupyter notebook.
It's more challenging to interact with that instance because you need to be in a specific folder hierarchy. Like interacting with Python instances, the current approach allows you to execute /path/to/verdi -p profile to e.g. back up a certain profile in a cronjob. With git-like discovery you'll have to make sure any script command that interacts with a certain instance does so in that folder (or somehow tweaks the $PWD).

It doesn't fully deal with the issues it's trying to solve

Quick setup of a profile that requires no services.

As @superstar54 rightfully mentions, an advanced user might still want to be able to set up a profile quickly without having to deal with the complexity of verdi profile setup (I for one have rediscovered a new-found love for verdi quicksetup). However, anyone that actually uses AIIDA_PATH, can't use verdi init. So I guess we'd still need another command, which ropes back into @GeigerJ2's point that we shouldn't have too many commands for creating a profile.

Having a contained AiiDA "instance" that easy to remove completely.

The idea of having a fully contained AiiDA instance within a folder only works for the SQLite database. So if we would extend verdi init to Postgres-type databases the user can't simply remove the folder to purge an AiiDA "instance". And if we start promoting verdi init, you can be sure that users will expect instances/profiles to be fully contained in that folder.

Profiles on the other hand can be fully removed with verdi profile delete (or they should be).

Better separation of AiiDA "instances".

As mentioned above, anyone that doesn't use git init to separate AiiDA instances, still has to deal with the same problem. Now, you could argue they can just make a .aiida folder, or we provide some kind of command for that. But really, we are making things harder for the user than they have to be to solve something that AiiDA should take care of automatically.

Some more examples of potential issues

Separating the AiiDA installation from its configuration can have other downsides. The AiiDA installation determines the version of the configuration file. If AiiDA finds that the configuration file is outdated, it will automatically upgrade it (note that this doesn't work the other way!). This is desired behaviour, since we want configuration file format updates to be seamless for the user. However, with the changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe, if I run the verdi command of the wrong Python environment in the scope of another .aiida directory, it will either give an error or (worse) automatically migrate my configuration.
If you run a git command outside the directory scope of a .git directory, git will raise an error. This is not true for AiiDA. It's why Sebastiaan decided to not allow verdi init some/path as described here. This is problematic because it can be easy to forget you have to be in the correct folder scope (and Python environment) to interact with the AiiDA instance, and it won't be immediately clear that you are not.
A user downloads a Jupyter notebook and opens it. Normally they'd just select their Python kernel (VScode makes this quite easy) and then load the profile. If they are working with .aiida instances though, they have to move the notebook in the right folder first. And then they still have to load the kernel and profile.

Alternative solutions

Tie the AiiDA install to the Python instance

However, as @danielhollas mentioned, the status quo is not acceptable either. I'm currently working on AEP + PR that ties the .aiida directory to the corresponding Python instance. Effectively, each an "AiiDA instance" will once again be the combination of the AiiDA installation + .aiida directory, but the latter will be automatically isolated for each Python instance.

Folder scope for profiles

I do like the concept of git-like folder-based separation in general, and I understand some users might prefer it. What we could do is create a "git-folder-like" profile. Basically, instead of changing the default .aiida configuration folder, you'd change the default profile depending on where the code is executed.

Some questions that came up

Why does AIIDA_PATH allow for multiple semicolon-separated directories? What's the use case here?

TL; DR

While I understand the appeal of having git-like .aiida directory discovery, I think it's a sub-optimal solution to the problem of separating AiiDA instances.
The changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe should be reverted in favor of a solution that doesn't decouple the .aiida folder from the AiiDA installation.
In any case, any solution to the "separation of instances" problem is so impactful that it should require an AEP to properly motivate the chosen design.

Apr 14 '24 23:04 mbercx

Thanks for the very detailed writeup @mbercx . I have read through it and agree with most things and as I mentioned during our last discussion I am leaning to agreeing with you that we should probably revert https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe . The idea was nice, but with the current status quo it introduces too many situations where the behavior of AiiDA or the "best" approach to things become unclear.

That being said, here are some short comments I wanted to make:

Even though the AiiDA documentation might currently define an "instance" as the union of a Python environment + the .aiida directory, I am not sure that it should be. I agree that in most cases it would probably be useful to tie them together if multiple Python environments are available or even needed, but it should not be a requirement. For many simple use cases, having a single Python environment would be just fine and so there would be no need for isolating. This is a more advanced use case and for that we have a simple way of accomplishing that (I don't think requiring a user to know how to set an environment variable is too much too ask). I would strongly caution about adding logic to AiiDA to start handling the Python environments as this famously ridden with complicacies. There is no standard and as a result there are a million things of doing things depending on the OS and preferences of the user. For example, think even of a docker container where you wouldn't even need a virtual environment to install things, and I wouldn't AiiDA to start forcing users to use it.
"Why does AIIDA_PATH allow for multiple semicolon-separated directories? What's the use case here?" This was a design choice made @greschd but I don't remember why either. I also don't really see the point and it does make the behavior and implementation more complicated. If it weren't for backwards compatibility, I would have removed it and only accept a single path that actually includes the .aiida in the path.
Finally as to the real goal of verdi init: For me the main goal is to make it dead-easy to get a working Python environment for users that just want to try things out. This should require no special options, no prompts, no nothing. As soon as a user wants something more specific, it is perfectly reasonable to ask them to understand a bit more about what it is exactly what they want and to specifiy this through more specific commands with prompts/flags. I also want to guarantee that for power users there is an interface that gives full control and not have the interface limit anything. I think that having verdi init that only creates a core.sqlite_dos no-broker profile, combined with verdi profile setup for full control is perfectly fine. The only question for me is what to do with the special case of verdi quicksetup that provides custom logic just for the core.psql_dos backend.

@giovannipizzi @danielhollas the ball is now in your court I believe. The other people that were present during the discussion seem to be largely onboard with the analysis presented above and to remove the directory-based discovery of the config directory. Could you please let us know if you agree or if you have counter arguments that you think deserve to be discussed first. Ideally I would like to take a discussion soon because we would really like a verdi init-like command to easily create a no-services profile and release it with the upcoming v2.6.

Apr 15 '24 08:04 sphuber

Thanks to @giovannipizzi, @sphuber, @danielhollas, @GeigerJ2 and @khsrali for the discussion! I'm going to summarize some points here:

We all agree that the change in .aiida folder discovery behavior in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe is substantial enough that it requires more discussion (preferably through an AEP) and the changes should be reverted for now.
For the next release we want to still provide a command for new users to get up and running as quickly as possible. By default it should need no options, which will set up a new profile with the sqlite_dos backend and no RabbitMQ. This effectively means we will in the end go back to something like verdi blitz (https://github.com/aiidateam/aiida-core/pull/6305). The one decision we still have to agree on is the name of the command (init, presto, quickstart, light, ...), but we can discuss and vote on this in the PR. One important point is that the profile should only be set as the default if there isn't already a default, to match the behavior of verdi profile setup and avoid messing up running scripts that might rely on the default not changing (although best practise should be to explicitly set the profile in such scripts).
This does create the problem that we'd not have four commands to set up a profile. Next to the one in the previous point, we'd have:
- verdi profile setup: This command offers sub-commands for the various backends and gives full control. We keep this all cases.
- verdi setup: This command is effectively replaced by verdi profile setup, and should be deprecated and removed in the next major release.
- verdi quicksetup: This offers a much quicker way for users to set up a new profile, but only a psql_dos one with the RabbitMQ broker. Changing the behavior to set up the barebones one would be backwards incompatible. Still, the plan would be to deprecate it in lieu of:
To make easy profile setup still possible for all profile types, we can extend the verdi init command with optional flags as per @GeigerJ2's suggestion. This would require some discussion on what the flags should be, but that discussion can be had at a later date.

Action points:

@sphuber will work on reverting https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe and switching back to the verdi blitz option.
I will work on an AEP on my preferred definition of an AiiDA instance and how to separate them. (hopefully later this week, potentially the next due to scientific duties).
We should still consider alternative approaches to how the default profile is specified:
- An environment variable (e.g. AIIDA_PROFILE)
- git-like directory discovery, i.e. in case you are in a folder that contains a certain file in the hierarchy (e.g. aiida_profile.cfg.

Also thanks @sphuber for your comments here. I'll definitely consider/integrate the first in my AEP.

Apr 15 '24 13:04 mbercx

aiida-core aiida-core copied to clipboard

CLI: Add the `verdi init` command

Semantics

Introduction

The Status Quo

git-like .aiida discovery

The Problem

It makes AiiDA more complex

It doesn't fully deal with the issues it's trying to solve

Quick setup of a profile that requires no services.

Having a contained AiiDA "instance" that easy to remove completely.

Better separation of AiiDA "instances".

Some more examples of potential issues

Alternative solutions

Tie the AiiDA install to the Python instance

Folder scope for profiles

Some questions that came up

TL; DR

aiida-core
aiida-core copied to clipboard

`git`-like `.aiida` discovery