aiida-core CLI: Add the command `verdi presto`

trafficstars

This command aims making setting up a new profile as easy as possible. It intentionally provides a limited amount of options to customize the profile and by default does not require any options to be specified at all. For full control, the command verdi profile setup should be used instead.

The main goal for this command is to setup a profile with minimal requirements to make it easy to install AiiDA and get started as quickly as possible. Therefore, by default, the created profile uses the core.sqlite_dos storage plugin which does not require any services, such as PostgreSQL and RabbitMQ are not required. This does mean, however, that not all functionality of AiiDA is available, most notably running the daemon and submitting processes to said daemon.

The command performs the following actions:

Create a new profile that is set as the new default
Create a default user for the profile (can be configured through options)
Set up the localhost as a Computer and configure it
Set a number of configuration options with sensible defaults

In the future, it may be possible to incorporate the functionality of the verdi quicksetup command that automatically creates the role and database in PostgreSQL necessary for a profile with the core.psql_dos storage plugin. This would allow verdi quicksetup to be retired leaving just verdi presto and verdi profile setup to provide all the profile setup needs.

Apr 17 '24 07:04 sphuber

I have added a commit on top that now automatically tries to detect a RabbitMQ server on localhost. If it finds it, the profile is configured with that as broker, otherwise the profile will have no broker.

The second thing I could still add is to automatically detect a Postgres server, and if it exists automatically try to create a role and a database and use core.psql_dos instead of core.sqlite_dos.

Apr 17 '24 16:04 sphuber

I have added a commit on top that now automatically tries to detect a RabbitMQ server on localhost. If it finds it, the profile is configured with that as broker, otherwise the profile will have no broker.

This is great, and I think very reasonable. I think a user will quite quickly want to submit calculations after starting to use AiiDA.

One comment might be that at some point we'll probably have multiple brokers. But I think in the foreseeable future the RabbitMQ broker will be the preferred option if it is available.

The second thing I could still add is to automatically detect a Postgres server, and if it exists automatically try to create a role and a database and use core.psql_dos instead of core.sqlite_dos.

I would like this, but I'd say it's less necessary than the RabbitMQ broker. I'll give it some more thought while I'm dogfooding! 🐶

Apr 17 '24 18:04 mbercx

One comment might be that at some point we'll probably have multiple brokers. But I think in the foreseeable future the RabbitMQ broker will be the preferred option if it is available.

Honestly, I don't think this will be very likely. I think it will be a lot more complicated than having different storage plugins and I see little benefit. Up until now, I think pretty much all users only ever used the default configuration as well as technically there wasn't really an API to customize it (although they could have manually updated the config json). So I think this will go a long way and time for over 99% of the use cases.

I would like this, but I'd say it's less necessary than the RabbitMQ broker. I'll give it some more thought while I'm dogfooding!

I gathered as much from the meeting that people are still relying on this verdi quicksetup functionality quite a bit. Unfortunately it is not as straightforward as the case for RabbitMQ, because the latter comes with default credentials and the connection parameters are essentially identical on all OS'es and platforms. For PostgreSQL there is a whole host of different setups that change if and how new roles and database can be created. For example on Ubuntu, it is possible, but typically asks for a password. Given that verdi presto should require no options and should be promptless, I think it would be unacceptable if it would prompt for a password and potentially fail. So I guess in this case, at the very least, we should have a flag that enables the code to try and connect to postgresql, signalling that potentially prompting for a sudo password would be acceptable.

Apr 17 '24 18:04 sphuber

Honestly, I don't think this will be very likely. I think it will be a lot more complicated than having different storage plugins and I see little benefit.

I would be inclined to agree, if not for the well-known issue we have with RabbitMQ, where after setting up the profile using verdi presto they still have to deal with this:

verdi status                                                                                                                                                                           1113ms  Wed Apr 17 21:29:43 2024
 ✔ version:     AiiDA v2.5.1.post0
 ✔ config:      /Users/mbercx/project/adis/.aiida
 ✔ profile:     presto
 ✔ storage:     SqliteDosStorage[/Users/mbercx/project/adis/.aiida/repository/sqlite_dos_1f5ae658a6324ad1ba511c1e212e9508]: open,
Warning: RabbitMQ v3.12.12 is not supported and will cause unexpected problems!
Warning: It can cause long-running workflows to crash and jobs to be submitted multiple times.
Warning: See https://github.com/aiidateam/aiida-core/wiki/RabbitMQ-version-to-use for details.
 ✔ broker:      RabbitMQ v3.12.12 @ amqp://guest:[email protected]:5672?heartbeat=600
 ⏺ daemon:      The daemon is not running.

But that's a fight for another day. ^^

Given that verdi presto should require no options and should be promptless, I think it would be unacceptable if it would prompt for a password and potentially fail. So I guess in this case, at the very least, we should have a flag that enables the code to try and connect to postgresql, signalling that potentially prompting for a sudo password would be acceptable.

I agree with this approach. verdi presto should be seamless, and not potentially face the same issues of verdi quicksetup. Extending the command to PostgreSQL with a flag seems like a perfectly fine solution.

Regarding the issue of adding database-specific options: I think we should avoid them. The command should try to figure out sensible defaults as much as possible, and only prompt the user in case it needs to, even with flags.

Apr 17 '24 19:04 mbercx

I would be inclined to agree, if not for the well-known issue we have with RabbitMQ, where after setting up the profile using verdi presto they still have to deal with this:

Fair, but this is for users who are already going through the trouble of installing RabbitMQ, so then I think it is fine to have this warning and point them to instructions to get rid of it. This won't affect the user that is just going to do pip install aiida-core && verdi presto to give things a quick spin.

Apr 17 '24 19:04 sphuber

Some initial dogfooding comments:

`verdi status` profile setup recommendations

As @khsrali mentioned, we should probably think of adapting the report here:

$ verdi status
 ✔ version:     AiiDA v2.5.1.post0
 ✔ config:      /Users/mbercx/project/adis/.aiida
 ⏺ profile:     no profile configured yet
Report: Configure a profile by running `verdi quicksetup` or `verdi setup`.

I suppose we should point to both verdi presto and verdi profile setup, with a note that beginners should use verdi presto. Secondly, I would like to use setup here instead of configure. Though this is not fully fixed, I like to make the following distinction between setup and configure:

setup: Something you only do once, and may contain settings that cannot be changed afterwards.
configure: Setting you can always adapt afterwards. It's possible during setup you also configure some settings, but you can always reconfigure them later.

Remove user options

I'd be in favor of removing the user options here:

Options:
  --profile-name TEXT             The name of the profile. By default
                                  generates a name that does not already exist
                                  and starts with `presto`.  [default:
                                  (dynamic)]
  --email TEXT                    The email of the default user.  [default:
                                  aiida@localhost]
  --first-name TEXT               The first name of the default user.
                                  [default: John]
  --last-name TEXT                The last name of the default user.
                                  [default: Doe]
  --institution TEXT              The institution of the default user.
                                  [default: Unknown]

If a user actually wants to have their name etc attached to their nodes at some point, they can do so very easily with verdi user configure.

One note: do we have commands to migrate nodes to other users? That might be useful once the user decides they want to attach their details e.g. for an export.

Apr 17 '24 19:04 mbercx

setup: Something you only do once, and may contain settings that cannot be changed afterwards. configure: Setting you can always adapt afterwards. It's possible during setup you also configure some settings, but you can always reconfigure them later.

Makes sense to me, happy to adopt this terminology.

I'd be in favor of removing the user options here:

Fine as well. Do we keep the email though? We had a long discussion some time ago and since the email is not mutable, maybe we should keep that option for people making a production profile with verdi presto?

Apr 17 '24 19:04 sphuber

Fine as well. Do we keep the email though? We had a long discussion some time ago and since the email is not mutable, maybe we should keep that option for people making a production profile with verdi presto?

Hmm, right. Maybe it's easier to reconfigure your existing user than creating a new one and then updating the nodes somehow.

That said, some dogfooding of the verdi user commands has left me confused:

$ verdi user list                                                                                                                                                                     207ms  Wed Apr 17 22:19:29 2024
    Email             First name    Last name    Institution
--  ----------------  ------------  -----------  -------------
    aiida@localhost   John          Doe          Unknown
*   [email protected]  Marnik        Bercx        EPFL

Report: The user highlighted and marked with a * is the default user.

$ verdi user configure [email protected]                                                                                                                                                   818ms  Wed Apr 17 22:19:33 2024
Report: enter ? for help.
Report: enter ! to ignore the default and set no value.
User email: test@new
First name []: Billy
Last name []: The Kid
Institution []: Wild West
Set as default? [y/N]: y
Usage: verdi user configure [OPTIONS]
Try 'verdi user configure --help' for help.

Error: Got unexpected extra argument ([email protected])

Note how it only complains about the extra argument at the end. I now understand that I have to do

$ verdi user configure --email [email protected]

But that doesn't seem that intuitive, especially since other CLI commands we have typically don't need to specify the identifier of the object you are operating on as an option. Beyond the scope of this PR though, but happy to write this down in an issue if you agree this should be changed[^1].

In the end, I'm not sure if we should add the --email. It's the one piece of information users would be least willing to share (More potential for SPAM? No thank you). One argument in favor of not adding the option now is that "we can always do so later", whereas this is not possible for removing it.

[^1]: In general, it would be good to rethink the concept of a user, and why a profile can have multiple users etc. Looking at this comment from a collaborator of @GeigerJ2, it does seem to cause some confusion as well. I've considered moving to a git-like model here, where the user can simply be fully configured at any time, and then attach their user when exporting. However, I realise that doesn't allow to keep track of nodes that were imported from somewhere else and then used by the exporting user.

Apr 17 '24 20:04 mbercx

But that doesn't seem that intuitive, especially since other CLI commands we have typically don't need to specify the identifier of the object you are operating on as an option. Beyond the scope of this PR though, but happy to write this down in an issue if you agree this should be changed

First of all, this is terribly off-topic. You put that thing back where it came from or I will be forced to whip out my Mr. ScopeCreep image again.

That being said, the confusion comes from the fact that the same command is being used to create new users and update existing ones. On top of that, I agree that one might expect the email as a positional argument. Finally, the error of the superfluous additional positional argument only being thrown at the end of all the prompts is a side effect of clicks parsing order. This is not unique to this command but will happen for all interactive commands. I even remember an ancient issue about this.

Edit: found it https://github.com/aiidateam/aiida-core/issues/1938

Apr 17 '24 20:04 sphuber

First of all, this is terribly off-topic. You put that thing back where it came from or I will be forced to whip out my Mr. ScopeCreep image again.

baby-retreat

Apr 17 '24 20:04 mbercx

In the end, I'm not sure if we should add the --email. It's the one piece of information users would be least willing to share (More potential for SPAM? No thank you). One argument in favor of not adding the option now is that "we can always do so later", whereas this is not possible for removing it.

Back on topic, I would add no options for now, until we figure out the user issue some more. But I'm not so against it I would put my foot down. Happy either way.

Apr 17 '24 20:04 mbercx

In general, it would be good to rethink the concept of a user, and why a profile can have multiple users etc.

Just to provide some context as to why the concept of users exist. From the very beginning when the database schema was created, it was decided that each node (and other entities) should have an "owner". So the DbNode table had a foreign key of the DbUser table. This is why a profile should also always have a default user, because as soon as you instantiate and/or store a Node, it should automatically have its User populated. The idea of having a user associated with nodes was that when you shared/exported the data, it would be clear who the original creator of that data was. Clearly this is very valuable when sharing data. This now also explains why it is possible for a profile to have multiple users because as you import data from another user, that user has to be created in the database.

Apr 17 '24 21:04 sphuber

Thanks @sphuber, I remember a similar explanation when I was complaining about the User concept in the past. ^^

One thing that I was wondering about re-reading this: what happens if a user tries to import nodes from another User with the same email? I suppose the email serves like a sort of UUID? So this won't create a new user but simple associate the nodes to the already existing one?

I'm pretty sure I've run into this already, but with the new verdi presto command this will happen very often. So wanted to double-check the behavior.

Apr 18 '24 04:04 mbercx

I suppose the email serves like a sort of UUID? So this won't create a new user but simple associate the nodes to the already existing one?

Correct. The user table doesn't have a UUID but the email has to be unique. The assumption was made that since emails are typically unique identifiers for people in the real world, it would work for AiiDA as well.

Apr 18 '24 04:04 sphuber

More dogfooding! 🐶

Default profile names after the first

This one is a bit minor, but just a thought.

After making a first profile, I wondered what the name of a second profile generated with verdi presto would be:

$ verdi presto                                                                                               207ms  Thu Apr 18 07:14:34 2024
Report: RabbitMQ server detected: configuring the profile with a broker.
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `presto-b351d3`.
Success: Configured the localhost as a computer.

I like that the hash is shorter (compared to the massive database names verdi quicksetup gives), but I'm just wondering if we can't just iterate number here. I.e. presto, presto-2, presto-3, ... I think this is easier to use and remember (even with that sexy tab-completion).

Default `localhost` workdir

Having verdi presto automatically set up a localhost computer solves that old issue https://github.com/aiidateam/aiida-core/issues/749, which is great.

However, as I also commented there, I was wondering what the default workdir should be. Looking at the computer now:

verdi computer show localhost                                                                              819ms  Thu Apr 18 07:15:33 2024
---------------------------  --------------------------------------------------------------
Label                        localhost
PK                           1
UUID                         db0233db-fd97-4a62-9ce0-ac792a2390c7
Description                  Localhost automatically created by `verdi presto`
Hostname                     localhost
Transport type               core.local
Scheduler type               core.direct
Work directory               /var/folders/34/d38ymzfd6t5dsrv7d4wcww200000gn/T/aiida_scratch
Shebang                      #!/bin/bash
Mpirun command               mpirun -np {tot_num_mpiprocs}
Default #procs/machine       1
Default memory (kB)/machine
Prepend text
Append text
---------------------------  --------------------------------------------------------------

The default workdir is /var/folders/34/d38ymzfd6t5dsrv7d4wcww200000gn/T/aiida_scratch. What was the motivation for this choice? Are there alternatives? E.g. why not in the .aiida folder?

At the risk of turning into Mr. Scope Creep again, but this would be less of an issue if https://github.com/aiidateam/team-compass/issues/12 could be solved. In line with the definitions above, the workdir should be part of the configuration.

Apr 18 '24 05:04 mbercx

but I'm just wondering if we can't just iterate number here. I.e. presto, presto-2, presto-3

That would work for me as well.

What was the motivation for this choice?

Since this is the scratch directory essentially, I thought it made sense to have it in a temporary directory that makes it clear that it is scratch. The directory is determined by import tempfile; tempfile.gettempdir() which returns the tmp file system on the OS. On Linux this is typically /tmp.

In line with the definitions above, the workdir should be part of the configuration.

I think this makes more sense indeed. But, besides breaking backwards-compatibility, we would have to check carefully whether changing the working directory on a computer that is actively being used could have unexpected effects. Some code may silently assume that this is immutable.

Apr 18 '24 07:04 sphuber

To give a summary of points made by @mbercx that are directly pertinent to this PR and not wildly scope-creepey:

Change output of verdi status to refer to verdi presto and verdi profile setup
Change default profile naming scheme to use incrementing integer suffix instead of random hash
Remove all user configuration options

I am onboard with all suggestions, except I would keep the --email option since that cannot be changed in the current API and would have consequences. (Still that would force a user to be aware of this when calling verdi presto but I am thinking mostly of users that are not first time users and simply use it to easily setup a core.psql_dos profile for production, so it is fine to ask them to specify an --email option)

Apr 18 '24 07:04 sphuber

Since this is the scratch directory essentially, I thought it made sense to have it in a temporary directory that makes it clear that it is scratch.

How temporary is a temporary directory? I'm worried that users might expect to be able to use e.g. verdi calcjob gotocomputer for some time after running their calculation.

Apr 18 '24 07:04 mbercx

@sphuber @mbercx feel free to tag me for review once you converge, happy to take a look and do a bit of dogfooding.

am onboard with all suggestions, except I would keep the --email option since that cannot be changed in the current API and would have consequences. (Still that would force a user to be aware of this when calling verdi presto but I am thinking mostly of users that are not first time users and simply use it to easily setup a core.psql_dos profile for production, so it is fine to ask them to specify an --email option)

Yes please!

Apr 18 '24 09:04 danielhollas

I have addressed the final action points of this comment https://github.com/aiidateam/aiida-core/pull/6351#issuecomment-2063203334

This is now ready for final review. Pinging @danielhollas

Apr 26 '24 15:04 sphuber

@sphuber thanks for indulging me, and improving the tests! I left a couple of follow up comments.

I have also tested this locally and it works great! 🎉

Apr 28 '24 15:04 danielhollas

thanks for indulging me, and improving the tests!

My pleasure, intensive reviews always improve the end result. Your time is much appreciated!

Apr 29 '24 14:04 sphuber

Hey guys, sry, a bit late to the party here. Just went through the discussion and the code changes, and don't really have anything to add after the excellent comments by @mbercx and @danielhollas. Seems in very good shape to me! Though, related to the most important open question: in a jiffy actually made me giggle when I was dogfooding now :D so I like it, but would ofc also be fine with the other suggestions.

May 02 '24 11:05 GeigerJ2

Though, related to the most important open question: in a jiffy actually made me giggle when I was dogfooding now :D so I like it, but would ofc also be fine with the other suggestions.

Ironically, this bike shed is taking ages to paint. :rofl:

May 02 '24 12:05 danielhollas

Ok, I think we have bikeshedded enough for now. I am merging this. If there are still important changes, they can be made before the release.

May 03 '24 21:05 sphuber

aiida-core aiida-core copied to clipboard

CLI: Add the command `verdi presto`

verdi status profile setup recommendations

Remove user options

Default profile names after the first

Default localhost workdir

aiida-core
aiida-core copied to clipboard

`verdi status` profile setup recommendations

Default `localhost` workdir