aiida-core
aiida-core copied to clipboard
CLI: Add the command `verdi presto`
This command aims making setting up a new profile as easy as possible. It intentionally provides a limited amount of options to customize the profile and by default does not require any options to be specified at all. For full control, the command verdi profile setup should be used instead.
The main goal for this command is to setup a profile with minimal requirements to make it easy to install AiiDA and get started as quickly as possible. Therefore, by default, the created profile uses the core.sqlite_dos storage plugin which does not require any services, such as PostgreSQL and RabbitMQ are not required. This does mean, however, that not all functionality of AiiDA is available, most notably running the daemon and submitting processes to said daemon.
The command performs the following actions:
- Create a new profile that is set as the new default
- Create a default user for the profile (can be configured through options)
- Set up the localhost as a
Computerand configure it - Set a number of configuration options with sensible defaults
In the future, it may be possible to incorporate the functionality of the verdi quicksetup command that automatically creates the role and database in PostgreSQL necessary for a profile with the core.psql_dos storage plugin. This would allow verdi quicksetup to be retired leaving just verdi presto and verdi profile setup to provide all the profile setup needs.
I have added a commit on top that now automatically tries to detect a RabbitMQ server on localhost. If it finds it, the profile is configured with that as broker, otherwise the profile will have no broker.
The second thing I could still add is to automatically detect a Postgres server, and if it exists automatically try to create a role and a database and use core.psql_dos instead of core.sqlite_dos.
I have added a commit on top that now automatically tries to detect a RabbitMQ server on localhost. If it finds it, the profile is configured with that as broker, otherwise the profile will have no broker.
This is great, and I think very reasonable. I think a user will quite quickly want to submit calculations after starting to use AiiDA.
One comment might be that at some point we'll probably have multiple brokers. But I think in the foreseeable future the RabbitMQ broker will be the preferred option if it is available.
The second thing I could still add is to automatically detect a Postgres server, and if it exists automatically try to create a role and a database and use core.psql_dos instead of core.sqlite_dos.
I would like this, but I'd say it's less necessary than the RabbitMQ broker. I'll give it some more thought while I'm dogfooding! 🐶
One comment might be that at some point we'll probably have multiple brokers. But I think in the foreseeable future the RabbitMQ broker will be the preferred option if it is available.
Honestly, I don't think this will be very likely. I think it will be a lot more complicated than having different storage plugins and I see little benefit. Up until now, I think pretty much all users only ever used the default configuration as well as technically there wasn't really an API to customize it (although they could have manually updated the config json). So I think this will go a long way and time for over 99% of the use cases.
I would like this, but I'd say it's less necessary than the RabbitMQ broker. I'll give it some more thought while I'm dogfooding!
I gathered as much from the meeting that people are still relying on this verdi quicksetup functionality quite a bit. Unfortunately it is not as straightforward as the case for RabbitMQ, because the latter comes with default credentials and the connection parameters are essentially identical on all OS'es and platforms. For PostgreSQL there is a whole host of different setups that change if and how new roles and database can be created. For example on Ubuntu, it is possible, but typically asks for a password. Given that verdi presto should require no options and should be promptless, I think it would be unacceptable if it would prompt for a password and potentially fail. So I guess in this case, at the very least, we should have a flag that enables the code to try and connect to postgresql, signalling that potentially prompting for a sudo password would be acceptable.
Honestly, I don't think this will be very likely. I think it will be a lot more complicated than having different storage plugins and I see little benefit.
I would be inclined to agree, if not for the well-known issue we have with RabbitMQ, where after setting up the profile using verdi presto they still have to deal with this:
verdi status 1113ms Wed Apr 17 21:29:43 2024
✔ version: AiiDA v2.5.1.post0
✔ config: /Users/mbercx/project/adis/.aiida
✔ profile: presto
✔ storage: SqliteDosStorage[/Users/mbercx/project/adis/.aiida/repository/sqlite_dos_1f5ae658a6324ad1ba511c1e212e9508]: open,
Warning: RabbitMQ v3.12.12 is not supported and will cause unexpected problems!
Warning: It can cause long-running workflows to crash and jobs to be submitted multiple times.
Warning: See https://github.com/aiidateam/aiida-core/wiki/RabbitMQ-version-to-use for details.
✔ broker: RabbitMQ v3.12.12 @ amqp://guest:[email protected]:5672?heartbeat=600
⏺ daemon: The daemon is not running.
But that's a fight for another day. ^^
Given that verdi presto should require no options and should be promptless, I think it would be unacceptable if it would prompt for a password and potentially fail. So I guess in this case, at the very least, we should have a flag that enables the code to try and connect to postgresql, signalling that potentially prompting for a sudo password would be acceptable.
I agree with this approach. verdi presto should be seamless, and not potentially face the same issues of verdi quicksetup. Extending the command to PostgreSQL with a flag seems like a perfectly fine solution.
Regarding the issue of adding database-specific options: I think we should avoid them. The command should try to figure out sensible defaults as much as possible, and only prompt the user in case it needs to, even with flags.
I would be inclined to agree, if not for the well-known issue we have with RabbitMQ, where after setting up the profile using verdi presto they still have to deal with this:
Fair, but this is for users who are already going through the trouble of installing RabbitMQ, so then I think it is fine to have this warning and point them to instructions to get rid of it. This won't affect the user that is just going to do pip install aiida-core && verdi presto to give things a quick spin.
Some initial dogfooding comments:
verdi status profile setup recommendations
As @khsrali mentioned, we should probably think of adapting the report here:
$ verdi status
✔ version: AiiDA v2.5.1.post0
✔ config: /Users/mbercx/project/adis/.aiida
⏺ profile: no profile configured yet
Report: Configure a profile by running `verdi quicksetup` or `verdi setup`.
I suppose we should point to both verdi presto and verdi profile setup, with a note that beginners should use verdi presto. Secondly, I would like to use setup here instead of configure. Though this is not fully fixed, I like to make the following distinction between setup and configure:
- setup: Something you only do once, and may contain settings that cannot be changed afterwards.
- configure: Setting you can always adapt afterwards. It's possible during setup you also configure some settings, but you can always reconfigure them later.
Remove user options
I'd be in favor of removing the user options here:
Options:
--profile-name TEXT The name of the profile. By default
generates a name that does not already exist
and starts with `presto`. [default:
(dynamic)]
--email TEXT The email of the default user. [default:
aiida@localhost]
--first-name TEXT The first name of the default user.
[default: John]
--last-name TEXT The last name of the default user.
[default: Doe]
--institution TEXT The institution of the default user.
[default: Unknown]
If a user actually wants to have their name etc attached to their nodes at some point, they can do so very easily with verdi user configure.
One note: do we have commands to migrate nodes to other users? That might be useful once the user decides they want to attach their details e.g. for an export.
setup: Something you only do once, and may contain settings that cannot be changed afterwards. configure: Setting you can always adapt afterwards. It's possible during setup you also configure some settings, but you can always reconfigure them later.
Makes sense to me, happy to adopt this terminology.
I'd be in favor of removing the user options here:
Fine as well. Do we keep the email though? We had a long discussion some time ago and since the email is not mutable, maybe we should keep that option for people making a production profile with verdi presto?
Fine as well. Do we keep the email though? We had a long discussion some time ago and since the email is not mutable, maybe we should keep that option for people making a production profile with verdi presto?
Hmm, right. Maybe it's easier to reconfigure your existing user than creating a new one and then updating the nodes somehow.
That said, some dogfooding of the verdi user commands has left me confused:
$ verdi user list 207ms Wed Apr 17 22:19:29 2024
Email First name Last name Institution
-- ---------------- ------------ ----------- -------------
aiida@localhost John Doe Unknown
* [email protected] Marnik Bercx EPFL
Report: The user highlighted and marked with a * is the default user.
$ verdi user configure [email protected] 818ms Wed Apr 17 22:19:33 2024
Report: enter ? for help.
Report: enter ! to ignore the default and set no value.
User email: test@new
First name []: Billy
Last name []: The Kid
Institution []: Wild West
Set as default? [y/N]: y
Usage: verdi user configure [OPTIONS]
Try 'verdi user configure --help' for help.
Error: Got unexpected extra argument ([email protected])
Note how it only complains about the extra argument at the end. I now understand that I have to do
$ verdi user configure --email [email protected]
But that doesn't seem that intuitive, especially since other CLI commands we have typically don't need to specify the identifier of the object you are operating on as an option. Beyond the scope of this PR though, but happy to write this down in an issue if you agree this should be changed[^1].
In the end, I'm not sure if we should add the --email. It's the one piece of information users would be least willing to share (More potential for SPAM? No thank you). One argument in favor of not adding the option now is that "we can always do so later", whereas this is not possible for removing it.
[^1]: In general, it would be good to rethink the concept of a user, and why a profile can have multiple users etc. Looking at this comment from a collaborator of @GeigerJ2, it does seem to cause some confusion as well. I've considered moving to a git-like model here, where the user can simply be fully configured at any time, and then attach their user when exporting. However, I realise that doesn't allow to keep track of nodes that were imported from somewhere else and then used by the exporting user.
But that doesn't seem that intuitive, especially since other CLI commands we have typically don't need to specify the identifier of the object you are operating on as an option. Beyond the scope of this PR though, but happy to write this down in an issue if you agree this should be changed
First of all, this is terribly off-topic. You put that thing back where it came from or I will be forced to whip out my Mr. ScopeCreep image again.
That being said, the confusion comes from the fact that the same command is being used to create new users and update existing ones. On top of that, I agree that one might expect the email as a positional argument. Finally, the error of the superfluous additional positional argument only being thrown at the end of all the prompts is a side effect of clicks parsing order. This is not unique to this command but will happen for all interactive commands. I even remember an ancient issue about this.
Edit: found it https://github.com/aiidateam/aiida-core/issues/1938
First of all, this is terribly off-topic. You put that thing back where it came from or I will be forced to whip out my Mr. ScopeCreep image again.
In the end, I'm not sure if we should add the --email. It's the one piece of information users would be least willing to share (More potential for SPAM? No thank you). One argument in favor of not adding the option now is that "we can always do so later", whereas this is not possible for removing it.
Back on topic, I would add no options for now, until we figure out the user issue some more. But I'm not so against it I would put my foot down. Happy either way.
In general, it would be good to rethink the concept of a user, and why a profile can have multiple users etc.
Just to provide some context as to why the concept of users exist. From the very beginning when the database schema was created, it was decided that each node (and other entities) should have an "owner". So the DbNode table had a foreign key of the DbUser table. This is why a profile should also always have a default user, because as soon as you instantiate and/or store a Node, it should automatically have its User populated. The idea of having a user associated with nodes was that when you shared/exported the data, it would be clear who the original creator of that data was. Clearly this is very valuable when sharing data. This now also explains why it is possible for a profile to have multiple users because as you import data from another user, that user has to be created in the database.
Thanks @sphuber, I remember a similar explanation when I was complaining about the User concept in the past. ^^
One thing that I was wondering about re-reading this: what happens if a user tries to import nodes from another User with the same email? I suppose the email serves like a sort of UUID? So this won't create a new user but simple associate the nodes to the already existing one?
I'm pretty sure I've run into this already, but with the new verdi presto command this will happen very often. So wanted to double-check the behavior.
I suppose the email serves like a sort of UUID? So this won't create a new user but simple associate the nodes to the already existing one?
Correct. The user table doesn't have a UUID but the email has to be unique. The assumption was made that since emails are typically unique identifiers for people in the real world, it would work for AiiDA as well.
More dogfooding! 🐶
Default profile names after the first
This one is a bit minor, but just a thought.
After making a first profile, I wondered what the name of a second profile generated with verdi presto would be:
$ verdi presto 207ms Thu Apr 18 07:14:34 2024
Report: RabbitMQ server detected: configuring the profile with a broker.
Report: Initialising the storage backend.
Report: Storage initialisation completed.
Success: Created new profile `presto-b351d3`.
Success: Configured the localhost as a computer.
I like that the hash is shorter (compared to the massive database names verdi quicksetup gives), but I'm just wondering if we can't just iterate number here. I.e. presto, presto-2, presto-3, ... I think this is easier to use and remember (even with that sexy tab-completion).
Default localhost workdir
Having verdi presto automatically set up a localhost computer solves that old issue https://github.com/aiidateam/aiida-core/issues/749, which is great.
However, as I also commented there, I was wondering what the default workdir should be. Looking at the computer now:
verdi computer show localhost 819ms Thu Apr 18 07:15:33 2024
--------------------------- --------------------------------------------------------------
Label localhost
PK 1
UUID db0233db-fd97-4a62-9ce0-ac792a2390c7
Description Localhost automatically created by `verdi presto`
Hostname localhost
Transport type core.local
Scheduler type core.direct
Work directory /var/folders/34/d38ymzfd6t5dsrv7d4wcww200000gn/T/aiida_scratch
Shebang #!/bin/bash
Mpirun command mpirun -np {tot_num_mpiprocs}
Default #procs/machine 1
Default memory (kB)/machine
Prepend text
Append text
--------------------------- --------------------------------------------------------------
The default workdir is /var/folders/34/d38ymzfd6t5dsrv7d4wcww200000gn/T/aiida_scratch. What was the motivation for this choice? Are there alternatives? E.g. why not in the .aiida folder?
At the risk of turning into Mr. Scope Creep again, but this would be less of an issue if https://github.com/aiidateam/team-compass/issues/12 could be solved. In line with the definitions above, the workdir should be part of the configuration.
but I'm just wondering if we can't just iterate number here. I.e. presto, presto-2, presto-3
That would work for me as well.
What was the motivation for this choice?
Since this is the scratch directory essentially, I thought it made sense to have it in a temporary directory that makes it clear that it is scratch. The directory is determined by import tempfile; tempfile.gettempdir() which returns the tmp file system on the OS. On Linux this is typically /tmp.
In line with the definitions above, the workdir should be part of the configuration.
I think this makes more sense indeed. But, besides breaking backwards-compatibility, we would have to check carefully whether changing the working directory on a computer that is actively being used could have unexpected effects. Some code may silently assume that this is immutable.
To give a summary of points made by @mbercx that are directly pertinent to this PR and not wildly scope-creepey:
- Change output of
verdi statusto refer toverdi prestoandverdi profile setup - Change default profile naming scheme to use incrementing integer suffix instead of random hash
- Remove all user configuration options
I am onboard with all suggestions, except I would keep the --email option since that cannot be changed in the current API and would have consequences. (Still that would force a user to be aware of this when calling verdi presto but I am thinking mostly of users that are not first time users and simply use it to easily setup a core.psql_dos profile for production, so it is fine to ask them to specify an --email option)
Since this is the scratch directory essentially, I thought it made sense to have it in a temporary directory that makes it clear that it is scratch.
How temporary is a temporary directory? I'm worried that users might expect to be able to use e.g. verdi calcjob gotocomputer for some time after running their calculation.
@sphuber @mbercx feel free to tag me for review once you converge, happy to take a look and do a bit of dogfooding.
am onboard with all suggestions, except I would keep the --email option since that cannot be changed in the current API and would have consequences. (Still that would force a user to be aware of this when calling verdi presto but I am thinking mostly of users that are not first time users and simply use it to easily setup a core.psql_dos profile for production, so it is fine to ask them to specify an --email option)
Yes please!
I have addressed the final action points of this comment https://github.com/aiidateam/aiida-core/pull/6351#issuecomment-2063203334
This is now ready for final review. Pinging @danielhollas
@sphuber thanks for indulging me, and improving the tests! I left a couple of follow up comments.
I have also tested this locally and it works great! 🎉
thanks for indulging me, and improving the tests!
My pleasure, intensive reviews always improve the end result. Your time is much appreciated!
Hey guys, sry, a bit late to the party here. Just went through the discussion and the code changes, and don't really have anything to add after the excellent comments by @mbercx and @danielhollas. Seems in very good shape to me!
Though, related to the most important open question: in a jiffy actually made me giggle when I was dogfooding now :D so I like it, but would ofc also be fine with the other suggestions.
Though, related to the most important open question: in a jiffy actually made me giggle when I was dogfooding now :D so I like it, but would ofc also be fine with the other suggestions.
Ironically, this bike shed is taking ages to paint. :rofl:
Ok, I think we have bikeshedded enough for now. I am merging this. If there are still important changes, they can be made before the release.