quacc
quacc copied to clipboard
Implement MD workflows
Summary of Changes
Closes #1577.
Before going further a little consultation:
MD users use workflows all the time, just for the NVE example in this commit there is pretty much a thousand ways you can run that; doing a first NVT to relax to the desired temperature, or a first NPT to relax to the desired density, etc...
- For example, should we have one NVT workflow with a parameter "thermostat", or one workflow per thermostat?
- If you want things to be in recipe.common, atoms will have to be sent in with a calculator already attached, is that what you want?
- From what I understand you want to keep things simple (not something like the code here), users then build their workflows to get what they want.
- In any case, for now doing these kinds of workflows is difficult, because it is generally hard to transfer results between ASE calculators, which will always happen between runs, this is critical since MD needs momenta and forces to ensure proper continuation. (This is a problem I would like fixed above since it also causes problems when restarting optimization etc)
Checklist
- [ ] I have read the "Guidelines" section of the contributing guide. Don't lie! 😉
- [ ] My PR is on a custom branch and is not named
main
. - [ ] I have used
black
,isort
, andruff
as described in the style guide. - [ ] I have added relevant, comprehensive unit tests.
Can one of the admins verify this patch?
@tomdemeyere: Happy to respond to each question but before I do, I just wanted to check that this is the full commit you wanted to share --- just the args/kwargs, right?
(In short, I really would love to have MD flows and am willing to work together to figure out how to get this done the best way possible)
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.08%. Comparing base (
f6bc96c
) to head (a9c5ec2
). Report is 11 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1672 +/- ##
==========================================
+ Coverage 99.04% 99.08% +0.04%
==========================================
Files 82 84 +2
Lines 3448 3515 +67
==========================================
+ Hits 3415 3483 +68
+ Misses 33 32 -1
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@tomdemeyere: Happy to respond to each question but before I do, I just wanted to check that this is the full commit you wanted to share --- just the args/kwargs, right?
(In short, I really would love to have MD flows and am willing to work together to figure out how to get this done the best way possible)
Yes, before doing more I wanted to make sure we are on the same page.
I guess my main question here is: how to make these flows common to all codes? Should they call code specific MD jobs? Or, should a calculator be attached to the Atoms object being sent to the MD flow?
Once I know this we can work on details
I guess my main question here is: how to make these flows common to all codes? Should they call code specific MD jobs? Or, should a calculator be attached to the Atoms object being sent to the MD flow?
@tomdemeyere: Good question.
Personally, I think what would likely make the most sense is to make a run_md
function in quacc.runners.ase
that would take an Atoms
object with an attached calculator just like the other methods in there. You likely were aware of that already.
From there, we get to the recipes, and my answer is: "it depends." If it is just a simple @job
(i.e. not a @flow
), then this is no different than a relaxation job --- one would just be calling run_md
instead of run_calc
or run_ase_opt
in a given code's module (e.g. quacc.recipes.emt
). Since there wouldn't be much logic beyond defining default arguments and calling run_md
, this is fine and doesn't introduce much duplication. I would treat the individual MD jobs in the same way as a relaxation --- we have a runner, which is called in the various recipes where it is appropriate. It does mean that there would be multiple MD modules, but we would need that anyway because every code will have different parameters (defaults) that are needed to properly run an MD calculation.
If we are talking about a @flow
where there are multiple Job
s being stitched together, and we anticipate this pattern (i.e. the workflow DAG) to be calculator-agnostic, then we'll put that in common
. This is the philosophy behind the existing common
recipes. But we would still need individual Job
s for each code. In general, users aren't expected to call the common
workflow directly. Aside from usability aspects, there are some other nuances about (de)serialization needed by workflow engines for why I say this, but I won't bore you with those details here.
To get started, I would suggest making quacc.runners.ase.run_md
and then a simple demo for a cheap-to-run calculator, like EMT or LJ. That will likely be illustrative and help guide the design further.
I should also note that we can always refactor later if there is something we can refactor. No need to prematurely optimize.
For example, should we have one NVT workflow with a parameter "thermostat", or one workflow per thermostat?
If the workflow logic is staying the same regardless of thermostat, then I would suggest one job/workflow where thermostat would be a keyword argument with a sensible default.
If you want things to be in recipe.common, atoms will have to be sent in with a calculator already attached, is that what you want?
See comment above. If we are talking about a @job
, then that would call run_md
and be specific for the code since we need to set some sensible default parameters. If it's a @flow
pattern, that can go in common
but would not require passing in Atoms
objects with attached calculators --- it would require passing in one or more Job
s.
From what I understand you want to keep things simple (not something like the code here), users then build their workflows to get what they want.
Hard for me to tell from the commit you shared here, but I am not opposed to complexity. In fact, we should not rely on the users to build something complex. If the @flow
pattern is "trivial", like a relax then an MD run, it is debatable if we should make that its own @flow
or not (honestly, it can go either way). But if it's more complex, for sure we should.
In any case, for now doing these kinds of workflows is difficult, because it is generally hard to transfer results between ASE calculators, which will always happen between runs, this is critical since MD needs momenta and forces to ensure proper continuation. (This is a problem I would like fixed above since it also causes problems when restarting optimization etc)
This is a good point. However, I think it is solvable. Let's circle back to this when the time is right.
@Andrew-S-Rosen Thanks for all these details. I will come back to it at some point
@Andrew-S-Rosen I think this is now a good point to have your input
Thank you for kicking this off, @tomdemeyere! Really appreciate the contribution!
This is looking good. I have some comments that are mostly related to some minor restructuring.
Starting with this isolated EMT example is very helpful for me in understanding the process and figuring out what should be refactored, so I thank you for doing that.
I see that your comment is different than the one I received by email from git. But yes, originally my plan was to have these "_base" functions. Either per calculator or in common like you previously said.
After discussing that along with other details I was then planning to introduce recipes for the other ensembles (NVT, NPT).
Didn't see your comments originally (github mobile) will work on it at some point
I see that your comment is different than the one I received by email from git. But yes, originally my plan was to have these "_base" functions. Either per calculator or in common like you previously said.
Apologies about the edits! I was trying to sort out where to place them and was going back-and-forth so decided to just save it for a future discussion to avoid confusion. But you got the right idea anyway.
After discussing that along with other details I was then planning to introduce recipes for the other ensembles (NVT, NPT).
That would be great!
Didn't see your comments originally (github mobile) will work on it at some point
Thank you!
Few comments:
- Without run_kwargs there is no way to pass things to
dyn.run()
- There is still a check_convergence, although it does not make much sense for MD. By definition you reached the max number of steps, but there is a case where it makes sense:
We are currently working on a error-free
quacc as part of another project. We make sure that quacc always reaches the point of writing to the database: it writes the errors that occurred, along with the available results, if any (the errors might happen at some point during the MD, previous results should be returned, actually in some cases, errors might even be part of your journey (although this might not be what Quacc was originally made for)). Currently if one is running a bunch of simulations, if errors occur, there will be a bunch of directories with no easy way to figure things out.
Because you said you might be interested in error handling I am leaving that there.
- I changed how temperature/time etc... is written to results, feel free to tell me what you think
Without run_kwargs there is no way to pass things to dyn.run()
The fmax
and steps
are already passed in quacc.runners.ase.run_opt
, and .run()
doesn't take any other keyword arguments, so we don't need it.
There is still a check_convergence, although it does not make much sense for MD. By definition you reached the max number of steps, but there is a case where it makes sense:
I agree that a check_convergence
doesn't make much sense for this job type. We should remove that logic entirely.
We are currently working on a error-free quacc as part of another project. We make sure that quacc always reaches the point of writing to the database: it writes the errors that occurred, along with the available results,
One could catch the typical RuntimeError
errors when ASE is called and try to give the user some flexibility via the global settings about how to handle things with the CHECK_CONVERGENCE
and/or some other global parameter. But of course, feel free to continue doing what you're doing!
Currently if one is running a bunch of simulations, if errors occur, there will be a bunch of directories with no easy way to figure things out.
Indeed. This is a limitation of Parsl/Dask, which is really best thought of a task manager. Other true workflow tools like Prefect or Covalent or FireWorks have a separate database for the task metadata and would highlight when/where the job fails. But since Parsl does not have a task database of its own, there is no mechanism to easily store that info. The recommended approach would be to launch your calculations in a for
loop and log any errors from the Parsl AppFuture
. But of course, your mechanism is fine as well.
I changed how temperature/time etc... is written to results, feel free to tell me what you think
I like that a lot more!
@tomdemeyere Thank you for addressing my comments! This looks good to me. How are you feeling about the state of this PR?
I will likely go ahead and refactor summarize_md_run
and run_md
afterwards since there is some copy/paste between methods that we can likely reduce. But that's something I'm happy to sort out.
I just changed the whole microcanonical thing to "md_job" because there will not be any major differences between these jobs except from Quacc point of view.
ASE has a lot of deprecated keywords, such as "temperature" or "pressure" which should be written as "temperature_K" or "pressure_au". Except... for some MD classes where they are perfectly fine 👌. Anyway, what I do here is: I change everything to what is normally expected, In Quacc if the user input "temperature" or "temperature_K" they will both work in Kelvin (A message will be printed in the first case). Feel free to change this behaviour if you want.
-
Momenta (results in general) has to be carried from job to job, this is a hard requirement, without this, flows do not make sense for MD jobs. For now, when setting a new calculator in the next job previous results will be erased.
-
fixcm and fixrot have been changed to fix_com and fix_rot, which I prefer as well, but fixcm and fixrot from ASE are still a thing in some ASE MD classes and might confuse people.
Note to self: there are some interesting ideas in https://github.com/materialsproject/atomate2/pull/722/files#diff-ae55fbbe68bef6a3e4bd052ea86c473adb44a91b440d2abb63a89ff1be198fe7 by Materials Project colleagues.
Note to self: there are some interesting ideas in https://github.com/materialsproject/atomate2/pull/722/files#diff-ae55fbbe68bef6a3e4bd052ea86c473adb44a91b440d2abb63a89ff1be198fe7 by Materials Project colleagues.
FYI: I didn't give up on this PR, I have some other things to finish first. Also, in the meantime I want to think about what would be the best way to proceed with this
All good! I also have this on the backburner to think about some more.
Some progress! I think I would like to take advantage of this merge request to fix a few things on ASE's side. Currently there are multiple problems:
- Some Molecular Dynamics classes simply can't restart correctly (NPT)
- During a restart, one configuration will always be calculated twice.
- No units convention.
This makes ASE simulations not (or hardly) reproducible, which is a big deal, in my opinion. Something that would greatly increase the reproducibility of ASE simulation's, and fix the problems above is to allow a verbose mode, where the state of each object involved is written to the disk in a json format (Calculator, Profile, Template, Dynamics) if applicable.
This json format would allow to instantly recreate the objects as well. What do you think?
Thanks for the update! I think improvements on the ASE side are always welcome. I will give this (and your other PRs, e.g. in ASE and here) a more detailed look later this week. Just got back from travel so am a bit behind.
This should be close to go, there is still something essential before being able to run MD correctly: restarts. They are still difficult to do (impossible in some cases), but that's an ASE problem. I suspect this will take ages to fix; for now I deleted everything related to restart while we wait to see what solutions will get accepted upstream (if any).
I looked at automate2, there are indeed some interesting things, like the possibility to specify a temperature/pressure gradient. I will charge back with my idea of "patches" to accomplish that, this would look like:
steps = np.arange(1000, 10000, 1)
temperatures = function_of_steps(steps)
my_temp_gradient = TemperatureGradient(steps, temperatures)
my_file_copier = FileCopier(interval = 10)
my_press_gradient = PressureGradient(...)
md_job(... patches = [my_temp_gradient, my_file_copier, my_pressure_gradient])
This is achievable by tuning the generator as you did previously (although it might get packed), for some reason I love this "patching" concept 😅. People could fine-tune their MD, picking what triggers when, Quacc could even have a section in the documentation explaining how to do complex things as we already discussed. Anyway, daydreaming here 🙃
Thanks for your persistence with this! I will try to look at this within the next few days (please ping me if I forget). I'm definitely interested to chat more about the patches idea here; perhaps within the context of MD, it could be an interesting proof-of-concept to think about. I'll revisit this when I have the time to give this a proper review!
Some news on this? Just to know if it is waiting on https://gitlab.com/ase/ase/-/merge_requests/3310
@tomdemeyere: Thank you for your enormous amount of patience with this PR. I didn't see your comment from 3 weeks ago, so my apologies about not responding and for keeping you in the dark about this.
The reason I had been reluctant to merge this was that there was a fair bit of refactoring I wanted to do to remove code duplication in the
run_md
method and also in the schema. This was dependent on some other PRs I had in the pipeline. I was finally able to devote time to do so properly, and now it is much cleaner. 👍 I didn't change any functionality (to the best of my knowledge).At this point, I am quite pleased with this. Whenever you have a moment, however, I have a few remaining questions outlined below. I am happy to take over of any (small) changes that result from my questions --- but I wanted to get your professional input before attempting to proceed since I am not an MD expert. Finally, before I merge, if you have a moment to look at a similar attempt recently implemented in matcalc, it'd be nice to know if we are missing anything particularly useful that's in there. For instance, there is something about upper triangular cells that is entirely foreign to me.
I had a look at the code you sent me and I am afraid I have never had issues with the upper vs lower triangular matrix (mainly because I don't often run NPT) but from what I understand this is taken care by a PR in ASE (https://gitlab.com/ase/ase/-/merge_requests/3277) by just modifying the NPT class without the need of modifying the atoms object like in matcalc.
For more context: left is my slab before, right is the slab after being passed in matcalc.
To refresh the current challenges in this PR:
- The lack of proper restart, which breaks the purpose of "MD workflows"
- Choose a convention for the units being used and make sure that it is clear and user-friendly
For the first point, I am afraid this is not for today :/. For the later I tried to make things as clean as I could but of course feel free to discard these suggestions and do as you like.
Thanks for the reply, @tomdemeyere! Regarding the triangular stuff, thanks for linking me to the MR. If this is just an ASE bug/inconvenience, we will take care of that upstream. I'll keep tabs on the MR.
As for this PR, not to worry about restarts. I don't think it is a dealbreaker, and I disagree that it defeats the purpose of a workflow. After all, one might want to do a non-MD job afterwards (e.g. a DFT relaxation) which would be fully supported. It's just that MD-->MD might be tricky. I wouldn't let that hold up this PR. Let me know if/when your upstream ASE MR though needs any attention from me.
The second point is really the only one I feel is necessary to make sure we get right. I trust your opinion on this and am content with leaving it as-is.
So, I think the plan (unless you disagree) should be:
- I'll monitor the conversation at https://gitlab.com/ase/ase/-/merge_requests/3391 to see if there's anything productive that might turn up in the short term.
- Assuming the above answer is no, I'll go ahead and merge this PR pretty much as-is sometime next week.
Thoughts?
Oh, but also I still had the question about fixcm
and fixrot
. Did we need to rename them for some reason? Was that also an internal consistency thing?
Oh, but also I still had the question about fixcm and fixrot. Did we need to rename them for some reason? Was that also an internal consistency thing?
Most of the time you will find fixcm
but occasionally you might find fix_com
(mostly internally) I prefer fix_com
so I used that (it is less obscure in my opinion) but I don't feel strongly about it, change it as you want.
Thoughts?
Yes! I think that's a great plan
FYI: I have seen all your comments, but I am moving right now. I will make sure to come back to it ASAP 😀 !
No worries, @tomdemeyere! I have been kind of going back and forth on a bunch of things, so I'm sorry about all the spam here. 😅 I should be done for now. Hopefully sometime later you can take a look and tell me what's a good or bad idea from here. I feel good where things are now but since I'm not an MD expert, I would really appreciate your insights (and clarity on a few minor points). Thanks for understanding. At this point, it'd probably be best if you look at the "changed files" tab and do a re-review.
I'm also moving next week!! Good luck with your move!
Okay, because this was such a mess on my part, I've tried to clean up my comments and will re-summarize them below:
- When you have a moment, could you please re-review the changed files and see if this is to your liking?
- In the
quacc.recipes.emt.md.md_job
recipe, I thought it would be nice to havetemperature
,pressure
, andtimestep
as dedicated keyword arguments like what is done here. This is because the parameters are so critical to running MD and because it will potentially help clarify that users should be playing by "our rules" in terms of units. I have made that change accordingly. What do you think? - Why do we have logs and handling for deprecated keyword arguments like
pressure
andtemperature
? Would it not be better to simply not accept them? Then there won't be any confusion about units. That was more-or-less my question from earlier. Instead of trying to be "smart", we could simply just not accept those old keywords at all.
- When you have a moment, could you please re-review the changed files and see if this is to your liking?
I think it's nice (I skipped the part where the runners have been changed to a class!) you changed fix_rot
and fix_com
to set_zero_rotation
and set_stationary
my only concern here is that it is not immediately clear what set_stationary
is, but with a little bit research this should be ok on the user side (zero_com_momentum
, zero_angular_momentum
might be better? I don't know)
- In the quacc.recipes.emt.md.md_job recipe, I thought it would be nice to have temperature, pressure, and timestep as dedicated keyword arguments like what is done here. This is because the parameters are so critical to running MD and because it will potentially help clarify that users should be playing by "our rules" in terms of units. I have made that change accordingly. What do you think?
When I initially tackled the problem I had a few points in mind:
- Convention/units choice has to be uniform and well-thought: Kelvin for temperature, femtosecond for time, GPa for pressure... All these units are widely used, and they are all powers of SI units, so Quacc somehow follow international conventions.
- Users must be able to understand quickly what's going on, despite ASE being a mess on this side, this is especially challenging since some ASE's MD use different units with similar keywords.
With that in mind I was planning to add Quacc specific keywords such as "temperature_K", "pressure_gpa", and "timestep_fs", so that it would be clear that Quacc follow a different convention than ASE. ASE base keywords would still be allowed inside the kwargs
. The problem with this approach is that it is not uniform; additional keywords (compressiblity, taut, etc) which are class specific are not integrated and will not follow the same convention, except if Quacc keeps track of every keywords, but that's a lot of burden on the dev side.
Instead, I went all-in by forcing every ASE keywords (deprecated or not...) to follow Quacc's new convention.
I have made that change accordingly. What do you think
Pressure might be a little bit tricky, because NPTBerendsen
expects a scalar while NPT (Nose-Hoover) can take both an array and scalar (externalstress
keyword), and is represented as stress (so minus the pressure), if you are fine with the keyword pressure
being used for multiple meaning, then it is fine.
- Why do we have logs and handling for deprecated keyword arguments like pressure and temperature? Would it not be better to simply not accept them? Then there won't be any confusion about units. That was more-or-less my question from earlier. Instead of trying to be "smart", we could simply just not accept those old keywords at all.
Yes, but again, you have to be careful because these keywords are not deprecated in some classes, for example temperature
and temperature_K
both seem to work for NVTBerendsen
.
EDIT: Also, the problem is that now you have pressure_au
that takes a pressure in GPa 🙃
Not sure if I am very helpful here, I think the current approach including your suggestions and changes is good. Although I must say I am still thinking about Quacc specific keywords, as mentioned above, as it would make it independent from what ASE is doing, at least for now.
@tomdemeyere: Thank you for your review of this! Apologies this has dragged on for so long. I recognize your time is valuable and so appreciate your comments here. I will address your comments below later this week. I think they are all logical. I will ping you (one last time? 😅) when that's done to keep you updated.
I think it's nice (I skipped the part where the runners have been changed to a class!) you changed fix_rot and fix_com to set_zero_rotation and set_stationary my only concern here is that it is not immediately clear what set_stationary is, but with a little bit research this should be ok on the user side (zero_com_momentum, zero_angular_momentum might be better? I don't know)
These are good suggestions. I changed set_stationary
to set_com_stationary
, which might help a little.
Instead, I went all-in by forcing every ASE keywords (deprecated or not...) to follow Quacc's new convention.
This is more-or-less my concern --- that an ASE expert will come in and expect that if they are requesting an ASE keyword that it will behave as one would expect from ASE.
I like your suggestion of quacc-specific keywords, but also it might not be totally necessary (debatable). In any case, I feel like we should have the recipe keyword arguments in the units that we want (clearly documented) and any internal ASE keywords passed by the user we just don't touch. We leave them as-is because the user is clearly familiar with ASE and knows what they're doing.
Pressure might be a little bit tricky, because NPTBerendsen expects a scalar while NPT (Nose-Hoover) can take both an array and scalar (externalstress keyword), and is represented as stress (so minus the pressure), if you are fine with the keyword pressure being used for multiple meaning, then it is fine.
It looks like you may have meant NPT
? If so and I'm understanding the documentation correctly, it does not look like this should be a major problem because NPT
accepts externalstress
but has no pressure
keyword argument. So, this is a scenario where pressure_gpa
(or whatever we name it) would not apply. NVTBerendesen
looks to have just a pressure_au
but not externalstress
, unless I am missing something.
EDIT: Also, the problem is that now you have pressure_au that takes a pressure in GPa 🙃
Thanks! Will fix!
@tomdemeyere: How does the proposal in https://github.com/tomdemeyere/quacc/pull/1 sound?