atomate2 [WIP] Generalize forcefields for generic ASE calculator support

This is a refactor to better support generic ASE calculators in atomate2. Major changes:

schemas / utils have been ported from the forcefields library to a separate ase library
Forcefield-specific jobs now inherit from ASE base classes, such as the ForceFieldTaskDocument from the base AseTaskDocument class
Example ASE jobs using the Lennard-Jones and GFNn-xTB (TBLite) calculators
The ForcefieldResult and ForcefieldObject classes alias ASE parent classes + deprecation warning

Most of the other moved classes are internals for atomate2, so I don't expect many important consequences there

To do:

Replicating all "standard" flows (elastic, eos, gruneisen, phonon) with a base ASE calculator class?
Tests for the ASE calculators currently in atomate2.ase.jobs (having some trouble getting TBLite working on an M1 mac)

Aug 02 '24 22:08 esoteric-ephemera

@esoteric-ephemera Great! Even though the outside usage of ForcefieldResult might be limited. Let's maybe still alias - just in case.

Aug 03 '24 04:08 JaGeo

Hi @JaGeo, restored the ForcefieldObejct and ForcefieldResult objects for now with a deprecation warning

Aug 05 '24 18:08 esoteric-ephemera

Hi @esoteric-ephemera, this is really nice to see in atomate2. Your implementation looks great to me. Would you be also happy to add static makers?

In terms of your todos:

Replicating all "standard" flows (elastic, eos, gruneisen, phonon) with a base ASE calculator class?

I'm not sure if this makes sense? We'd still need to make subclasses for specific implementations anyway, so I don't think a base implementation would add anything?

Tests for the ASE calculators currently in atomate2.ase.jobs (having some trouble getting TBLite working on an M1 Mac)

This would be great if you can figure it out.

Aug 06 '24 12:08 utf

I'm not sure if this makes sense? We'd still need to make subclasses for specific implementations anyway, so I don't think a base implementation would add anything?

Sounds good, I'll just port the MD maker over to ase from forcefields and call it

Static makers have been added + tests

Aug 07 '24 04:08 esoteric-ephemera

Right now, I'm getting completely different behavior from tblite when running in CI vs. running locally (as in SCF doesn't converge in CI, works completely fine locally). I'm not sure if it's due to the pytorch-tblite incompatibility - see here for example.

For now, I've made the tblite dependence optional but left the relax / static / MD jobs + tests there. Happy to remove if preferable but I think it's good to have basic implementations of these in atomate2

Aug 12 '24 17:08 esoteric-ephemera

Thanks @esoteric-ephemera. Would it be much effort to mock the tblite call so it doesn't have to run in CI? I'll also tag @Andrew-S-Rosen as maybe he has got tblite working for the quacc CI?

Eitherway, if this won't be possible, I'm happy to merge as is.

Aug 14 '24 08:08 utf

It's due to the PyTorch incompatability. Not much you can do there except to split up the tests so you don't have both packages installed in the same env.

Aug 14 '24 12:08 Andrew-S-Rosen

@Andrew-S-Rosen I feel this might be a general problem in the future, especially with all the force fields🫠

@utf maybe, we should think about installing and testing the forcefields separately as well if further conflicts show up.

Aug 14 '24 12:08 JaGeo

Good suggestions, I split off the tests and everything (incl. tblite) seems to be passing now. Not sure that the way I split the tests makes the most sense (like do the code-cov reports automatically merge?)

Aug 14 '24 22:08 esoteric-ephemera

Is this still WIP or can it be merged now?

Aug 20 '24 09:08 utf

@utf should be ready for review!

Aug 20 '24 16:08 esoteric-ephemera

@esoteric-ephemera @utf what's the timeline for merging this PR? with latest numpy, scipy and pymatgen (https://github.com/materialsproject/pymatgen/pull/4009) now being 3.10+, would make sense imo for atomate2 to follow suit. given the size of this PR, that would likely result in merge conflicts so better the get this into main first

Sep 07 '24 16:09 janosh

question from looking at the changes in this PR: is the long-term goal to prevent further proliferation of Makers for each new MLFF that comes along (worthy goal imo)? having a single ASE(MD|Static|Relax)Maker which take a model name and a dict of model specific kwargs or (maybe simpler still) just a ready-to-go ASE-compatible Calculator seems much more maintainable. if that's the goal, i think this PR should deprecate CHGNetRelaxMaker and friends

https://github.com/materialsproject/atomate2/blob/808de7da064c488511326256552c3a7076f58d73/src/atomate2/forcefields/jobs.py#L202

Sep 07 '24 16:09 janosh

@janosh I would be happy about a deprecation period in such a case. We are using these Makers quite a lot and adapting codes/documentation will take time.

Sep 07 '24 16:09 JaGeo

@esoteric-ephemera @utf what's the timeline for merging this PR?

This is ready to go as is. I'd like to add NEB workflows to this (started for VASP here) if I had a few more days, but no need if we'd like to merge sooner.

is the long-term goal to prevent further proliferation of Makers for each new MLFF that comes along (worthy goal imo)? having a single ASE(MD|Static|Relax)Maker which take a model name and a dict of model specific kwargs or (maybe simpler still) just a ready-to-go ASE-compatible Calculator seems much more maintainable.

Yes exactly! The Calculator solution (+ model kwargs) is implemented in this PR. Having a LibXC-style lookup for forcefields is worthy of its own project, and is probably covered by MatGL for the graph-based MLFFs right?

It's early enough that we could deprecate the existing MLFF makers but I don't see a strong reason to because they're in high use, as @JaGeo said. Think it's better to focus on preventing future bloat with more MLFF-specific calculators.

Sep 09 '24 17:09 esoteric-ephemera

@esoteric-ephemera @janosh I am fine with deprecation. I would just prefer a deprecation period in this case.

Sep 09 '24 17:09 JaGeo

Having a LibXC-style lookup for forcefields is worthy of its own project, and is probably covered by MatGL for the graph-based MLFFs right?

i don't think so. matGL has chgnet and m3gnet but is unlikely to implement new models like 7net, ORB, Nequip, etc. at the rate they appear. many of these models have their own code bases. MatGL afaik aims to implement models with a consistent API internally, not act as as registry mapping model names to implementations in external code bases. but that's all we need and it already exists in atomate2 in the function ase_calculator

https://github.com/materialsproject/atomate2/blob/fdd2c9e2eed3f1fc631a3061f84d93e0898b2347/src/atomate2/forcefields/utils.py#L386-L458

I don't see a strong reason to because they're in high use

the strong reason imo would be to get a leaner/cleaner atomate2 code base and offer a single idiomatic way of using MLFFs through atomate2. to quote "The Zen of Python":

“There should be one—and preferably only one—obvious way to do it.”

Sep 09 '24 17:09 janosh

@janosh Ok! I see now - so yes we'll still maintain ase_calculator and add models to that as needed, but we won't necessarily make subclasses where the MLFF is pre-assigned. That's totally OK with me, let me get the deprecation warnings in there

Also re: NEB, do we want to wait on that for now or push forwards? It could easily go in a separate NEB-related PR so no rush on that

Sep 09 '24 18:09 esoteric-ephemera

It could easily go in a separate NEB-related PR

i think a separate PR makes a lot of sense

Sep 09 '24 18:09 janosh

Sounds good on NEB. I've added deprecation warnings to all the classes, and incorporated a suggestion from @orionarcher about the force_field_name attr of the static, relax and MD makers: the user can input these as either str(MLFF.<attr>) (e.g., "MLFF.CHGNet"), MLFF.<attr> as an enum (MLFF.CHGNet), or the forcefield name as a string (e.g., "CHGNet"). Makes sense to have that now since we're deprecating these classes, and it's easier for the user this way

Sep 09 '24 18:09 esoteric-ephemera

Codecov Report

Attention: Patch coverage is 83.83234% with 108 lines in your changes missing coverage. Please review.

Project coverage is 19.36%. Comparing base (8d57884) to head (763bcc6). Report is 20 commits behind head on main.

Files with missing lines	Patch %	Lines
src/atomate2/ase/md.py	83.14%	20 Missing and 10 partials :warning:
src/atomate2/ase/utils.py	82.50%	16 Missing and 12 partials :warning:
src/atomate2/ase/schemas.py	85.51%	17 Missing and 4 partials :warning:
src/atomate2/ase/jobs.py	75.00%	15 Missing and 3 partials :warning:
src/atomate2/forcefields/jobs.py	92.45%	2 Missing and 2 partials :warning:
src/atomate2/forcefields/md.py	87.09%	2 Missing and 2 partials :warning:
src/atomate2/forcefields/schemas.py	92.30%	2 Missing :warning:
src/atomate2/forcefields/utils.py	66.66%	1 Missing :warning:

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #940       +/-   ##
===========================================
- Coverage   75.70%   19.36%   -56.35%     
===========================================
  Files         147      160       +13     
  Lines       10925    11732      +807     
  Branches     1613     1734      +121     
===========================================
- Hits         8271     2272     -5999     
- Misses       2173     9319     +7146     
+ Partials      481      141      -340

Files with missing lines	Coverage Δ
src/atomate2/forcefields/utils.py	`90.00% <66.66%> (-1.63%)`	:arrow_down:
src/atomate2/forcefields/schemas.py	`94.44% <92.30%> (-4.41%)`	:arrow_down:
src/atomate2/forcefields/jobs.py	`93.26% <92.45%> (-0.42%)`	:arrow_down:
src/atomate2/forcefields/md.py	`83.82% <87.09%> (-0.84%)`	:arrow_down:
src/atomate2/ase/jobs.py	`75.00% <75.00%> (ø)`
src/atomate2/ase/schemas.py	`85.51% <85.51%> (ø)`
src/atomate2/ase/utils.py	`82.50% <82.50%> (ø)`
src/atomate2/ase/md.py	`83.14% <83.14%> (ø)`

... and 116 files with indirect coverage changes

Sep 09 '24 23:09 codecov[bot]

@esoteric-ephemera might be cleanest and fastest to branch off a pytest-split branch from this PR and open that in a separate PR, then git reset --hard to back out the splitting commits from here so we can review the force field overhaul and CI splitting stuff separately

Sep 16 '24 18:09 janosh

@janosh sure happy to do that - the tests were failing just because I forgot to ignore ase tests in the main tests. Would you rather the pytest stuff go in a separate PR?

Sep 16 '24 20:09 esoteric-ephemera

would give better git history but i'm happy either way (if it saves time)

Sep 16 '24 20:09 janosh

Hi @esoteric-ephemera. Thanks again for this. Is it ready to merge?

Sep 24 '24 14:09 utf

Hi @utf should be ready to merge ~provided the tests pass (need to make sure these still run OK after resolving merge conflicts with OpenMM)~

Sep 24 '24 16:09 esoteric-ephemera

Thanks @esoteric-ephemera, this is very exciting to have!

Sep 25 '24 14:09 utf

In that context: should we document this and a few other of our new workflows that are not VASP-based more clearly? I think a lot of people have interest in this general interface but might not discover it easily

Sep 25 '24 14:09 JaGeo

@JaGeo agreed, this should probably go in the docs for clarity. Also happy to add to the working paper

Sep 25 '24 16:09 esoteric-ephemera

@JaGeo agreed, this should probably go in the docs for clarity. Also happy to add to the working paper

Personally, I think it should be also in the paper. Thoughts, @utf ?

Sep 25 '24 16:09 JaGeo