espresso icon indicating copy to clipboard operation
espresso copied to clipboard

MLIP and Water Tutorial

Open PythonFZ opened this issue 5 months ago • 8 comments

This PR adds the MLIP/Water Tutorial. It is spilt into three parts

  1. TIP4P Water by @hidekb
  2. MLIP Setup and evaluation by @PythonFZ
  3. MLIP MD by @hidekb

For testing until merged, you can use

git clone https://github.com/PythonFZ/espresso.git
cd espresso
git checkout mlip-water
cd doc/tutorials/mlip-water/
# use your favorite way of installation, e.g. uv sync / pip install . or pre-installed
dvc pull

PythonFZ avatar Sep 23 '25 08:09 PythonFZ

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Thank you! It looks very good.

A few points from an initial quick read:

Part 1

  • considr using GPU P3M, if a GPu is available.
  • maybe extend description of how the rigid body with Virtual Sites works

Part 2

  • For newcomers, some of the terminology might not be fully clear, e.g., hyperparameter, descriptor...
  • Maybe it can be clarified what "true energy" erfers to, i.e., make more explicit that this refers to the result from qm calcuations on which the model was trained
  • Eventhough the models are pre-trained, can one show some samples from the training data visually?
  • What is the role of zntrack?

Part 3

  • Maybem we can make it clear that this script in essence works for other MLIPs, as long as they provide an ASE calculator
  • Explain thaat ase.integrate() assignes to the particles' ext_force. This means that
    • this attribute is overwritten
    • but otherwise, all ESPResSo interactions can be used in combination with the ASe calculator

RudolfWeeber avatar Sep 23 '25 11:09 RudolfWeeber

I now had a look at the first part, this is looking very impressive to me (see my comments above)!

I just did a slightly longer (100x) run to test the pressure in the simulation, at least with the current friction coefficient (see my comment above) it seems reasonable: $$313 \pm 45$$ bar, which seems very plausible to me...

I will look into parts 2 and 3 tomorrow.

schlaicha avatar Sep 25 '25 19:09 schlaicha

In the ASE interface, we are setting the calculator in a loop.

for _ in tbar:
    atoms = system.ase.get()
    atoms.calc = box.calc

The .calc is a property (https://gitlab.com/ase/ase/-/blob/master/ase/atoms.py#L396) and there might be a performance consideration here.

PythonFZ avatar Sep 29 '25 06:09 PythonFZ

@PicoCentauri we are using the GMNN^1 as implemented in APAX^2

PythonFZ avatar Sep 29 '25 17:09 PythonFZ

Hi,

I had another look and all my points seem well adressed. I just pushed quite a few of typo corrections and didactic rephrasings. From my side this is finished and can be used as tutorial!

A few open points:

  • Concerning the CUDA support for P3M in the first parg suggested by @RudolfWeeber , the way it is implemented now always demands CUDA, so the code will fail without CUDA installed - I think this should be a switch checking if CUDA is available and then using it.

  • I don't understand the last paragraph in the third tutorial part,

    Because our MLIP learns all interactions, it can happen that minimum bond distances are not obeyed, (...)

    I guess you are refering to the data $$r_\text{max} = 2\text{\AA}?$$ Then I would simply conclude that such a small cutoff misses the fundamental interactions and therefore produces nonphysical results, leaving out the rest of the discussion.

    this can be mitigated by including short-range two-body

    why would you want to do that?

schlaicha avatar Oct 01 '25 13:10 schlaicha

I can shorten this, but I think it includes some valuable information

Because our MLIP learns all interactions, it can happen that minimum bond distances are not obeyed, leading to unphysical behavior in the simulation that are reflected in the RDF plots.

I think this highlights again, that there are e.g. no springs between particles and if not enough information is given, the system can blow up. Maybe including another sentence to explicitly refer to the 2 A data might help?

Although this can be mitigated by including short-range two-body potential terms, fundamentally, if such behavior occurs, the model parameters or training data should be improved.

Including the Ziegler-Biersack-Littmark (ZBL) short-range correction potential is common practice in many MLIPs but for our model, we have seen that including these correction terms might not necessarily improve the model accuracy.

If this is information is to coarse and doesn't really benefit the model, I'll shorten the section as you've suggested?

PythonFZ avatar Oct 02 '25 05:10 PythonFZ

If this is information is to coarse and doesn't really benefit the model, I'll shorten the section as you've suggested?

Yes, I think either you need to provide more explanations (like the one you provided above) or simplify the section...

schlaicha avatar Oct 05 '25 10:10 schlaicha