Interface to metatensor to use arbitrary machine learning models as collective variables

Open Luthaf opened this issue 1 year ago • 1 comments

Description

This PR adds an interface that uses metatensor atomistic models to execute arbitrary machine learning models, with the intended use case of using these models as collective variables in PLUMED.

The core idea of metatensor is to facilitate data sharing between different machine learning ecosystems by annotating data with metadata that describes exactly what is being passed around. This is done through the TensorMap class, which contains one or more TensorBlock, each block containing the actual values and the Labels describing the rows and columns of those values.
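
As a rough illustration of the nesting described above, the following Python sketch mimics the layout with plain dataclasses. The class and field names are simplified stand-ins chosen for this example; they are not the actual metatensor API, which has more fields and its own constructors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Labels:
    """Named integer tuples describing one axis of the data (stand-in, not the real class)."""
    names: Tuple[str, ...]
    values: List[Tuple[int, ...]]


@dataclass
class TensorBlock:
    """One block: the values plus the labels for its rows and columns."""
    values: List[List[float]]  # values[i][j] is row i, column j
    samples: Labels            # one entry per row
    properties: Labels         # one entry per column

    def __post_init__(self):
        # The metadata must agree with the shape of the data.
        assert len(self.values) == len(self.samples.values)
        assert all(len(row) == len(self.properties.values) for row in self.values)


@dataclass
class TensorMap:
    """A collection of blocks, each identified by one entry in `keys`."""
    keys: Labels
    blocks: List[TensorBlock]

    def __post_init__(self):
        assert len(self.keys.values) == len(self.blocks)


# Hypothetical data: three atoms, two features per atom.
block = TensorBlock(
    values=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    samples=Labels(names=("system", "atom"), values=[(0, 0), (0, 1), (0, 2)]),
    properties=Labels(names=("feature",), values=[(0,), (1,)]),
)
tensor = TensorMap(
    keys=Labels(names=("center_type",), values=[(8,)]),
    blocks=[block],
)
```

The point of the sketch is only that every row and column of the values carries a name, so the receiving code does not have to guess what each entry means.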

On top of this data format, we defined an interface for exporting trained ML models, and then loading and executing such models from a variety of simulation engines. The initial objective was to share models that would compute the energy and forces of a system and use these as ML force fields, but the same API can also be used for arbitrary quantities, including collective variables. These models are based on PyTorch for three reasons:

  • ability to define models in Python and execute them in C++ without a Python interpreter, through TorchScript;
  • ability to compute gradients with automatic differentiation;
  • ability to run calculations on GPU/CPU/... with almost the same code.

There is some documentation on the workflow to load and execute a metatensor model here for reference: https://docs.metatensor.org/latest/atomistic/overview.html#data-flow-between-the-model-and-engine. The main bits of data a model needs from PLUMED are the types for all particles (for atoms, this can be the atomic number), the positions of all particles, the simulation cell/box and (potentially many) neighbor lists for given cutoffs.
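
To make the list of inputs above concrete, here is a minimal Python sketch of what an engine would gather before calling a model. It is not the code in this PR (which vendors vesin for the neighbor lists): the function name `brute_force_neighbors`, the example values, and the O(N²) non-periodic pair search are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def brute_force_neighbors(
    positions: Sequence[Sequence[float]], cutoff: float
) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for every pair i < j closer than `cutoff`.

    Assumes a non-periodic system; O(N^2), for illustration only.
    """
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            distance = math.dist(positions[i], positions[j])
            if distance < cutoff:
                pairs.append((i, j, distance))
    return pairs


# Hypothetical inputs an engine would hand to a model.
types = [8, 1, 1]  # atomic numbers used as particle types
positions = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 3.0, 0.0]]
cell = [[0.0] * 3 for _ in range(3)]  # all zeros, used here to mean "no periodic boundaries"

# A model may request several neighbor lists, one per cutoff.
neighbors = {cutoff: brute_force_neighbors(positions, cutoff) for cutoff in (1.5, 3.5)}
```

A production neighbor list would use cell lists and handle periodic images; the sketch only shows the shape of the data.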

Difference with the existing PyTorch module

Both this module and the existing PyTorch module in PLUMED are based on PyTorch (for the same reasons!). The main difference, as far as I know, is that the PyTorch module is intended to act as a transformation on top of CVs computed by other PLUMED actions, while the metatensor module is intended to start from positions/atom types/... and compute a new CV from scratch. This enables using metatensor to compute some representation of the system (such as SOAP) and send it to PLUMED for further processing.

Unresolved questions/issues

The main remaining question for me is what to do about the neighbor list calculation. I currently vendor code from https://github.com/Luthaf/vesin in the metatensor module, and it works pretty well. However, the code is just copy-pasted from another repository: it might need to be updated in the future, and it does not 100% conform to the code style of the plumed repository. I can see a couple of solutions here, ordered from favorite to least favorite:

  • [ ] Disable style & code checks for this code in the repository. I managed to disable astyle, but I could not find a way to disable codecheck or headers.sh. Is there a mechanism to mark some files/directories as "external" code that should be ignored by such linters?
  • [ ] I could include the code as a .tar.gz archive which gets extracted in configure. This would make the process of updating the code a lot easier (just update the archive); however, I don't know if this would help with excluding the code from astyle/codecheck/... unless these scripts respect .gitignore.
  • [ ] I could also re-implement the neighbor list in terms of the PLUMED version. I was not aware of this code at the time, which is why I imported vesin instead. I would have to check that the behavior of the PLUMED NL matches what metatensor needs and write some code adapting the data from one format to the other.

What do you think? Any other ideas on what to do regarding "external" code in the PLUMED repository?


The other questions I have concern the unit cell storage as returned by this->getPbc().getBox():

  • [ ] for non-periodic systems, what's returned by this function? A matrix of zeros, or the bounding box of the system? Does it differ from one simulation engine to another?
  • [ ] in all cases, the storage format seems to be using rows of the matrix to store the three box vectors, is this correct?

Type of contribution

  • [ ] changes to code or doc authored by PLUMED developers, or additions of code in the core or within the default modules
  • [ ] changes to a module not authored by you
  • [x] new module contribution or edit of a module authored by you

Copyright

  • [ ] I agree to transfer the copyright of the code I have written to the PLUMED developers or to the author of the code I am modifying.

  • [x] the module I added or modified contains a COPYRIGHT file with the correct license information. Code should be released under an open source license. I also used the command cd src && ./header.sh mymodulename in order to make sure the headers of the module are correct.

Tests

  • [x] I added a new regtest or modified an existing regtest to validate my changes.
  • [ ] I verified that all regtests are passed successfully on GitHub Actions. - As discussed with @gtribello, the regtests run in a separate repository (currently cloning plumed from the branch of this PR; I'll update it to pull from the main repo once the code is merged): https://github.com/lab-cosmo/plumed-metatensor-tests

Luthaf, May 23 '24 11:05

Thanks @Luthaf !!! A few comments below:

  • could you please regenerate the configure file using autoconf 2.69? installing it should be straightforward (see here).
  • I noticed that there are tests that will likely be skipped on GitHub Actions (because metatensor is not configured there). You should mark these tests as skippable as we did here, otherwise the test suite will complain.

Then let's wait for the other checks that are running now. Thanks again!

GiovanniBussi, May 23 '24 12:05

for non-periodic systems, what's returned by this function? A matrix of zeros, or the bounding box of the system? Does it differ from one simulation engine to another?

It is expected to be a matrix of zeros. We only use the box to apply PBCs. It is certainly possible that some code incorrectly passes a box even when no PBCs are applied. In any case, within PLUMED it is correct to assume that nonzero numbers here imply PBCs.

in all cases, the storage format seems to be using rows of the matrix to store the three box vectors, is this correct?

Yes (in C order). So box[2][0] is the first component of the third vector.
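
The two answers above can be sketched in a few lines of Python. The nested list below is a stand-in for the matrix returned by getPbc().getBox(), not the actual PLUMED type, and the helper names `is_periodic` and `box_vectors` as well as the example box are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def is_periodic(box: Matrix) -> bool:
    """Any nonzero entry is taken to mean that PBCs are applied."""
    return any(value != 0.0 for row in box for value in row)


def box_vectors(box: Matrix) -> List[List[float]]:
    """Row k of the matrix is the k-th box vector (C order)."""
    return [list(row) for row in box]


# Hypothetical triclinic box: the third vector has a nonzero first component.
box = [
    [10.0, 0.0, 0.0],
    [0.0, 12.0, 0.0],
    [2.0, 0.0, 15.0],
]
a, b, c = box_vectors(box)

# A non-periodic system is expected to come with an all-zero matrix.
no_pbc_box = [[0.0, 0.0, 0.0] for _ in range(3)]
```

With this layout, `box[2][0]` and `c[0]` refer to the same number, the first component of the third vector.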

GiovanniBussi, May 27 '24 13:05

@Luthaf sorry for not merging this yet. Could you please confirm that I can do it? Thanks!!!

GiovanniBussi, Jun 27 '24 14:06

If you are happy with the code, please do merge! I'll send further PRs to improve it/update dependencies as needed.

Luthaf, Jun 27 '24 14:06

Thanks for your contribution!

GiovanniBussi, Jun 27 '24 14:06

And thanks for your review!

Luthaf, Jun 27 '24 14:06