Molly.jl Roadmap and ideas for Molly.jl development

This issue is a roadmap for Molly.jl development. Feel free to discuss things here or submit a PR. Bear in mind that significant refactoring will probably occur as the package develops.

Low hanging fruit

Want to get involved? These issues might be the place to start:

Speedups to the code. Both on the level of individual functions and the overall algorithm. [Getting there]
Look over the code to find mistakes in implementation. There will be some because...
Test coverage is minimal. We need tests for the 'core' functions that are unlikely to change their return values, e.g. the force functions. [Done]

Bigger things to tackle

Energy minimisation by gradient descent [Done].
Add other forcefields. [Partly done]
Add other thermostats/ensembles. [Partly done]
Get it working on the GPU. [Done]
Line by line checking of how a mature MD package works to see where we have missed things and to get an idea of how to organise the codebase in future. [Partly done]
Neighbour list in a cubic lattice, and use of group lists. [Partly done]
Setup of MD simulations, e.g. adding hydrogen, solvent. [Partly done]
Trajectory/topology file format readers and writers. The current topology reader is not very robust. [Partly done]

Projects that will require more planning or will cause significant changes

Abstracting the forcefield from the integration to allow different forcefields and integrators to be combined. [Done]
Allow definition of custom forcefields. [Done]
Parallelisation. [Done]

Okay, so this is less of a roadmap and more of a list of ideas to work on. But most or all of the above would be required to make this a package suitable for general use, so in a sense this is the roadmap.

Oct 29 '18 23:10 jgreener64

To do: Add module for calculating Free Energy via Alchemical change ;)

Jan 16 '19 20:01 Zarasthustra

"Abstracting the forcefield from the integration to allow different forcefields and integrators to be combined" and "Allow definition of custom forcefields" are now done.

Jul 10 '19 10:07 jgreener64

It may be interesting to see whether you can either directly use https://github.com/libAtoms/JuLIP.jl as a library (also https://github.com/libAtoms/NeighbourLists.jl), or at least be inspired by some of the code therein.

Aug 14 '19 12:08 jarvist

Thanks Jarvist, I have had a brief look at the code there but will dive a bit deeper.

Aug 14 '19 13:08 jgreener64

Awhile ago I wrote some Julia code for EWALDS. It is verified for energy calculations against a NIST database.

https://www.nist.gov/mml/csd/chemical-informatics-research-group/spce-water-reference-calculations-10%C3%A5-cutoff [https://www.nist.gov/sites/default/files/styles/480_x_480_limit/public/images/mml/csd/informatics_research/spce_schematic_3.jpg?itok=OTb11Rb_]https://www.nist.gov/mml/csd/chemical-informatics-research-group/spce-water-reference-calculations-10%C3%A5-cutoff SPC/E Water Reference Calculations - 10Å cutoff | NISThttps://www.nist.gov/mml/csd/chemical-informatics-research-group/spce-water-reference-calculations-10%C3%A5-cutoff In this section, we provide sample configurations of SPC/E Water molecules[1] and report the various energy and force calculations for those configurations, where the coulombic contributions are computed using the traditional Ewald Summation Method[2]. These sample configurations and reference ... www.nist.gov

Would you be interested in it? I have not taken its derivative yet to calculate force. It has been on my list of things to do, but I just don't seem to have time :S

I have also added Rattle to my code for bond constraints... it is fairly straight forward too. Once your code has EWALD and Rattle, it will be a "full" code in my opinion. You would probably want to upgrade it to particle mesh ewald for MD, but that is purely performance.

Braden

From: Joe Greener [email protected] Sent: Wednesday, August 14, 2019 9:00 AM To: jgreener64/Molly.jl [email protected] Cc: Braden Kelly [email protected]; Comment [email protected] Subject: Re: [jgreener64/Molly.jl] Roadmap and ideas for Molly.jl development (#2)

Thanks Jarvist, I have had a brief look at the code there but will dive a bit deeper.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/jgreener64/Molly.jl/issues/2?email_source=notifications&email_token=AECO7UJ4R725SDIJE6HZFD3QEP6XBA5CNFSM4GADVQT2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4IWZUA#issuecomment-521235664, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AECO7UOWNMZMPDMOU6A5VMDQEP6XBANCNFSM4GADVQTQ.

Aug 14 '19 13:08 Zarasthustra

Would you be interested in it?

Yes, definitely. If you could open a PR when you get a chance that would be great. I understand how it's hard to find time though, I only seem to get a few hours a month on this package these days.

Aug 14 '19 15:08 jgreener64

There is a Julia interface to Chemfiles - building Molly on top of those types would allow easy reading and writing.

Feb 20 '20 14:02 jgreener64

what are your thoughts on GPU acceleration or at least multi-core in CPU? In python it's really easy with joblib, cupy, jax... Don't know about the julia set of tools for that but might be worth an exploration

Feb 26 '20 18:02 hypnopump

GPU acceleration is feasible using CuArrays.jl - it would probably involve refactoring into a style that uses broadcasting rather than for loops, but that's no bad thing anyway.

Multi-core CPU support has improved greatly in the recent Julia v1.3 release so this is another avenue to explore, though I personally don't have much experience there.

Another point to add here is the ability to differentiate through molecular simulations using Zygote.jl, which is an exciting use case and something that Julia is well-suited for.

Feb 27 '20 10:02 jgreener64

I was unfortunately unaware of this package until now. (Or maybe I saw it and have forgotten...) Anyhow, I wonder whether there is any chance to combine Molly and JuLIP, or at least make them compatible.

With JuLIP I deliberately stayed quite close to ASE, to make sure that I can have a relatively thin interface. But this has led to some design decision that I‘ve since regretted. So in principle at least I‘m quite open to restructuring JuLIP to make it work with Molly.

Another possibility would be to make JuLIP really library for potentials, but let Molly focus on managing configurations, MD, geometry optimisation, etc ...

But maybe it is easiest to leave them separate and not worry too much about this. I‘m curious though what people think.

Mar 28 '20 04:03 cortner

Regarding the neighbourlist - this is surely something that we can share. I believe my list is fairly fast, it is based on matscipy. The main downside for Molly is that it uses the same cutoff for all atoms. It would need to be generalised so that different atoms can have a different cutoff.

I‘ve also started playing around with implementing various iterators over the neighbourlist to have a more functional style in implementing potentials. I‘m less sure how well that worked.

Mar 28 '20 13:03 cortner

Thanks for commenting @cortner. Once I get some time for this package, diving into JuLIP is definitely on the list of priorities to see what can be combined. Certainly with things you have optimised, like the neighbour list, there is room for Molly to use or build on that.

Apr 02 '20 16:04 jgreener64

or extract shared functionality so that JuLIP and Molly become effectively frontends...

Apr 02 '20 19:04 cortner

Hi! I observed some things in your code which could be optimized or written the "julia way".

Use StaticArrays to store positions and velocities, instead of your self-defined mutable structs. This might speed things up quite a lot due to allowing SIMD.
Implement a dist(a,b,box) function specialized for static arrays which takes the boundary conditions into account.
r2 = dx*dx+... can then be written as r^2. The pow function has specializations for 1,2 and 3, so this produces no overhead compared to C.
If you're using Atom/Juno you can use the Juno.@profiler to find bottlenecks
There is a discourse thread about "struct of arrays vs array of structs". I dont remember who won, but maybe it's instructive to read.
@inline the update_accelerations! functions
Use structs with parametric types, so that you can easily switch to e.g Float32 for GPUs. (SVectors are parametric.)
In theory, just an idea, one could leave the time-integration to another package like DifferentialEquations. They provide a whole lot of explicit or implicit solvers (but velocity verlet is missing I think) with adaptable or fixed time steps.

Kind regards, Max

May 07 '20 16:05 AlfTetzlaff

Some of this I do in JuLIP, though not yet as well as I’d like, it needs more cleanup to get proper type-agnostic code eg. - Another reason maybe to join forces at some pointZ

May 07 '20 16:05 cortner

Thanks for the feedback @AlfTetzlaff, I would definitely like to get some of this in. I have tried a few of the suggestions previously, for example the positions and velocities are currently mutable static vectors (sub-types of FieldVector) which should hopefully be giving some of the speedups you talk about.

The types should be made parametric and it would help with GPU compatibility, though at the minute the code is not written in a GPU-friendly way so that would need to change first. I would like to use the DifferentialEquations integrators in the long-term since they make so many available, but for the proof-of-concept here I just implemented a simple velocity Verlet.

I have opened a Discourse post with a specific MWE based on the current non-bonded force code to solicit feedback - please feel free to comment more there.

May 08 '20 21:05 jgreener64

I did a whole bunch of timing on many different possible LJ calculations for force and at the end of it, using SVector was the fastest.

Also, simple things like image separation could be done in a vectorized for loop rather than sequentially, and there was some time saving there too. Not enough to restructure your code just for that, but none the less, some.

I have put my Molecular Dynamics coding on hold and have started doing Monte Carlo (I just like MC better, and find statistical mechanics more interesting).

So far my MC code can do rigid SPC/E water. So, somewhat of a toy system as well, but, It can do Wolf summation or Ewalds for the electrostatic. There is nothing toy about Ewalds. Once I have it cleaned up, I will upload it and share with you guys. The most important thing is that it uses OffsetArrays.jl which is a wonderful package. It allows indices to take on any value, not just 1 and up. This just makes the fourier piece nice since the lattice vectors range from -kmax:kmax.

Julia does Ewalds really well. The fourier part is bloody fast and the "hard" part if there is one. The real part is simple and similar to a bare coulomb calculation.

I am also planning on making some ewald unit tests for energy. Force would also be nice, but that is in the future.

So, now that I am back part time on the Julia wagon, I will try the ewald contributions to you guys soon.

NOTE: -------------------------------------- Here is some fortran code that does force ewald force routines. It will take a good day to go over what they did and figure out how to make it work, which is why I have been putting off the force calculation and stuck with the energy.

http://zeolites.cqe.northwestern.edu/Music/electrostatic/ewald.htm zeolites.cqe.northwestern.eduhttp://zeolites.cqe.northwestern.edu/Music/electrostatic/ewald.htm This modules contains routines for those nasty ewald summations.! Good reference for ewald: Allen and Tildesley, Computer Simulation of ! Liquids, p 155-162; Amit's notes for doing part of the calculation in the zeolites.cqe.northwestern.edu

Braden

From: Joe Greener [email protected] Sent: Friday, May 8, 2020 5:47 PM To: jgreener64/Molly.jl [email protected] Cc: Braden Kelly [email protected]; Comment [email protected] Subject: Re: [jgreener64/Molly.jl] Roadmap and ideas for Molly.jl development (#2)

CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If in doubt, forward suspicious emails to [email protected]

Thanks for the feedback @AlfTetzlaffhttps://github.com/AlfTetzlaff, I would definitely like to get some of this in. I have tried a few of the suggestions previously, for example the positions and velocities are currently mutable static vectors (sub-types of FieldVector) which should hopefully be giving some of the speedups you talk about.

The types should be made parametric and it would help with GPU compatibility, though at the minute the code is not written in a GPU-friendly way so that would need to change first. I would like to use the DifferentialEquations integrators in the long-term since they make so many available, but for the proof-of-concept here I just implemented a simple velocity Verlet.

I have opened a Discourse posthttps://discourse.julialang.org/t/help-make-my-ideal-gas-simulation-ideally-fast/39141 with a specific MWE based on the current non-bonded force code to solicit feedback - please feel free to comment more there.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/jgreener64/Molly.jl/issues/2#issuecomment-626035133, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AECO7UOAPLRP4FM36MI4VDLRQR4WBANCNFSM4GADVQTQ.

May 11 '20 19:05 Zarasthustra

Great, thanks for the input.

May 11 '20 22:05 jgreener64

Here's a sneak preview of some non-bonded force optimisations I've been working on: faster single-CPU code, a multithreaded CPU implementation and a GPU implementation.

Hoping to refactor the package to allow these from the same API within a couple of weeks.

May 11 '20 22:05 jgreener64

Could it be that you confused us with ms in the above graph by accident?

May 15 '20 14:05 AlfTetzlaff

It's ms but for a 1000-step simulation, so it would be us per step. The blue dot for 100 atoms is the simulation benchmark time from the Discourse post.

May 15 '20 14:05 jgreener64

Some changes I added this weekend:

SVectors for coordinates and velocities, allowing arbitrary simulation dimensions.
Multithreading of non-bonded force calculations.

May 17 '20 16:05 jgreener64

Cool! Have you noticed any single threaded speedup due to SVectors?

May 18 '20 11:05 AlfTetzlaff

Yes, the Discourse thread benchmark went from 45 ms to 36 ms using SVectors, rather than the mutable FieldVectors used before.

Making Simulation immutable also sped things up.

The cell neighbour list would be a big algorithmic speedup (as discussed above).

May 18 '20 11:05 jgreener64

My rough feature list for a v0.1 release is:

Cell neighbour list [Done].
Type parameterisation (to allow Float32). [Done]
A few more thermostats [Done].
A few more simulators, e.g. minimisation/Langevin dynamics. [Done]
Some unit tests. [Done]
More thorough docs and docstrings. [Done]

May 18 '20 11:05 jgreener64

May I have your thoughts on the parallelization via domain decomposition in Julia?

Jan 13 '22 04:01 Chenghao-Wu

It's not really my area of expertise, but I guess it can be done and it looks like it could be a useful algorithm. I don't know how well that type of simulation would work in the Molly framework currently though.

Jan 13 '22 15:01 jgreener64

It would be an interesting proposition and a small step towards replacing lammps

Jan 13 '22 15:01 cortner

I thought I'd just update this issue with the state of the project.

In the last few months Molly has been under heavy development and has seen many new features including Unitful.jl support, more interactions, an OpenMM XML force field reader, AtomsBase.jl integration, differentiable simulation with Zygote.jl on CPU and GPU, a new general interaction type, implicit solvation models, more simulators, energy minimisation, more analysis functions and better test coverage. See the release notes for more details.

Future development directions include:

Optimisation of the GPU code path to make performance competitive to related software. https://github.com/JuliaMolSim/Molly.jl/issues/60 discusses one way to do this, but any approach will probably use GPU kernels. The challenge is to make the kernels work with gradients and to maintain flexibility while allowing the specialisations required for high performance.
More simulators such as replica exchange MD and Langevin variants [done].
Support for constrained bond lengths and angles, which are used in most macromolecular simulations these days.
More options and testing for simulation setup including residue template matching.
Pressure coupling methods for NPT simulation [done].
Support for non-cubic boundary conditions and no periodic boundaries in given dimensions [done].
Particle mesh Ewald for long-range summation of electrostatic interactions.
Parallel domain decomposition, or at least better use of CPU parallelism.

A few groups are interested in using Molly for a diverse range of simulations, so the intention is that it will stay general enough to be widely useful whilst also reaching high performance. My personal research uses Molly for differentiable simulations of proteins, so this will continue to dictate the features I work on.

Apr 16 '22 14:04 jgreener64

May be of interest a native Julia (as in no-external-dependencies) .xtc reader and writer? It is twice as fast than c implementation used by mdtraj.

Jul 18 '22 05:07 uliano

Molly.jl Molly.jl copied to clipboard

Roadmap and ideas for Molly.jl development

Low hanging fruit

Bigger things to tackle

Projects that will require more planning or will cause significant changes

Molly.jl
Molly.jl copied to clipboard