openff-interchange icon indicating copy to clipboard operation
openff-interchange copied to clipboard

Interchange.to_gromacs() creates a topology with far too many atomtypes, which influences GROMACS performance

Open pbuslaev opened this issue 10 months ago • 4 comments

Description When dumping the interchange to GROMACS format, too many atom types are created. This is completely fine in terms of compatibility with GROMACS, but it is causing huge memory pressure on GROMACS. The issue is that GROMACS is creating and keeping in memory the matrix of all non-bonded interactions. So the more atom types you have, the larger the table is. The table size (and the memory requirements) has $n^2$ dependence on the number of atom types. For me, a system of around 55,000 atoms led to memory usage over 32 GB at both grompp and mdrun stages.

To demonstrate the issue, I used the 102L structure. I removed all the waters manually with vmd and added hydrogens with gmx pdb2gmx:

gmx pdb2gmx -f 102L_clean.pdb -o 102L_h.pdb

When I ran grompp with the topology and structure created by GROMACS with amber99sb-ildn force field using the following basic .mdp file:

; minim.mdp - used as input into grompp to generate em.tpr
; Parameters describing what to do, when to stop and what to save
integrator  = steep         ; Algorithm (steep = steepest descent minimization)
emtol       = 1000.0        ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep      = 0.01          ; Minimization step size
nsteps      = 5000         ; Maximum number of (minimization) steps to perform

; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist         = 1         ; Frequency to update the neighbor list and long range forces
cutoff-scheme   = Verlet    ; Buffered neighbor searching
ns_type         = grid      ; Method to determine neighbor list (simple, grid)
coulombtype     = PME       ; Treatment of long range electrostatic interactions
rcoulomb        = 1.0       ; Short-range electrostatic cut-off
rvdw            = 1.0       ; Short-range Van der Waals cut-off
pbc             = xyz       ; Periodic Boundary Conditions in all 3 dimensions

I got:

# Execution command
time gmx grompp -f min.mdp -c 102L_h.gro -p topol.top -o test1.tpr -maxwarn 1

# Output
                      :-) GROMACS - gmx grompp, 2023.1 (-:

Executable:   /software/gromacs_v2023.1/bin/gmx
Data prefix:  /software/gromacs_v2023.1
Working dir:  /tmp_mnt/filer1/unix/pbuslaev/projects/openff_test_atomtypes/structures
Command line:
  gmx grompp -f min.mdp -c 102L_h.gro -p topol.top -o test1.tpr -maxwarn 1

Ignoring obsolete mdp entry 'ns_type'

NOTE 1 [file min.mdp]:
  With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
  that with the Verlet scheme, nstlist has no effect on the accuracy of
  your simulation.

Setting the LD random seed to 1605763007

# Note the number of combinations generated
Generated 2145 of the 2145 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5

# Note the execution time
real    0m0.164s
user    0m0.102s
sys     0m0.018s

The memory usage during the execution, obtained with

while true; do free >> memory.log; sleep 0.1; done

looks like this: image

When I created the system using interchange:

from openff.toolkit import Topology, ForceField
from openff.interchange import Interchange
p = Topology.from_pdb("102L_h.pdb")
ff = ForceField("ff14sb_off_impropers_0.0.3.offxml")
pi = Interchange.from_smirnoff(force_field = ff, topology = p)
pi.to_gromacs("102L_gmx")

and used grompp with the same .mdp file as before, I got the following output:

# Execution command
time gmx grompp -f min.mdp -c 102L_gmx.gro -p 102L_gmx.top -o test1.tpr -maxwarn 1

# Output
                      :-) GROMACS - gmx grompp, 2023.1 (-:

Executable:   /software/gromacs_v2023.1/bin/gmx
Data prefix:  /software/gromacs_v2023.1
Working dir:  /tmp_mnt/filer1/unix/pbuslaev/projects/openff_test_atomtypes/structures
Command line:
  gmx grompp -f min.mdp -c 102L_gmx.gro -p 102L_gmx.top -o test1.tpr -maxwarn 1

Ignoring obsolete mdp entry 'ns_type'

NOTE 1 [file min.mdp]:
  With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
  that with the Verlet scheme, nstlist has no effect on the accuracy of
  your simulation.

Setting the LD random seed to -46147591

# Note the number of combinations generated
Generated 3611328 of the 3611328 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5

# Note the execution time
real    0m3.448s
user    0m1.606s
sys     0m1.799s

The memory usage during the execution tracked the same way as for gmx looks like this: image

Note, that for the system created with interchange the size of non-bonded parameter combinations generated is 3611328 instead of 2145, and execution time is 3 seconds, instead of 0.2 seconds.

I propose the solution (the draft is in MR #962 ) which is merging similar (or identical) atom types into a single entry. With this solution, I was able to reduce the number of atom types combinations to 105 and execution time to 0.1 seconds.

pbuslaev avatar Apr 09 '24 16:04 pbuslaev

Will look at this and your changes more closely later but you're probably running into the long-standing problem of using SMIRNOFF in "conventional" atom-typed infrastructure, namely the atom type explosion problem. It's the sort of problem that seems like it should be easy to wrangle but doesn't always work out like that - with some differences in how engines handle each. I forget (but hope I can pull up!) some of the more antagonistic cases that prevented me from going all the way with atom type de-duplication in the past.

This is all to say your issue is surely valid, I'm not surprised you're running into some performance issues, and I've even tagged this as something to better document (#607).

mattwthompson avatar Apr 09 '24 16:04 mattwthompson

Following the discussion in #606 the behaviour suggested in #962 can be made optional, by adding a flag to Interchange.to_gromacs(combine_atomtypes: bool = False). This will keep the possibility to write all possible atomtypes if needed, but will also give the users the possibility to reduce the memory load. This solution also does not put any constraintes on the core Interchange code, only to one particular exporter. This flag will also provide a direct way to test the functionality.

pbuslaev avatar Apr 10 '24 10:04 pbuslaev

With #973, there's now a good-looking solution for this. I might keep this open while it gets some testing out in the wild, though.

mattwthompson avatar Apr 23 '24 21:04 mattwthompson

This might help as well https://manual.gromacs.org/2024.0/release-notes/2024/major/performance.html#reduced-grompp-and-mdrun-setup-time-for-systems-with-many-atom-types

mattwthompson avatar May 03 '24 20:05 mattwthompson