Molly.jl
Molly.jl copied to clipboard
open an sdf file
It is not clear to me how I can generally set up a working system. As a simple example, it is not possible to set up a system starting from an sdf file:
CT1000292221
3 2 0 0 0 999 V2000
0.0021 -0.0041 0.0020 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.0110 0.9628 0.0073 O 0 0 0 0 0 0 0 0 0 0 0 0
0.8669 1.3681 0.0011 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
M END
$$$$
System("water.sdf",ff)
straight up fails.
Is there somewhere an example where I start with a practical system (say a compound from https://www.rcsb.org ) and set up an MD simulation?
We use Chemfiles for the file reading and residues are not currently supported there for SDF (https://chemfiles.org/chemfiles/latest/formats.html#list-of-supported-formats), so the setup fails.
If you want to read in data from the PDB you can use PDB format, see the example at https://juliamolsim.github.io/Molly.jl/stable/docs/#Simulating-a-protein, though currently atom names must exactly match the residue templates. This is a topic of future work.
And with a PDB file, how can I make the atom names match exactly while also indicating that multiple hydrogen atoms are connected to the same molecule? For example, if we follow the last example from https://www.biostat.jhsph.edu/~iruczins/teaching/260.655/links/pdbformat.pdf we see that certain hydrogen atoms will be named as "1HG1". I take it that's not supported, and I should manually place the atoms and fix up the topology? Is there yet another format that is better supported? Do hydrogen atoms follow a different convention in molly.jl?
The residue template in the force field xml file defines the atom names (which currently must be matched exactly) and the connectivity between them. For example for alanine in ff99SBildn.xml
:
<Residue name="ALA">
<Atom name="N" type="N" charge="-0.4157"/>
<Atom name="H" type="H" charge="0.2719"/>
<Atom name="CA" type="CT" charge="0.0337"/>
<Atom name="HA" type="H1" charge="0.0823"/>
<Atom name="CB" type="CT" charge="-0.1825"/>
<Atom name="HB1" type="HC" charge="0.0603"/>
<Atom name="HB2" type="HC" charge="0.0603"/>
<Atom name="HB3" type="HC" charge="0.0603"/>
<Atom name="C" type="C" charge="0.5973"/>
<Atom name="O" type="O" charge="-0.5679"/>
<Bond atomName1="N" atomName2="H"/>
<Bond atomName1="N" atomName2="CA"/>
<Bond atomName1="CA" atomName2="HA"/>
<Bond atomName1="CA" atomName2="CB"/>
<Bond atomName1="CA" atomName2="C"/>
<Bond atomName1="CB" atomName2="HB1"/>
<Bond atomName1="CB" atomName2="HB2"/>
<Bond atomName1="CB" atomName2="HB3"/>
<Bond atomName1="C" atomName2="O"/>
<ExternalBond atomName="N"/>
<ExternalBond atomName="C"/>
</Residue>
The PDB file would have to have ALA as the residue name and every atom name appearing exactly once for a residue to match this template. Future work will allow flexibility in residue names and atom names by searching for the best template.
At the minute you are probably best off removing all hydrogens and adding them back with OpenMM, which will give atom names consistent with OpenMM XML force field files. All residues will need a template available, though that is the case with most software.
I think I correctly implemented alanine for use with ff99
ATOM 1 H1 ACE 1 2.000 1.000 -0.000
ATOM 2 CH3 ACE 1 2.000 2.090 0.000
ATOM 3 H2 ACE 1 1.486 2.454 0.890
ATOM 4 H3 ACE 1 1.486 2.454 -0.890
ATOM 5 C ACE 1 3.427 2.641 -0.000
ATOM 6 O ACE 1 4.391 1.877 -0.000
ATOM 7 N ALA 2 3.555 3.970 -0.000
ATOM 8 H ALA 2 2.733 4.556 -0.000
ATOM 9 CA ALA 2 4.853 4.614 -0.000
ATOM 10 HA ALA 2 5.408 4.316 0.890
ATOM 11 CB ALA 2 5.661 4.221 -1.232
ATOM 12 HB1 ALA 2 5.123 4.521 -2.131
ATOM 13 HB2 ALA 2 6.630 4.719 -1.206
ATOM 14 HB3 ALA 2 5.809 3.141 -1.241
ATOM 15 C ALA 2 4.713 6.129 0.000
ATOM 16 O ALA 2 3.601 6.653 0.000
ATOM 17 N NME 3 5.846 6.835 0.000
ATOM 18 H NME 3 6.737 6.359 -0.000
ATOM 19 CH3 NME 3 5.846 8.284 0.000
ATOM 20 HH31 NME 3 4.819 8.648 0.000
ATOM 21 HH32 NME 3 6.360 8.648 0.890
ATOM 22 HH33 NME 3 6.360 8.648 -0.890
TER
END
However, when I create the system and look at the interactions:
sys = System("alanine.pdb",ff,boundary = CubicBoundary(4.0u"nm"), loggers=(coords=CoordinateLogger(50),),rename_terminal_res = false)
then I only find sys.specific_inter_lists
to contain interactions between atoms in the ALA group. Shouldn't there also be a harmonicbond force between for example the C and O atoms in the ACE group?
Yes there should, I think this a problem with Chemfiles only reading bonds for non-standard residues when there is a CONECT record in the PDB file. I can add a mention to the docs, but in general non-standard residues (including ACE/NME terminal caps) aren't tested yet.
See https://github.com/noeblassel/SINEQSummerSchool2023/blob/main/notebooks/dipeptide_nowater.pdb for an alanine dipeptide that reads in okay:
using Molly
ff_dir = joinpath(dirname(pathof(Molly)), "..", "data", "force_fields")
ff = MolecularForceField(joinpath.(ff_dir, ["ff99SBildn.xml", "tip3p_standard.xml"])...)
sys = System("dipeptide_nowater.pdb", ff; rename_terminal_res=false)
One more issue:
using Molly
ff = MolecularForceField("ff14SB.xml");
sys = System("big_alanine.pdb",ff,boundary = CubicBoundary(1000.0u"nm"),rename_terminal_res = false)
total_energy(sys,find_neighbors(sys))
prints 1128.45113918646 kJ mol⁻¹
the python equivalent
import openmm
from openmm import app, unit
forcefield = app.ForceField('amber/ff14SB.xml')
f = app.PDBFile('big_alanine.pdb')
system = forcefield.createSystem(f.getTopology(), nonbondedMethod=app.NoCutoff)
integrator = openmm.LangevinMiddleIntegrator(310 * unit.kelvin,1.0 / unit.picosecond,1 * unit.femtosecond)
sim = app.Simulation(f.getTopology(), system, integrator)
sim.context.setPositions(f.getPositions())
state = sim.context.getState(getEnergy=True)
state.getPotentialEnergy()
prints Quantity(value=547.6336059570312, unit=kilojoule/mole)
Other quantities are also incorrect, so presumably my file is still being read in incorrectly? Or was a different convention used between openmm and molly?
It seems okay, the equivalent OpenMM call would be:
import openmm
from openmm import app, unit
forcefield = app.ForceField('amber/ff14SB.xml')
f = app.PDBFile('big_alanine.pdb')
system = forcefield.createSystem(
f.getTopology(),
nonbondedMethod=app.CutoffPeriodic,
nonbondedCutoff=1*unit.nanometer,
constraints=None,
rigidWater=False,
switchDistance=None,
useDispersionCorrection=False,
)
integrator = openmm.LangevinMiddleIntegrator(310 * unit.kelvin,1.0 / unit.picosecond,1 * unit.femtosecond)
sim = app.Simulation(f.getTopology(), system, integrator)
sim.context.setPositions(f.getPositions())
state = sim.context.getState(getEnergy=True)
state.getPotentialEnergy()
Quantity(value=1128.4508056640625, unit=kilojoule/mole)
Which is much better, there is still a 3E-4 kJ/mol discrepancy.
Note you need a line like
CRYST1 100.000 100.000 100.000 90.00 90.00 90.00 P 1 1
in the PDB file for CutoffPeriodic
to work.
that clarifies some stuff! I'm really struggling with basic things but from testing it looks like molly is at least a factor 10 faster than openmm (for my admittedly weird use case).
Cool, I'd be interested to hear roughly what you are doing if you are able to share, we might be able to add more features and docs to help.
Btw make sure you are on Molly.jl v0.18.2 or later as that version had a significant performance improvement for periodic boundary conditions.
The plan is to wrap it up in a paper (though the draft is progressing slowly), so molly will definitely be cited! We really only need a way to evaluate energies and manipulate positions. Molly is nice because unlike pythonic packages that bind to optimized C code, I can directly profile and manipulate the system structure.
At the moment I'm only interested in all-to-all, but there are two other projects I want to try molly on, one which requires mixing periodic boundary conditions in one direction with open boundary conditions in another, and another where I'll be using conventional periodic boundary conditions. Thanks a lot for the hard work that went into the package! It was bit rough to get started simulating my first molecule, but after that it's really only been smooth sailing!