BioSimSpace
BioSimSpace copied to clipboard
[TUTORIAL] Funnel metadynamics
This is a thread to discuss the creation of a tutorial showing how to implement funnel metadynamics within BioSimSpace.
Adding current diagram summarising the use case:
I've ran a fun-metaD sim that was setup for gromacs and I got a few errors.
-
Missing definition for the ligand center of mass, after the definition of p1 and p2 points.
lig: COM ATOMS=first_ligand_atom_id-last_ligand_atom_id
-
Missing sigma parameter for the extent CV, so instead of
metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025 ...
it should be
metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025,0.05 ...
Thanks for catching these. I'll be able to push a fix tomorrow.
On Mon, 22 Mar 2021, 13:02 dlukauskis, @.***> wrote:
I've ran a fun-metaD sim that was setup for gromacs and I got a few errors.
Missing definition for the ligand center of mass, after the definition of p1 and p2 points. lig: COM ATOMS=first_ligand_atom_id-last_ligand_atom_id 2.
Missing sigma parameter for the extent CV, so instead of metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025 ...
it should be
metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025,0.05 ...
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/michellab/BioSimSpace/issues/194#issuecomment-804043374, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE6K3IIUF5ATUZHWHMVRDDTE45W3ANCNFSM4YPIREAQ .
I've made some functions that draw the funnel for NGLview, here's the code. The idea behind using NGLview is to rapidly check that the funnel points out the right way and includes/excludes the right bits of the protein. @lohedges, could you have a go at making my functions built into BSS code?
Thanks, @dlukauskis. I've fixed the issues with the GROMACS PLUMED file and have added a check so that the ProjectionOnAxis.cpp file is only copied to the working directory when the PLUMED version is less than 2.7.
The plotting functionality looks great. I'll start implementing this now. Just to check... you define three vectors: origin
, v1
, and v2
. Both origin
and v1
are identical in this case. Which system did you use to define the values of v1
and v2
? Are these just the COM of p0
and p1
used to define the funnel CV? (From reading the function doc strings, it would seem so.) I see you include a solvated.pdb
file. Does this correspond to one of the systems from your original tutorial?
Oh yeah, origin = v1 = p1 and v2 = p2. This was based on some older bit of code so I forgot to remove all of the redundancies like this. Solvated.pdb is 2WI3, which is almost identical to 2WI2 from the tutorial, except for a slightly different binding pose.
I've now implemented the visualisation code natively in BioSimSpace. You can do the following to generate a BioSimSpace.Notebook.View
object, e.g. using files from your tutorial:
import BioSimSpace as BSS
# Load the system.
system = BSS.IO.readMolecules(["input/2WI2.prmtop", "input/2WI2.rst7"])
# Create the funnel parameters.
p0, p1 = BSS.Metadynamics.CollectiveVariable.makeFunnel(system)
# Define the collective variable.
cv = BSS.Metadynamics.CollectiveVariable.Funnel(p0, p1)
# Create a view.
view = BSS.Metadynamics.CollectiveVariable.viewFunnel(system, cv)
This can then be used to visualise the funnel.
View the entire solvated system and funnel:
view.system()
View the protein, ligand, and funnel: (Assuming the protein and ligand are the first two molecules)
view.molecules([0, 1, -1])
Just view the funnel:
view.molecule(-1)
Let me know if I've missed anything, or if you come across any issues.
Yep, that visualisation code works really well! I noticed one issue with the PLUMED file still, where line 473 in _plumed.py
colvar_string = "lig: COM=%d-%d" % (idx+1, idx+num_atoms)
should be
colvar_string = "lig: COM ATOMS=%d-%d" % (idx+1, idx+num_atoms)
I'm also attaching a metadynamics.py
for OpenMM 7.4.2 that allows to print hill heights.
metadynamics.zip
Thanks, I'll try to figure out how we can monkey-patch the version of OpenMM that we bundle (either in the binary or from conda). It looks like nothing has changed in the metadynamics code for OpenMM 7.5, so this should work for the conda-forge build of OpenMM too.
This is a slightly better version of metadynamics.py that I sent yesterday, removed some lines that were in the older version for testing and debugging. Also, checkout the changes I've made to the typical OpenMM run.py
that makes it write a HILLS file.
run.zip
Fantastic, thanks for these. I'll try to incorporate the changes tomorrow morning and report on progress during the meeting. For simplicity I'll probably bundle your modified metadynamics script and copy it to the working directory at run-time, e.g. like we do for the additional PLUMED file.
Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED compatible HILLS file. Could you take a look at the openmm.py
script that is generated to see if it looks okay?
When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers. Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.
We might also need to write the OpenMM colvar to a text file, but that should be easy enough to do, or check for the COLVAR.npy
file and convert it within the PLUMED code (or just use the NumPy binary directly).
Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED compatible HILLS file. Could you take a look at the
openmm.py
script that is generated to see if it looks okay?
Sure.
When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers. Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.
The COLVAR.npy
only contains the CV values, proj and ext, and the hill height in rows 0 and 1 and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would. HILLS
file written by my code is formatted to be identical to HILLS
made by PLUMED. I've tested getting 2D and 1D FES from my HILLS
by feeding that file into PLUMED and it spits out FES that are identical to ones made by OpenMM. Here's the zipped HILLS
file.
Thanks, I'm glad that the FES agree with OpenMM :+1: I'll run the HILLS file through our own sum_hills
wrapper when I get a chance to make sure all is okay.
The COLVAR.npy only contains the CV values, proj and ext, and the hill height in rows 0 and 1 and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would.
That's okay. I only use the file for getting instantaneous and time series values of the collective variables, with some tricks to make sure that I extract the correct variables based on their names in the original PLUMED file. I should just be able to use the information in your file for the same purposes. I could just use the HILLS file too, although in general (using PLUMED) I write to the COLVAR more frequently than HILLS.
I'm trying to use OpenMM to minimize and equilibrate as part of my fun-metaD tutorial. I've ran into an issue with minimization using OpenMM, just giving me a generic
simtk.openmm.OpenMMException: Particle coordinate is nan
Then I checked the RST7 and PRM7 files and they look really off when I visualise them using VMD. The protein and the ligand bonds are fine, but the solvent seems broken. This might be an issue with read/write of the files. Here are the inputs.
Was the system completely prepared in BIoSimSpace, or was this a fully solvated system that you loaded in initially? If the latter, could you upload those files too. We do some internal conversions to make sure that waters are formatted correctly for different MD engines, so something might have gone wrong here? (I've tested that I get the same doing round-trip conversions this way, though.)
Here are all the inputs and the script that I used to set up and run the minimization. I started from a protein PDB with some water I discard and a ligand MOL2.
Running the minimisation with GROMACS (using the solvated system directly) works so something must be going wrong with the conversion to AMBER, which is what is used for OpenMM. I'll see if preventing the water model conversion makes any difference.
Actually, we don't convert the water topology when running with OpenMM, only with AMBER or GROMACS. This means that OpenMM is using AMBER format files with a GROMACS water topology. If I explicitly convert the topology then it still fails, i.e. doing the following before creating the process.
solvated._set_water_topology("AMBER")
I'll try using the GROMACS files directly with OpenMM to see whether that works.
I'm not sure of the issue with this system, since this is the basic setup for pretty much all of our simulations and I haven't come across many blowups like this.
I think it's an issue with the protein. If I just solvate the protein, I see the same blow up. If I just solvate the ligand, then it works.
For reference, the minimisation crashes if you just use the protein in vacuum, so I don't think it's anything specifically to do with the water molecules in the system.
Running the minimisation with GROMACS (using the solvated system directly) works so something must be going wrong with the conversion to AMBER, which is what is used for OpenMM. I'll see if preventing the water model conversion makes any difference.
For reference, GROMACS doesn't immediately crash, but the energies that it reports do look somewhat suspect.
I've abandoned the protein PDB that was causing the issues and chose another one. Minimisation works fine now. I caught a bug that occurs when you try to run equilibration with CUDA as a platform, see my pull request.
I now get a strange issue when using process.getSystem()
simply doesn't work. I'm attaching a zip with input files and the script.
I'll try to test this later. I see that there are multiple process.getSystem()
calls in your script. Are they all failing, or just the final one for equilibration? I see that you have files corresponding to the minimised system in the directory, so assume that the minimisation ran file. It might not be returning anything since the equilibration failed, so you might want to wrap the calls to process.getSystem()
inside of a block that checks to see if the process raised an error, e.g.:
if not process.isError():
equilibrated = process.getSystem()
else:
print(process.stdout(50))
raise Exception("The process failed!")
(It should warn you if an error occured when calling getSystem
though.)
Another possiblilty is that the trajectory frame indexing is incorrect so it can't find the last frame when calling getSystem
, so returns nothing instead. I'll test this with a short simulation.
Just tested a shorter CPU simulation of 0.01 nanoseconds and it works for me, i.e. I don't get any errors and there are equilibrated.*
files in my working directory. Could you check the version that you are using since I believe that I did make modification to the way that I call getSystem
for the OpenMM driver. Since it reads a trajectory to get the most recent frame, I've made tweaks so that it doesn't need to load the whole trajectory into memory.
I'll try a longer simulation in case MDTraj/MDAnalysis are simply failing when the trajectory is too large. If this is the problem, then you might want to adjust the report_interval
and restart_interval
that are passed to the protocol constructor. By default we report quite frequently, so you might want to adjust these for a longer simulation.
I've abandoned the protein PDB that was causing the issues and chose another one.
Yes, this is rather weird. Trust us to choose a system that misbehaves!
I did a bit more testing and the protein in vacuum works with AMBER and GROMACS, but not OpenMM. If I actually take the files generated by tLEaP
and use those with OpenMM directly, i.e. not loading via BioSimSpace and writing back out, then it fails with the same error. As such, I don't think it's an issue with the way that we're parsing the file, rather something peculiar about this particular system with OpenMM. Perhaps it is more sensitive to the accuracy of the initial coordinates for some reason, or its own parser is misinterpreting things?
The protein and the ligand bonds are fine, but the solvent seems broken.
I think this because these are AMBER format files containing a GROMACS format water topology, i.e. naming conventions, etc. We don't care about the naming internally (neither does GROMACS or OpenMM) but some third-party tools do, and there is a difference between the way that fast three-point waters are represented. I imagine VMD is expecting AMBER format waters when you load an AMBER topology file.
Since we're using AMBER format files with OpenMM, I've updated the code so that it converts the water topology before write to keep things consistent. (These edits are in a feature branch that I'll merge across on Monday. It shouldn't affect the current simulations, though.)
I've also tested that process.getSystem()
returns correctly from the minimisation stage and all seems okay at my end. (They way getSystem
works is slightly different depending on the protocol, owing to the different files that are written.)
Minimisation process.getSystem()
worked fine for me as well. I tried running the equilibration for 0.01 ns and using the CPU and it did indeed work. Then I tried the 0.01 ns with CUDA and that worked too. As soon as I switch to 0.1 ns with either CPU or CUDA, I get system return error I mentioned. If I set report_interval=1000, restart_interval=1000
, the problem persists despite the trajectory having only 50 frames.
I wonder if there's a rounding issue with the way I compute the fraction of the trajectory that has completed in order to work out the current frame when you run getSystem
. Just checking now. Will report back when I know more.
Yes, I think that's the issue. It is caught as an exception in getFrame
, but in getSystem
I use a try/except
block since you don't want it to fail when running interactively if nothing has been written to the trajectory file yet. I'll just cap the frac_complete
at 1 in getSystem
so that we never ask for a frame outside of the limits.
(OpenMM is reporting a time of slightly greater than num_steps * timestep
for the last step.)