BioSimSpace icon indicating copy to clipboard operation
BioSimSpace copied to clipboard

[TUTORIAL] Funnel metadynamics

Open lohedges opened this issue 3 years ago • 96 comments

This is a thread to discuss the creation of a tutorial showing how to implement funnel metadynamics within BioSimSpace.

lohedges avatar Mar 02 '21 16:03 lohedges

Adding current diagram summarising the use case:

funnel_md-white

jmichel80 avatar Mar 03 '21 21:03 jmichel80

I've ran a fun-metaD sim that was setup for gromacs and I got a few errors.

  1. Missing definition for the ligand center of mass, after the definition of p1 and p2 points. lig: COM ATOMS=first_ligand_atom_id-last_ligand_atom_id

  2. Missing sigma parameter for the extent CV, so instead of metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025 ...

    it should be

    metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025,0.05 ...

dlukauskis avatar Mar 22 '21 13:03 dlukauskis

Thanks for catching these. I'll be able to push a fix tomorrow.

On Mon, 22 Mar 2021, 13:02 dlukauskis, @.***> wrote:

I've ran a fun-metaD sim that was setup for gromacs and I got a few errors.

Missing definition for the ligand center of mass, after the definition of p1 and p2 points. lig: COM ATOMS=first_ligand_atom_id-last_ligand_atom_id 2.

Missing sigma parameter for the extent CV, so instead of metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025 ...

it should be

metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025,0.05 ...

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/michellab/BioSimSpace/issues/194#issuecomment-804043374, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAE6K3IIUF5ATUZHWHMVRDDTE45W3ANCNFSM4YPIREAQ .

lohedges avatar Mar 22 '21 13:03 lohedges

I've made some functions that draw the funnel for NGLview, here's the code. The idea behind using NGLview is to rapidly check that the funnel points out the right way and includes/excludes the right bits of the protein. @lohedges, could you have a go at making my functions built into BSS code?

dlukauskis avatar Mar 22 '21 19:03 dlukauskis

Thanks, @dlukauskis. I've fixed the issues with the GROMACS PLUMED file and have added a check so that the ProjectionOnAxis.cpp file is only copied to the working directory when the PLUMED version is less than 2.7.

The plotting functionality looks great. I'll start implementing this now. Just to check... you define three vectors: origin, v1, and v2. Both origin and v1 are identical in this case. Which system did you use to define the values of v1 and v2? Are these just the COM of p0 and p1 used to define the funnel CV? (From reading the function doc strings, it would seem so.) I see you include a solvated.pdb file. Does this correspond to one of the systems from your original tutorial?

lohedges avatar Mar 23 '21 10:03 lohedges

Oh yeah, origin = v1 = p1 and v2 = p2. This was based on some older bit of code so I forgot to remove all of the redundancies like this. Solvated.pdb is 2WI3, which is almost identical to 2WI2 from the tutorial, except for a slightly different binding pose.

dlukauskis avatar Mar 23 '21 12:03 dlukauskis

I've now implemented the visualisation code natively in BioSimSpace. You can do the following to generate a BioSimSpace.Notebook.View object, e.g. using files from your tutorial:

import BioSimSpace as BSS

# Load the system.
system = BSS.IO.readMolecules(["input/2WI2.prmtop", "input/2WI2.rst7"])

# Create the funnel parameters.
p0, p1 = BSS.Metadynamics.CollectiveVariable.makeFunnel(system)

# Define the collective variable.
cv = BSS.Metadynamics.CollectiveVariable.Funnel(p0, p1)

# Create a view.
view = BSS.Metadynamics.CollectiveVariable.viewFunnel(system, cv)

This can then be used to visualise the funnel.

View the entire solvated system and funnel:

view.system()

View the protein, ligand, and funnel: (Assuming the protein and ligand are the first two molecules)

view.molecules([0, 1, -1])

Just view the funnel:

view.molecule(-1)

Let me know if I've missed anything, or if you come across any issues.

lohedges avatar Mar 23 '21 14:03 lohedges

Yep, that visualisation code works really well! I noticed one issue with the PLUMED file still, where line 473 in _plumed.py colvar_string = "lig: COM=%d-%d" % (idx+1, idx+num_atoms) should be colvar_string = "lig: COM ATOMS=%d-%d" % (idx+1, idx+num_atoms)

I'm also attaching a metadynamics.py for OpenMM 7.4.2 that allows to print hill heights. metadynamics.zip

dlukauskis avatar Apr 12 '21 12:04 dlukauskis

Thanks, I'll try to figure out how we can monkey-patch the version of OpenMM that we bundle (either in the binary or from conda). It looks like nothing has changed in the metadynamics code for OpenMM 7.5, so this should work for the conda-forge build of OpenMM too.

lohedges avatar Apr 12 '21 13:04 lohedges

This is a slightly better version of metadynamics.py that I sent yesterday, removed some lines that were in the older version for testing and debugging. Also, checkout the changes I've made to the typical OpenMM run.py that makes it write a HILLS file. run.zip

dlukauskis avatar Apr 13 '21 13:04 dlukauskis

Fantastic, thanks for these. I'll try to incorporate the changes tomorrow morning and report on progress during the meeting. For simplicity I'll probably bundle your modified metadynamics script and copy it to the working directory at run-time, e.g. like we do for the additional PLUMED file.

lohedges avatar Apr 13 '21 14:04 lohedges

Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED compatible HILLS file. Could you take a look at the openmm.py script that is generated to see if it looks okay?

When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers. Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.

lohedges avatar Apr 14 '21 09:04 lohedges

We might also need to write the OpenMM colvar to a text file, but that should be easy enough to do, or check for the COLVAR.npy file and convert it within the PLUMED code (or just use the NumPy binary directly).

lohedges avatar Apr 14 '21 09:04 lohedges

Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED compatible HILLS file. Could you take a look at the openmm.py script that is generated to see if it looks okay?

Sure.

When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers. Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.

The COLVAR.npy only contains the CV values, proj and ext, and the hill height in rows 0 and 1 and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would. HILLS file written by my code is formatted to be identical to HILLS made by PLUMED. I've tested getting 2D and 1D FES from my HILLS by feeding that file into PLUMED and it spits out FES that are identical to ones made by OpenMM. Here's the zipped HILLS file.

dlukauskis avatar Apr 14 '21 12:04 dlukauskis

Thanks, I'm glad that the FES agree with OpenMM :+1: I'll run the HILLS file through our own sum_hills wrapper when I get a chance to make sure all is okay.

The COLVAR.npy only contains the CV values, proj and ext, and the hill height in rows 0 and 1 and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would.

That's okay. I only use the file for getting instantaneous and time series values of the collective variables, with some tricks to make sure that I extract the correct variables based on their names in the original PLUMED file. I should just be able to use the information in your file for the same purposes. I could just use the HILLS file too, although in general (using PLUMED) I write to the COLVAR more frequently than HILLS.

lohedges avatar Apr 14 '21 13:04 lohedges

I'm trying to use OpenMM to minimize and equilibrate as part of my fun-metaD tutorial. I've ran into an issue with minimization using OpenMM, just giving me a generic simtk.openmm.OpenMMException: Particle coordinate is nan Then I checked the RST7 and PRM7 files and they look really off when I visualise them using VMD. The protein and the ligand bonds are fine, but the solvent seems broken. This might be an issue with read/write of the files. Here are the inputs.

dlukauskis avatar Apr 16 '21 12:04 dlukauskis

Was the system completely prepared in BIoSimSpace, or was this a fully solvated system that you loaded in initially? If the latter, could you upload those files too. We do some internal conversions to make sure that waters are formatted correctly for different MD engines, so something might have gone wrong here? (I've tested that I get the same doing round-trip conversions this way, though.)

lohedges avatar Apr 16 '21 13:04 lohedges

Here are all the inputs and the script that I used to set up and run the minimization. I started from a protein PDB with some water I discard and a ligand MOL2.

dlukauskis avatar Apr 16 '21 13:04 dlukauskis

Running the minimisation with GROMACS (using the solvated system directly) works so something must be going wrong with the conversion to AMBER, which is what is used for OpenMM. I'll see if preventing the water model conversion makes any difference.

lohedges avatar Apr 16 '21 13:04 lohedges

Actually, we don't convert the water topology when running with OpenMM, only with AMBER or GROMACS. This means that OpenMM is using AMBER format files with a GROMACS water topology. If I explicitly convert the topology then it still fails, i.e. doing the following before creating the process.

solvated._set_water_topology("AMBER")

I'll try using the GROMACS files directly with OpenMM to see whether that works.

I'm not sure of the issue with this system, since this is the basic setup for pretty much all of our simulations and I haven't come across many blowups like this.

lohedges avatar Apr 16 '21 13:04 lohedges

I think it's an issue with the protein. If I just solvate the protein, I see the same blow up. If I just solvate the ligand, then it works.

lohedges avatar Apr 16 '21 13:04 lohedges

For reference, the minimisation crashes if you just use the protein in vacuum, so I don't think it's anything specifically to do with the water molecules in the system.

lohedges avatar Apr 16 '21 17:04 lohedges

Running the minimisation with GROMACS (using the solvated system directly) works so something must be going wrong with the conversion to AMBER, which is what is used for OpenMM. I'll see if preventing the water model conversion makes any difference.

For reference, GROMACS doesn't immediately crash, but the energies that it reports do look somewhat suspect.

lohedges avatar Apr 16 '21 20:04 lohedges

I've abandoned the protein PDB that was causing the issues and chose another one. Minimisation works fine now. I caught a bug that occurs when you try to run equilibration with CUDA as a platform, see my pull request.

I now get a strange issue when using process.getSystem() simply doesn't work. I'm attaching a zip with input files and the script.

dlukauskis avatar Apr 17 '21 15:04 dlukauskis

I'll try to test this later. I see that there are multiple process.getSystem() calls in your script. Are they all failing, or just the final one for equilibration? I see that you have files corresponding to the minimised system in the directory, so assume that the minimisation ran file. It might not be returning anything since the equilibration failed, so you might want to wrap the calls to process.getSystem() inside of a block that checks to see if the process raised an error, e.g.:

if not process.isError():
    equilibrated = process.getSystem()
else:
    print(process.stdout(50))
    raise Exception("The process failed!")

(It should warn you if an error occured when calling getSystem though.)

Another possiblilty is that the trajectory frame indexing is incorrect so it can't find the last frame when calling getSystem, so returns nothing instead. I'll test this with a short simulation.

lohedges avatar Apr 17 '21 16:04 lohedges

Just tested a shorter CPU simulation of 0.01 nanoseconds and it works for me, i.e. I don't get any errors and there are equilibrated.* files in my working directory. Could you check the version that you are using since I believe that I did make modification to the way that I call getSystem for the OpenMM driver. Since it reads a trajectory to get the most recent frame, I've made tweaks so that it doesn't need to load the whole trajectory into memory.

I'll try a longer simulation in case MDTraj/MDAnalysis are simply failing when the trajectory is too large. If this is the problem, then you might want to adjust the report_interval and restart_interval that are passed to the protocol constructor. By default we report quite frequently, so you might want to adjust these for a longer simulation.

lohedges avatar Apr 17 '21 17:04 lohedges

I've abandoned the protein PDB that was causing the issues and chose another one.

Yes, this is rather weird. Trust us to choose a system that misbehaves!

I did a bit more testing and the protein in vacuum works with AMBER and GROMACS, but not OpenMM. If I actually take the files generated by tLEaP and use those with OpenMM directly, i.e. not loading via BioSimSpace and writing back out, then it fails with the same error. As such, I don't think it's an issue with the way that we're parsing the file, rather something peculiar about this particular system with OpenMM. Perhaps it is more sensitive to the accuracy of the initial coordinates for some reason, or its own parser is misinterpreting things?

The protein and the ligand bonds are fine, but the solvent seems broken.

I think this because these are AMBER format files containing a GROMACS format water topology, i.e. naming conventions, etc. We don't care about the naming internally (neither does GROMACS or OpenMM) but some third-party tools do, and there is a difference between the way that fast three-point waters are represented. I imagine VMD is expecting AMBER format waters when you load an AMBER topology file.

Since we're using AMBER format files with OpenMM, I've updated the code so that it converts the water topology before write to keep things consistent. (These edits are in a feature branch that I'll merge across on Monday. It shouldn't affect the current simulations, though.)

I've also tested that process.getSystem() returns correctly from the minimisation stage and all seems okay at my end. (They way getSystem works is slightly different depending on the protocol, owing to the different files that are written.)

lohedges avatar Apr 17 '21 20:04 lohedges

Minimisation process.getSystem() worked fine for me as well. I tried running the equilibration for 0.01 ns and using the CPU and it did indeed work. Then I tried the 0.01 ns with CUDA and that worked too. As soon as I switch to 0.1 ns with either CPU or CUDA, I get system return error I mentioned. If I set report_interval=1000, restart_interval=1000, the problem persists despite the trajectory having only 50 frames.

dlukauskis avatar Apr 17 '21 20:04 dlukauskis

I wonder if there's a rounding issue with the way I compute the fraction of the trajectory that has completed in order to work out the current frame when you run getSystem. Just checking now. Will report back when I know more.

lohedges avatar Apr 17 '21 21:04 lohedges

Yes, I think that's the issue. It is caught as an exception in getFrame, but in getSystem I use a try/except block since you don't want it to fail when running interactively if nothing has been written to the trajectory file yet. I'll just cap the frac_complete at 1 in getSystem so that we never ask for a frame outside of the limits.

(OpenMM is reporting a time of slightly greater than num_steps * timestep for the last step.)

lohedges avatar Apr 17 '21 21:04 lohedges