openmm
openmm copied to clipboard
Energy minimization for protein-ligand modeling
Dear all,
I am trying to create a pipeline to automate the binding energy calculation of several variants of a homotetrameric protein in complex with a medium/big-sized ligand (CoA). I haven't decided yet which tool I will use for the actual calculation of the binding energy, but I know that I need to perform some sort of energy minimization and maybe also relaxation using MD after homology modeling in order to properly model the complex before the binding energy calculations. My question is whether simulation.minimizeEnergy() from OpenMM can handle this task. Additionally, I would like to know how I can output the resulting structure and make a plot to check for minimization convergence.
I would appreciate any help you can provide.
Best regards,
Gustavo
minimizeEnergy() performs a local energy minimization. It's generally used to eliminate large forces before starting a simulation. A protein has an enormously complicated potential energy surface with countless local minima. Energy minimization will find one of those local minima, but there's no guarantees about which one it will find.
What method do you plan to use for calculating binding energies? A typical simulation protocol involves a series of steps along the lines of 1) create an initial conformation (for example by homology modeling), 2) energy minimize to eliminate any large forces, 3) simulate for a while to equilibrate, 4) run more simulation, collecting data from which you will calculate results.
@GuhLelouch "relaxation using MD after homology modeling in order to properly model the complex" - The short answer is no, simulation.minimizeEnergy() won't do what that; it will take you from the starting coordinates to coordinates to near a local minimum (near in the sense that the gradient is below a specified gradient tolerance), but that's about it. That usually allows starting a simulation with a normal step size without it immediately blowing up due to too-high forces.
"Relaxation" likely implies some type of non-local minimization/annealing/etc that involves more than going to the nearest local minimum. You might try Rosetta FastRelax or something along those lines. You might also try doing your homology modelling with AlphaFold/ColabFold which gives pretty good starting structures; I think those are good enough that they wouldn't benefit much from a FastRelax pass. In general, starting from a "bad" structure and taking it to a "good" structure is not much easier than protein structure prediction.
@peastman Answering your question: I aim to use any fast method that can be easily incorporated into a Python script. I don't care much about creating the best method at the moment; I just want to be able to obtain these energies to assist in my experimental work. That said, I was thinking about using OpenMM to run short MD simulations of the proteins in the free and bound states and then use MM-PBSA/GBSA to calculate the binding energies (not sure if OpenMM can perform these calculations). Alternatively, I could try using one of these recent machine learning-aided methods, such as RASPD+.
@aizvorski As I want to make something more automated in Python to screen a medium number of mutants, I am not considering ColabFold. As for AlphaFold, I have successfully run it on my workplace's high-performance computer, but I want to avoid using it because (1) the model has to be a tetramer, (2) the modeling has to take into account the ligand, (3) currently I don't have a lot of time to learn how to adapt and optimize my code to the cluster environment. FastRelax sounds interesting, but I currently only have a laptop with Windows on it. What I am trying to do is to perform the homology modeling using Modeller with restraints for the tetramer and ligand, and then possibly run Autodock Vina afterward with some restraints on the ligand and flexibility on the protein. Optimization of the model using FastRelax or something else would come after. However, I suppose that if want to run MD simulations for the binding energy calculations, I may not need this last step.
Thank you both for your answers. They are helping to clarify some details of my pipeline!
When you say "binding energy", I assume you really mean free energy? Free energy prediction is a whole field in itself with many different methods available. There are people here who are experts in that field, so I'll let someone else answer the question.
@peastman Yes. Just to clarify: I mean free energy of binding.
@GuhLelouch I have written a protein-ligand free energy of binding pipeline centered around OpenMM in my academic research that is still running, and we choose to follow a similar protocol to Schrödinger's FEP+ which is a form of metadynamics which runs about 1 Ligand / 13 hours. Regardless of protocol specifics, a lot of our challenges were solved by just doubling down on learning how to align, translate, and mutate molecules in code and how to interoperate between Parmed, OpenMM, RDKit, OpenMMForceField, and MDAnalysis to mainline everything. If I understand correctly, you're looking to determine something along the lines of amino acid mutations on binding affinity. If you have a starting PDB already, I would generally caution going to AlphaFold since the next step is always to equilibrate the protein in your simulation environment, which will jostle around the binding pocket anyhow and it cuts down on protocol time. Test it for your case and see if its necessary, but in our case, it didn't make any difference whether we used an AF structure or PDB as within 10ns both converged to very similar equilibrated structure when solvated. Mutating a residue is begrudgingly going to be easiest in Rosetta, but Chimera can do it as well. There is also code floating around Github for aligning and translating small molecules (amino acids in this case) in RDKit and then they just need their bonds and hydrogens adjusted. PyRosetta or OpenMM could handle the minimization at that point since the model is in memory. In terms of the free energy of binding, how fast do you need the calculation and to which degree of accuracy are you willing to accept? Ensembles and Markov State Modeling -> MMPBSA or Linear Interaction Energy on the faster side. Metadynamics/ Hamiltonian Replica Exchange/ Umbrella Sampling -> Reweighing are more compute intensive solutions but provides greater in-depth time series analysis