
Questions about convergence and sampler

Open jslim-furame opened this issue 4 years ago • 7 comments

I installed the latest YANK (0.25.2) and tested it using the scripts given in the t4-lysozyme example directory.

  1. The first question is: When I increase "default_number_of_iterations" from 500 to 2000, the binding affinity does not converge to a single value; instead it fluctuates (for example, between -6 and -4.5 kcal/mol). In my experience with Monte Carlo, this does not make sense. Why?

  2. The second question is: When I change the sampler from ReplicaExchange to SAMS, their results are very different, giving about 8~9 kcal/mol differences. What is happening with the change of the sampler?

Thanks in advance.

jslim-furame avatar Dec 06 '19 03:12 jslim-furame

Thanks for posting this!

2000 iterations (presuming 1 ps/iteration) is still very short for an absolute free energy calculation with YANK, and short simulations could have unreliable estimates of the uncertainty, leading to discrepancies between your simulations that appear to be statistically significant but are not. We are working on coming up with better estimates of the statistical error to properly reflect the true uncertainties, but it's difficult to do so if the simulations are so short they don't sample slow timescale ligand (and sometimes protein) motions. For now, I'd suggest much longer simulations (20,000 iterations or more).
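To illustrate why short, correlated simulations can report uncertainties that are too small, here is a minimal sketch on synthetic data. It is not YANK's analysis code: the AR(1) toy series, the correlation parameter `phi`, and the helper names are all assumptions made for illustration. It compares a standard error that treats every sample as independent with one computed from block averages.

```python
import math
import random


def ar1_series(n, phi, seed=0):
    """Generate a correlated toy time series x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def standard_error(values):
    """Standard error of the mean, assuming the values are independent."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


def block_standard_error(values, n_blocks):
    """Standard error of the mean from non-overlapping block averages."""
    block_size = len(values) // n_blocks
    block_means = [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    return standard_error(block_means)


# Hypothetical stand-in for 2000 iterations of a slowly decorrelating observable.
series = ar1_series(n=2000, phi=0.95)
naive_sem = standard_error(series)
blocked_sem = block_standard_error(series, n_blocks=10)
print(f"naive SEM:   {naive_sem:.3f}")
print(f"blocked SEM: {blocked_sem:.3f}")
```

When successive samples are strongly correlated, the naive estimate comes out noticeably smaller than the blocked one, which is one way two short runs can appear to disagree significantly when they do not.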

SAMS is still highly experimental, and currently only converges reliably under ideal conditions when there is a second-order-like (continuous) phase transition in the replicas. Please use it with caution! We're working on more robust ways of using SAMS.

jchodera avatar Dec 06 '19 04:12 jchodera

Thanks for your reply.

You mean that even if I employ the GBSA (implicit water) model, I have to run the simulation for 20,000 iterations (20 ns) or more, right?

jslim-furame avatar Dec 06 '19 04:12 jslim-furame

It depends on the precise nature of the restraints you are using. Are you using the p-xylene-implicit example?

jchodera avatar Dec 06 '19 04:12 jchodera

Yes, but I am using a Boresch restraint.

By the way, in this tutorial http://www.alchemistry.org/wiki/Absolute_Binding_Free_Energy_-_Gromacs_2016, the author runs the simulations for 1 ns/replica for the complex system and 5 ns/replica for the ligand-only system. What is the benefit of YANK compared with these pure MD simulations?

jslim-furame avatar Dec 06 '19 04:12 jslim-furame

Hm, properly selected Boresch restraints should converge much faster than Harmonic restraints (which is what I thought you were using). Can you share your input files in a ZIP file so we can look at what's going on here?

jchodera avatar Dec 06 '19 15:12 jchodera

example.zip

Please find attached my input files.

For the YANK input script, I used the t4-lysozyme implicit.yaml example exactly as provided, except that I employed a Boresch restraint. To compare the result with that given in http://www.alchemistry.org/wiki/Absolute_Binding_Free_Energy_-_Gromacs_2016, I processed the "2rbn" PDB structure to prepare the protein and ligand (using pdb4amber and antechamber).

I know the generalized Born surface area (GBSA) model is just an approximation and hence the result can be different. My intention is not to get an exact value, but to get a converged result.

Thanks in advance.

jslim-furame avatar Dec 07 '19 04:12 jslim-furame

Hi @jslim-furame. An oscillating free energy trajectory is a symptom that your sampling (whether with MD or MC) is still mixing, i.e. it has not converged yet.
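As a rough way to spot an oscillating free energy trajectory, one could check whether the most recent running estimates stay within a tolerance of each other. The sketch below is a crude heuristic, not part of YANK; the function name `has_settled`, the window size, the tolerance, and the example values are all illustrative assumptions.

```python
def has_settled(estimates, window, tolerance):
    """Return True if the last `window` estimates span less than `tolerance`."""
    if len(estimates) < window:
        return False  # not enough estimates to judge
    recent = estimates[-window:]
    return max(recent) - min(recent) < tolerance


# Illustrative running estimates in kcal/mol (made-up numbers, not real output).
oscillating = [-6.0, -4.5, -5.8, -4.7]
steady = [-5.20, -5.10, -5.15, -5.12]

print(has_settled(oscillating, window=4, tolerance=0.5))
print(has_settled(steady, window=4, tolerance=0.5))
```

Passing this check does not prove convergence, since a simulation can look stable while missing slow motions, but failing it is a clear sign that more sampling is needed.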

With Boresch restraints, I'd suggest manually specifying the atoms to be restrained, as the mixing time will depend greatly on the selection. See the docs for more info.

> When I change the sampler from ReplicaExchange to SAMS, their results are very different, giving about 8~9 kcal/mol differences. What is happening with the change of the sampler?

SAMS is a single-replica method, so for the same number of iterations you have n_replicas times fewer samples than HREX to estimate the free energy. You'll have to crank up the number of iterations in SAMS to obtain the same number of samples. For more info, this docs page should give you an overview and also give you a sense as to why YANK doesn't just run simple MD by default.
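The sample-count argument can be written out as simple arithmetic. This sketch assumes that each replica contributes one sample per iteration and uses a hypothetical replica count of 20; neither the helper names nor that number come from YANK itself.

```python
def hrex_samples(n_iterations, n_replicas):
    """Samples collected by replica exchange: one per replica per iteration."""
    return n_iterations * n_replicas


def sams_iterations_to_match(n_iterations_hrex, n_replicas):
    """Iterations a single-replica sampler needs to collect as many samples."""
    return hrex_samples(n_iterations_hrex, n_replicas)


n_replicas = 20  # hypothetical number of thermodynamic states/replicas
print(hrex_samples(2000, n_replicas))              # samples from 2000 HREX iterations
print(sams_iterations_to_match(2000, n_replicas))  # SAMS iterations for the same count
```

Under these assumptions, matching a 2000-iteration HREX run would take 40,000 SAMS iterations.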

Hope this helps.

andrrizzi avatar Dec 08 '19 16:12 andrrizzi