alchemlyb icon indicating copy to clipboard operation
alchemlyb copied to clipboard

add R_c (convergence fraction) convergence analysis

Open orbeckst opened this issue 4 years ago • 10 comments

Add the analysis of converging fraction R_c from the paper

S. Fan, B. I. Iorga, and O. Beckstein. Prediction of octanol-water partition coefficients for the SAMPL6- log P molecules using molecular dynamics simulations with OPLS-AA, AMBER and CHARMM force fields. Journal of Computer-Aided Molecular Design, 34:543–560, 2020. https://doi.org/10.1007/s10822-019-00267-z

cc @VOD555

orbeckst avatar Jul 16 '20 23:07 orbeckst

For other convergence analysis see http://getyank.org/latest/algorithms.html

orbeckst avatar Jun 10 '21 21:06 orbeckst

@orbeckst Thanks for this issue. I'm interested in this way of checking convergence as well. I wonder if you have any code snippet or library that we could use?

xiki-tempula avatar Aug 14 '22 19:08 xiki-tempula

convergence.zip @xiki-tempula Here's the script we used in SAMPL6 and SAMPL7 challenges.

VOD555 avatar Aug 15 '22 16:08 VOD555

@VOD555 Thank you for the script. I have a question with regard to the paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8397498/#FD17

I wonder how do you compute the equation 17 and 18?

Is the Ac actually being used? Thank you.

xiki-tempula avatar Aug 17 '22 14:08 xiki-tempula

Quick comment (@VOD555 should be able to provide more details):

  1. In order to compute the cumulative prob. distribution C(Rc) (eq 17), you take the Rc for each lambda window (let's call them Rc_lambda, and then count for a range of values 0 ≤Rc ≤ 1 (e.g. evenly spaced), how many Rc_lambda ≤ Rc. Divide all the counts by the total number of lambda windows. (There might be a smart way to do this with cumulative sums and @VOD555 should be able to point you to actual code.) You can also create a function from heaviside step functions, something like
    def C_Rc_func_factory(Rc_lambdas):
         def f(Rc):
              return np.sum([np.heaviside(Rc - Rc_lambda, 0.5) for Rc_lambda in Rc_lambdas]) / len(Rc_lambdas)
         return f
    
    Not tested...
  2. We used Ac in Fig 2, and Tables 2-5.

orbeckst avatar Aug 17 '22 16:08 orbeckst

P.S.: Fig 4 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7667952/ should make clearer, what 𝒞(𝑅𝐶) looks like :

Cumulative distribution functions 𝒞(𝑅𝐶) of the convergence time fraction Rc for Coulomb and VDW λ windows

orbeckst avatar Aug 17 '22 17:08 orbeckst

@orbeckst Thanks. I know this might sound like a reviewer question. Why do you "Divide all the counts by the total number of lambda windows." Instead of just taking the average of the 𝑅𝐶 or (1-𝑅𝐶) across all lambda windows?

xiki-tempula avatar Aug 17 '22 18:08 xiki-tempula

I wrote down one way in which one can calculate the cumulative probability function that you see in the figure. I am not sure that you can do this with a simple average.

Why do you "Divide all the counts by the total number of lambda windows."

So that my $\mathcal{C}(R_c)$ is a number between 0 and 1.

(If I am making a mistake somewhere here I hope @VOD555 will correct me.)

We initially wanted a quantity that told us something about the distribution of Rc values. The distribution itself says something about individual values (but we didn't really use it). The cumulative distribution seemed important to us as a measure that assesses all windows together because all lambda values are used for a single free energy estimate. The Ac was then just a way to condense the C(Rc) into a single number.

I don't think that the $A_c$ is equal to the average over all the $R_c$ (or over $1 - R_c$): The average is

$$ \langle R_c \rangle = \int_0^1 dR_c p(r_c) R_c $$

where $p(R_c)$ is probability to observe $R_c$ over all $\lambda$.

On the other hand,

$$ \begin{align} \mathcal{C}(R_c) &= \int_0^{R_c} dR_c' p(R_c')\ A_c &= \int_0^1 dR_c \mathcal{C}(R_c) \end{align} $$

so it's clear that $A_c \neq \langle R_c \rangle$. They measure different things.

Sorry if I misunderstood your question .... please feel free to ask again.

orbeckst avatar Aug 17 '22 20:08 orbeckst

Thanks. Some small questions. Did you run it on only dHdl or the relevant column on the u_nk as well? Did you run the R_c on the original dataset or the dataset after equilibrium detection?

xiki-tempula avatar Sep 22 '22 08:09 xiki-tempula

In our SAMPL7 work we only run it on dHdl, but in principle, one can run it on other column on the u_nk. We run the R_c after equilibrium detection.

VOD555 avatar Sep 22 '22 16:09 VOD555