PyDHAMed icon indicating copy to clipboard operation
PyDHAMed copied to clipboard

Bias potential matrix

Open and-tos opened this issue 7 years ago • 50 comments
trafficstars

Hi, I wanted to make sure that I understand the definition of the bias potential matrix correctly. Assuming an umbrella sampling experiment with 4 windows A, B, C, D and 3 states 0, 1, 2. The restraining potential defined as k*(x-x0)^2. Should the cell A0 contain the following? x=median value of bin 0, x0=target value of window A

and-tos avatar Feb 21 '18 17:02 and-tos

Hi @andt88, thanks for opening the issue. I'm sorry for the slow response, I have been travelling :).

Yes, you're right about the bias matrix. The cell A0 should contain:

k*(x-x0)^2. Should the cell A0 contain the following? x=median value of bin 0, x0=target value of window A

For umbrella sampling, the easiest thing is to define the bias energy of the bin/micro state via the center/median of the bin/micro state, as you have done. See also the following paragraph from the paper on DHAM (which preceded DHAMed) https://pubs.acs.org/doi/full/10.1021/ct500719p

In US, the unbiased potential surface is modified by applying a (typically harmonic) bias potential ui(k) in each window k. This bias is evaluated at the center of the ith bin in the kth simulation, which for a harmonic potential centered at x(k) would result ..

Does you bias matrix have 3 rows and 4 columns, for your 3 states and 4 windows? Do you get a sensible free energy profile?

lukas-stelzl avatar Feb 23 '18 01:02 lukas-stelzl

Hi Lukas, thanks for your response. My matrix is actually larger. I am trying to get this to run and provide feedback.

and-tos avatar Feb 27 '18 12:02 and-tos

I am now running into a zero division error: print(len(c_l)) gives me 46, which is the number of simulation windows print(c_l[0].shape) gives me (10, 10). 10 is the number of bins I defined.

dhamed.py:20: RuntimeWarning: divide by zero encountered in log
  og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
58
loglike-start nan
nan
Traceback (most recent call last):
  File "dhamed.py", line 20, in <module>
    og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 182, in run_dhamed
    numerical_gradients=numerical_gradients, **kwargs)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 220, in min_dhamed_bfgs
    fprime=fprime, **kwargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 889, in fmin_bfgs
    res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 943, in _minimize_bfgs
    gfk = myfprime(x0)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 95, in grad_dhamed_likelihood_ref_0
    grad = _loop_grad_dhamed_likelihood_0_jit(grad,g, ip, jp, ti, tj, vi, vj, nijp)
ZeroDivisionError: division by zero

and-tos avatar Feb 27 '18 15:02 and-tos

Hi @andt88, What are the units of your bias? You might not have to take the negative log of the bias array. Let me know whether that makes a difference :).

lukas-stelzl avatar Feb 27 '18 21:02 lukas-stelzl

I am using Amber, so the unit of the force constant should be kcal/(mol*A²) and thus that of the potential kcal/mol.

and-tos avatar Feb 28 '18 08:02 and-tos

Could you provide a minimum example which shows the division by zero error? I'd love to get to the bottom of this and a minimum example for debugging would help a lot!

lukas-stelzl avatar Feb 28 '18 23:02 lukas-stelzl

Hi Lukas, the error appears when I try to calculate the uncertainty. Here's a test case:

import numpy as np

import matplotlib.pyplot as plt
from pydhamed import count_matrix
from pydhamed import run_dhamed
import count_transitions as count_transitions


num_umbrella_windows=4

bin_list=[[25.79675, 26.041050000000002], [26.04105, 26.28535], [26.285349999999998, 26.52965], [26.52965, 26.773950000000003], [26.77395, 27.018250000000002], [27.01825, 27.26255], [27.262549999999997, 27.50685], [27.506849999999996, 27.75115], [27.75115, 27.99545], [27.995449999999998, 28.23975]]

n_states=10
bias_mat=[[2.00000000e+00 3.91720050e+00 6.47352162e+00 9.66856338e+00]
 [1.14216498e+00 2.66897408e+00 4.83480608e+00 7.63935872e+00]
 [5.23059920e-01 1.65947762e+00 3.43482050e+00 5.84888402e+00]
 [1.42684820e-01 8.88711120e-01 2.27356488e+00 4.29713928e+00]
 [1.03968000e-03 3.56674580e-01 1.35103922e+00 2.98412450e+00]
 [9.81245000e-02 6.33680000e-02 6.67243520e-01 1.90983968e+00]
 [4.33939280e-01 8.79138000e-03 2.22177780e-01 1.07428482e+00]
 [1.00848402e+00 1.92944720e-01 1.58420000e-02 4.77459920e-01]
 [1.82175872e+00 6.15828020e-01 4.82361800e-02 1.19364980e-01]
 [2.87376338e+00 1.27744128e+00 3.19360320e-01 0.00000000e+00]]

states_list=[[2, 3, 2, 3, 3], [6, 6, 5, 8, 7], [8, 9, 9, 9], [9, 9, 8]]

c_l=[]

print(bin_list)
print(n_states)
print(bias_mat)
print(states_list)

for i in states_list:
    traj=np.array(i)
    c_ = count_matrix(traj,n_states=n_states)
    c_l.append(c_)
    v_ar=bias_mat.reshape(n_states,num_umbrella_windows)
    og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
    n_blocks = 3
    bl_l = []
    for bl in np.split(traj[1:],n_blocks):
        c_l = [count_matrix(bl,n_states=9)]
        bl_g = run_dhamed(c_l, -np.log(v_ar))
        bl_l.append(np.exp(-bl_g) / np.sum(np.exp(-bl_g)))
    bl_ar = np.column_stack((bl_l))
    bl_ln_ar = np.log(bl_ar)

# minmum and maximum values of the PMF (from the five reconstructions) to indicate the uncertainty.
min_pmf = np.min( bl_ln_ar - bl_ln_ar[-1], axis=1)
max_pmf = np.max( bl_ln_ar - bl_ln_ar[-1], axis=1)
bl_ar.shape
dh_log_err = np.std(-np.log(bl_ar), axis=1) / np.sqrt(n_blocks-1)
dh_log_err.shape
fig, ax = plt.subplots(figsize=(5,3))

# error bar is barely visible
#plt.errorbar(rc, rna_pmf, yerr=dh_log_err, fmt=".", capthick=1,capsize=4
rc = np.arange(0,9)
rna_pmf = og*-1
rna_pmf -= rna_pmf[-1]
plt.plot(rc, rna_pmf, ".", label="US DHAMed")

plt.fill_between(rc, min_pmf, max_pmf, alpha=0.4, label=r"min-max PMF $\Delta t=10^4 \, \mathrm{MC}$ ")

ax.set_xlabel("Distance")
ax.set_ylabel("PMF ($\mathrm{k_BT}$)")
ax.set_ylim(-1,13)
ax.set_xlim(-0.4,8.4)
ax.set_yticks([0,5,10])
ax.set_xticks([0,2,4,6,8])
ax.tick_params(direction='out')

plt.legend(borderaxespad=0.1)
plt.tight_layout()
plt.savefig("block_pmf.pdf")```

and-tos avatar Mar 02 '18 09:03 and-tos

Fantastic, I'm looking into it right now :-)

lukas-stelzl avatar Mar 02 '18 10:03 lukas-stelzl

Hey, I think the error comes from some frames lying outside of the most left and most right bin. I fixed this by changing the bin distribution. Now I get another errror:

bin_list=[[25.741194444444446, 26.096605555555556], [26.096605555555556, 26.452016666666665], [26.45201666666667, 26.80742777777778], [26.80742777777778, 27.16283888888889], [27.16283888888889, 27.51825], [27.518250000000002, 27.87366111111111], [27.87366111111111, 28.22907222222222], [28.22907222222222, 28.58448333333333], [28.584483333333335, 28.939894444444445], [28.939894444444445, 29.295305555555554]]
n_states=10
bias_mat=[[2.00000000e+00 3.91720050e+00 6.47352162e+00 9.66856338e+00]
 [8.30989671e-01 2.18024322e+00 4.16847522e+00 6.79542786e+00]
 [1.67247574e-01 9.48554163e-01 2.36869704e+00 4.42756056e+00]
 [8.77370889e-03 2.22133342e-01 1.07418710e+00 2.56496150e+00]
 [3.55568075e-01 9.80752840e-04 2.84945393e-01 1.20763067e+00]
 [1.20763067e+00 2.85096395e-01 9.71915062e-04 3.55568075e-01]
 [2.56496150e+00 1.07448027e+00 2.22266669e-01 8.77370889e-03]
 [4.42756056e+00 2.36913237e+00 9.48829654e-01 1.67247574e-01]
 [6.79542786e+00 4.16905271e+00 2.18066087e+00 8.30989671e-01]
 [9.66856338e+00 6.47424128e+00 3.91776032e+00 2.00000000e+00]]
states_list[[2, 2, 2, 2, 2], [4, 4, 4, 5, 5], [6, 6, 7, 7, 6], [6, 7, 7, 7, 6]]
bin 0 excluded
bin 1 excluded
bin 2 excluded
bin 3 excluded
bin 4 excluded
bin 5 excluded
bin 6 excluded
bin 7 excluded
bin 8 excluded
bin 9 excluded
0
loglike-start 0.0
0.0
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 0
         Function evaluations: 1
         Gradient evaluations: 1
time elapsed 1.19412994385 s
bin 0 excluded
bin 1 excluded
bin 2 excluded
bin 3 excluded
bin 4 excluded
bin 5 excluded
bin 6 excluded
bin 7 excluded
bin 8 excluded
0
loglike-start 0.0
0.0
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 0
         Function evaluations: 1
         Gradient evaluations: 1
time elapsed 0.00039005279541 s
bin 0 excluded
bin 1 excluded
bin 2 excluded
bin 3 excluded
bin 4 excluded
bin 5 excluded
bin 6 excluded
bin 7 excluded
bin 8 excluded
0
loglike-start 0.0
0.0
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 0
         Function evaluations: 1
         Gradient evaluations: 1
time elapsed 0.000334978103638 s
Traceback (most recent call last):
  File "dhamed_block.py", line 25, in <module>
    og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 162, in run_dhamed
    n_states, n_windows)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/prepare_dhamed.py", line 211, in generate_dhamed_input
    t= state_lifetimes_counts(c_l, n_states, n_win)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/prepare_dhamed.py", line 29, in state_lifetimes_counts
    t_ar[:, iwin] = np.sum(win, axis=0)
ValueError: could not broadcast input array from shape (10) into shape (9)

and-tos avatar Mar 02 '18 10:03 and-tos

Thanks for the update! Two questions

1 You are loop over your umbrella windows, right?


for i in states_list:
    traj=np.array(i)
    c_ = count_matrix(traj,n_states=n_states)
    c_l.append(c_) 

The list of the count matrices grows, while you're iterating. For umbrella sampling, I'd expect one list of count matrices, with one entry for each umbrella window, for instance, for the analysis of umbrella sampling of ion permeation through the GLIC channel, I had:

len(transition_count_l) # one transition count matrix for each of the 153 umbrella windows
153

See https://github.com/bio-phys/PyDHAMed/blob/master/pydhamed/glic-ion-channel/glic_ion_channel_permeation.ipynb for reference.

I'd thus run the optimization once for all four umbrella windows, i.e., I'm not sure whether run_dhamed should be executed within the loop over states. Do let me know if I'm missing something here.

2 Are you sure you have to the take the log of your bias array?

The RNA duplex formation example is a bit misleading in that respect. There the bias returned by the simulations was w_i^a = e^(-beta u_i^a). Thus I have to take the log to get u_i^a= - ln(w_i^\a) k_BT$. Note to myself, I need to highlight this in the RNA example. In the GLIC ion channel example, I can directly input the bias u_i^a=k^a*(x_i-x_0^a)^2.

lukas-stelzl avatar Mar 02 '18 11:03 lukas-stelzl

Thanks for your feedback! I ran an umbrella sampling to calculate the free energy of binding for a ligand to a protein. You're right, executing run_dhamed within the loop is wrong. Regarding the unit of v_ar, I wasn't sure which units run_dhamed actually requires. As I mentioned earlier, my v_ar is in kcal/mol.

for i in states_list:
    traj=np.array(i)
    c_ = count_matrix(traj,n_states=n_states)
    c_l.append(c_)
v_ar=bias_mat.reshape(n_states,num_umbrella_windows)
og = run_dhamed(c_l, v_ar, numerical_gradients=False, jit_gradient=True)
n_blocks = 2
bl_l = []
for bl in np.split(traj[1:],n_blocks):
        c_l = [count_matrix(bl,n_states=n_states)]
        bl_g = run_dhamed(c_l, v_ar)
        bl_l.append(np.exp(-bl_g) / np.sum(np.exp(-bl_g)))
bl_ar = np.column_stack((bl_l))
bl_ln_ar = np.log(bl_ar)
Traceback (most recent call last):
  File "dhamed_block.py", line 25, in <module>
    og = run_dhamed(c_l, v_ar, numerical_gradients=False, jit_gradient=True)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 182, in run_dhamed
    numerical_gradients=numerical_gradients, **kwargs)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 220, in min_dhamed_bfgs
    fprime=fprime, **kwargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 889, in fmin_bfgs
    res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 964, in _minimize_bfgs
    old_fval, old_old_fval, amin=1e-100, amax=1e100)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 783, in _line_search_wolfe12
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 101, in line_search_wolfe1
    c1=c1, c2=c2, amax=amax, amin=amin, xtol=xtol)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 175, in scalar_search_wolfe1
    derphi1 = derphi(stp)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 90, in derphi
    gval[0] = fprime(xk + s*pk, *newargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 95, in grad_dhamed_likelihood_ref_0
    grad = _loop_grad_dhamed_likelihood_0_jit(grad,g, ip, jp, ti, tj, vi, vj, nijp)
ZeroDivisionError: division by zero

and-tos avatar Mar 02 '18 11:03 and-tos

I think you have to divide your bias array by RT to convert from kcal/mol to kBT. N.B., assuming that your temperature was 300 K and dividing by RT, lets run_dhamed finish, but I get non-sensical values.

Do you still get the division by zero error on your full data set?

Comparison to WHAM provides a quick test, whether your DHAMed results are sensible. You'd expect that many/some features of the WHAM PMF are reproduced by the DHAMed PMF.

lukas-stelzl avatar Mar 02 '18 16:03 lukas-stelzl

I am currently running it on the full data set. I'll let you know once it's finished.

and-tos avatar Mar 02 '18 16:03 and-tos

Hi Lukas I am still getting the division by zero error on the full data set. Should I send you the complete input?

and-tos avatar Mar 06 '18 08:03 and-tos

Yes, this would be great! I'm very keen to get to the bottom of this problem :)

lukas-stelzl avatar Mar 06 '18 10:03 lukas-stelzl

I attached a file that contains: bin_list n_states bias_mat states_list dhamed.txt

and-tos avatar Mar 06 '18 16:03 and-tos

Hi Lukas, did the file work for you? Do you prefer another format?

and-tos avatar Mar 08 '18 09:03 and-tos

The file is hard to parse for us. Could you please store the arrays using a native numpy function.

np.savez('dhamed.npz', bias_matrix=bias_matrix, bin_list=bin_list)

This is much easier for us to read again into a python process. The number of states is 1000, right?

kain88-de avatar Mar 08 '18 12:03 kain88-de

Here it is. I had to label it txt to upload it but the format is npz. Thanks a lot!! dhamed.npz.txt

and-tos avatar Mar 08 '18 18:03 and-tos

Thanks a lot! I can open your file and I'm trying to figure out what's going on right now.

lukas-stelzl avatar Mar 12 '18 13:03 lukas-stelzl

I managed to get DHAMed to work for your data. I made a trivial mistake in my code that affects the setup of calculations when there are empty bins/states. I'm fixing this right now and I will try to update the repository later today.

lukas-stelzl avatar Mar 13 '18 10:03 lukas-stelzl

https://github.com/bio-phys/PyDHAMed/commit/d42f2a67a2650a0f8b09a798c2afe3b6d85aab08 should fix the problem- the problem was a trivial mistake in the handling of empty bins.

Have a look at the Jupyter notebook (had to rename it to .txt), whether the result makes sense to you.

handle_excluded_states-3.ipynb.txt

lukas-stelzl avatar Mar 13 '18 18:03 lukas-stelzl

Thank you so much for fixing this! I am running it right now and will let you know then.

and-tos avatar Mar 13 '18 21:03 and-tos

I'm still getting the same error. I downloaded the Master Branch yesterday. Did I get the wrong version?

and-tos avatar Mar 14 '18 09:03 and-tos

Did you update your installation? Just in case you can run pip install . --user in the repository to force an update of the installation. Alternatively, a python interpreter started from the repository directly should also pick up the current checkout.

kain88-de avatar Mar 14 '18 09:03 kain88-de

I'm not sure what it is I am doing wrong but I still get the error. I cloned the master branch and ran pip install . --user as you suggested, but I am still getting the zero division error. I am running this on the same npz file I sent to you.

and-tos avatar Mar 15 '18 14:03 and-tos

I'll look into that :) Thanks a lot for letting me know!

lukas-stelzl avatar Mar 15 '18 14:03 lukas-stelzl

Is there a possibility to check whether the correct version is installed and called?

and-tos avatar Mar 15 '18 15:03 and-tos

You could use ?? to for instance on generate_dhamed_input() to see whether you are calling the latest version.

For generate_dhamed_input() you'd expect to see (towards the end): ` # Remove excluded counts from the total number of transitions out of state.

nk = check_total_transition_counts(n_out, n_in, paired_ar, n_actual) ` which is new.

lukas-stelzl avatar Mar 15 '18 15:03 lukas-stelzl

I checked thoroughly that I have the correct version. Here's the error I get (the "bin exclusion" seems new to me, so I assume this is from the changes in the code):

bin 0 excluded
bin 1 excluded
bin 2 excluded
bin 3 excluded
bin 4 excluded
bin 5 excluded
bin 6 excluded
bin 7 excluded
bin 8 excluded
bin 9 excluded
bin 10 excluded
bin 11 excluded
bin 12 excluded
bin 13 excluded
bin 14 excluded
bin 15 excluded
bin 16 excluded
bin 17 excluded
bin 18 excluded
bin 19 excluded
bin 20 excluded
bin 21 excluded
bin 22 excluded
bin 23 excluded
bin 24 excluded
bin 25 excluded
bin 26 excluded
bin 27 excluded
bin 28 excluded
bin 29 excluded
bin 31 excluded
bin 34 excluded
bin 975 excluded
bin 976 excluded
bin 977 excluded
bin 979 excluded
bin 980 excluded
bin 981 excluded
bin 982 excluded
bin 983 excluded
bin 984 excluded
bin 985 excluded
bin 986 excluded
bin 987 excluded
bin 988 excluded
bin 989 excluded
bin 990 excluded
bin 991 excluded
bin 992 excluded
bin 993 excluded
bin 994 excluded
bin 995 excluded
bin 996 excluded
bin 997 excluded
bin 998 excluded
bin 999 excluded
Number of transition pairs 169579
inf
Traceback (most recent call last):
  File "dhamed.py", line 35, in <module>
    og = run_dhamed(c_l, np.divide(v_ar, 0.00198720358509), numerical_gradients=False, jit_gradient=True)
  File "/home/local/andtos-loc/.local/lib/python2.7/site-packages/pydhamed/optimize_dhamed.py", line 219, in run_dhamed
    numerical_gradients=numerical_gradients, **kwargs)
  File "/home/local/andtos-loc/.local/lib/python2.7/site-packages/pydhamed/optimize_dhamed.py", line 257, in min_dhamed_bfgs
    fprime=fprime, **kwargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 889, in fmin_bfgs
    res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 943, in _minimize_bfgs
    gfk = myfprime(x0)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/local/andtos-loc/.local/lib/python2.7/site-packages/pydhamed/optimize_dhamed.py", line 128, in grad_dhamed_likelihood_ref_0
    grad = _loop_grad_dhamed_likelihood_0_jit(grad,g, ip, jp, ti, tj, vi, vj, nijp)
ZeroDivisionError: division by zero

and-tos avatar Mar 16 '18 14:03 and-tos