PyDHAMed
Bias potential matrix
Hi, I wanted to make sure that I understand the definition of the bias potential matrix correctly. Assume an umbrella sampling experiment with 4 windows A, B, C, D and 3 states 0, 1, 2, with the restraining potential defined as k*(x-x0)^2. Should the cell A0 then contain the following? x = median value of bin 0, x0 = target value of window A.
Hi @andt88, thanks for opening the issue. I'm sorry for the slow response, I have been travelling :).
Yes, you're right about the bias matrix. The cell A0 should contain k*(x-x0)^2, with x = the median value of bin 0 and x0 = the target value of window A.
For umbrella sampling, the easiest thing is to define the bias energy of the bin/micro state via the center/median of the bin/micro state, as you have done. See also the following paragraph from the paper on DHAM (which preceded DHAMed) https://pubs.acs.org/doi/full/10.1021/ct500719p
"In US, the unbiased potential surface is modified by applying a (typically harmonic) bias potential u_i(k) in each window k. This bias is evaluated at the center of the ith bin in the kth simulation, which for a harmonic potential centered at x(k) would result ..."
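For illustration, with made-up bin centers, window centers, and force constant (the real values come from your umbrella setup), the whole bias matrix can be built in one broadcast:

```python
import numpy as np

# Made-up numbers for illustration: 3 bins (states) and 4 windows.
bin_centers = np.array([0.5, 1.5, 2.5])          # median x of bins 0, 1, 2
window_centers = np.array([0.0, 1.0, 2.0, 3.0])  # restraint targets x0 of windows A-D
k = 2.0                                          # harmonic force constant

# u[i, a] = k * (x_i - x0_a)^2: bias of bin i evaluated in window a.
bias_mat = k * (bin_centers[:, None] - window_centers[None, :]) ** 2
print(bias_mat.shape)  # (3, 4): one row per state, one column per window
```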
Does your bias matrix have 3 rows and 4 columns, for your 3 states and 4 windows? Do you get a sensible free energy profile?
Hi Lukas, thanks for your response. My matrix is actually larger. I am trying to get this to run and provide feedback.
I am now running into a zero division error: print(len(c_l)) gives me 46, which is the number of simulation windows; print(c_l[0].shape) gives me (10, 10), where 10 is the number of bins I defined.
```
dhamed.py:20: RuntimeWarning: divide by zero encountered in log
  og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
58
loglike-start nan
nan
Traceback (most recent call last):
  File "dhamed.py", line 20, in <module>
    og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 182, in run_dhamed
    numerical_gradients=numerical_gradients, **kwargs)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 220, in min_dhamed_bfgs
    fprime=fprime, **kwargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 889, in fmin_bfgs
    res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 943, in _minimize_bfgs
    gfk = myfprime(x0)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 95, in grad_dhamed_likelihood_ref_0
    grad = _loop_grad_dhamed_likelihood_0_jit(grad,g, ip, jp, ti, tj, vi, vj, nijp)
ZeroDivisionError: division by zero
```
Hi @andt88, What are the units of your bias? You might not have to take the negative log of the bias array. Let me know whether that makes a difference :).
I am using Amber, so the unit of the force constant should be kcal/(mol*A²) and thus that of the potential kcal/mol.
Could you provide a minimum example which shows the division by zero error? I'd love to get to the bottom of this and a minimum example for debugging would help a lot!
Hi Lukas, the error appears when I try to calculate the uncertainty. Here's a test case:

```python
import numpy as np
import matplotlib.pyplot as plt
from pydhamed import count_matrix
from pydhamed import run_dhamed
# import count_transitions  # unused in this minimal example
num_umbrella_windows=4
bin_list=[[25.79675, 26.041050000000002], [26.04105, 26.28535], [26.285349999999998, 26.52965], [26.52965, 26.773950000000003], [26.77395, 27.018250000000002], [27.01825, 27.26255], [27.262549999999997, 27.50685], [27.506849999999996, 27.75115], [27.75115, 27.99545], [27.995449999999998, 28.23975]]
n_states=10
bias_mat = np.array([
    [2.00000000e+00, 3.91720050e+00, 6.47352162e+00, 9.66856338e+00],
    [1.14216498e+00, 2.66897408e+00, 4.83480608e+00, 7.63935872e+00],
    [5.23059920e-01, 1.65947762e+00, 3.43482050e+00, 5.84888402e+00],
    [1.42684820e-01, 8.88711120e-01, 2.27356488e+00, 4.29713928e+00],
    [1.03968000e-03, 3.56674580e-01, 1.35103922e+00, 2.98412450e+00],
    [9.81245000e-02, 6.33680000e-02, 6.67243520e-01, 1.90983968e+00],
    [4.33939280e-01, 8.79138000e-03, 2.22177780e-01, 1.07428482e+00],
    [1.00848402e+00, 1.92944720e-01, 1.58420000e-02, 4.77459920e-01],
    [1.82175872e+00, 6.15828020e-01, 4.82361800e-02, 1.19364980e-01],
    [2.87376338e+00, 1.27744128e+00, 3.19360320e-01, 0.00000000e+00]])
states_list=[[2, 3, 2, 3, 3], [6, 6, 5, 8, 7], [8, 9, 9, 9], [9, 9, 8]]
c_l=[]
print(bin_list)
print(n_states)
print(bias_mat)
print(states_list)
for i in states_list:
    traj=np.array(i)
    c_ = count_matrix(traj,n_states=n_states)
    c_l.append(c_)
v_ar=bias_mat.reshape(n_states,num_umbrella_windows)
og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
n_blocks = 3
bl_l = []
for bl in np.split(traj[1:],n_blocks):
    c_l = [count_matrix(bl,n_states=9)]
    bl_g = run_dhamed(c_l, -np.log(v_ar))
    bl_l.append(np.exp(-bl_g) / np.sum(np.exp(-bl_g)))
bl_ar = np.column_stack((bl_l))
bl_ln_ar = np.log(bl_ar)
# minimum and maximum values of the PMF (from the block reconstructions) to indicate the uncertainty.
min_pmf = np.min( bl_ln_ar - bl_ln_ar[-1], axis=1)
max_pmf = np.max( bl_ln_ar - bl_ln_ar[-1], axis=1)
bl_ar.shape
dh_log_err = np.std(-np.log(bl_ar), axis=1) / np.sqrt(n_blocks-1)
dh_log_err.shape
fig, ax = plt.subplots(figsize=(5,3))
# error bar is barely visible
#plt.errorbar(rc, rna_pmf, yerr=dh_log_err, fmt=".", capthick=1,capsize=4
rc = np.arange(0,9)
rna_pmf = og*-1
rna_pmf -= rna_pmf[-1]
plt.plot(rc, rna_pmf, ".", label="US DHAMed")
plt.fill_between(rc, min_pmf, max_pmf, alpha=0.4, label=r"min-max PMF $\Delta t=10^4 \, \mathrm{MC}$ ")
ax.set_xlabel("Distance")
ax.set_ylabel("PMF ($\mathrm{k_BT}$)")
ax.set_ylim(-1,13)
ax.set_xlim(-0.4,8.4)
ax.set_yticks([0,5,10])
ax.set_xticks([0,2,4,6,8])
ax.tick_params(direction='out')
plt.legend(borderaxespad=0.1)
plt.tight_layout()
plt.savefig("block_pmf.pdf")
```
Fantastic, I'm looking into it right now :-)
Hey, I think the error comes from some frames lying outside of the leftmost and rightmost bins. I fixed this by changing the bin distribution. Now I get another error:
```python
bin_list=[[25.741194444444446, 26.096605555555556], [26.096605555555556, 26.452016666666665], [26.45201666666667, 26.80742777777778], [26.80742777777778, 27.16283888888889], [27.16283888888889, 27.51825], [27.518250000000002, 27.87366111111111], [27.87366111111111, 28.22907222222222], [28.22907222222222, 28.58448333333333], [28.584483333333335, 28.939894444444445], [28.939894444444445, 29.295305555555554]]
n_states=10
bias_mat=np.array([
    [2.00000000e+00, 3.91720050e+00, 6.47352162e+00, 9.66856338e+00],
    [8.30989671e-01, 2.18024322e+00, 4.16847522e+00, 6.79542786e+00],
    [1.67247574e-01, 9.48554163e-01, 2.36869704e+00, 4.42756056e+00],
    [8.77370889e-03, 2.22133342e-01, 1.07418710e+00, 2.56496150e+00],
    [3.55568075e-01, 9.80752840e-04, 2.84945393e-01, 1.20763067e+00],
    [1.20763067e+00, 2.85096395e-01, 9.71915062e-04, 3.55568075e-01],
    [2.56496150e+00, 1.07448027e+00, 2.22266669e-01, 8.77370889e-03],
    [4.42756056e+00, 2.36913237e+00, 9.48829654e-01, 1.67247574e-01],
    [6.79542786e+00, 4.16905271e+00, 2.18066087e+00, 8.30989671e-01],
    [9.66856338e+00, 6.47424128e+00, 3.91776032e+00, 2.00000000e+00]])
states_list=[[2, 2, 2, 2, 2], [4, 4, 4, 5, 5], [6, 6, 7, 7, 6], [6, 7, 7, 7, 6]]
```
bin 0 excluded
bin 1 excluded
bin 2 excluded
bin 3 excluded
bin 4 excluded
bin 5 excluded
bin 6 excluded
bin 7 excluded
bin 8 excluded
bin 9 excluded
0
loglike-start 0.0
0.0
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 0
Function evaluations: 1
Gradient evaluations: 1
time elapsed 1.19412994385 s
bin 0 excluded
bin 1 excluded
bin 2 excluded
bin 3 excluded
bin 4 excluded
bin 5 excluded
bin 6 excluded
bin 7 excluded
bin 8 excluded
0
loglike-start 0.0
0.0
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 0
Function evaluations: 1
Gradient evaluations: 1
time elapsed 0.00039005279541 s
bin 0 excluded
bin 1 excluded
bin 2 excluded
bin 3 excluded
bin 4 excluded
bin 5 excluded
bin 6 excluded
bin 7 excluded
bin 8 excluded
0
loglike-start 0.0
0.0
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 0
Function evaluations: 1
Gradient evaluations: 1
time elapsed 0.000334978103638 s
```
Traceback (most recent call last):
  File "dhamed_block.py", line 25, in <module>
    og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 162, in run_dhamed
    n_states, n_windows)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/prepare_dhamed.py", line 211, in generate_dhamed_input
    t= state_lifetimes_counts(c_l, n_states, n_win)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/prepare_dhamed.py", line 29, in state_lifetimes_counts
    t_ar[:, iwin] = np.sum(win, axis=0)
ValueError: could not broadcast input array from shape (10) into shape (9)
```
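A quick way to spot such out-of-range frames is np.digitize (a sketch with made-up edges and frame values; the real ones come from bin_list and the trajectories):

```python
import numpy as np

# Made-up bin edges and frame values for illustration.
edges = np.linspace(25.74, 29.30, 11)    # 11 edges -> 10 bins
frames = np.array([25.9, 26.5, 29.1, 30.2])

states = np.digitize(frames, edges) - 1  # 0-based bin index per frame
outside = (states < 0) | (states >= len(edges) - 1)
print(frames[outside])                   # frames not covered by any bin
```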
Thanks for the update! Two questions
1. You are looping over your umbrella windows, right?
```python
for i in states_list:
    traj=np.array(i)
    c_ = count_matrix(traj,n_states=n_states)
    c_l.append(c_)
```
The list of count matrices grows while you're iterating. For umbrella sampling, I'd expect one list of count matrices, with one entry for each umbrella window. For instance, for the analysis of umbrella sampling of ion permeation through the GLIC channel, I had:
```python
len(transition_count_l)  # one transition count matrix for each of the 153 umbrella windows
153
```
See https://github.com/bio-phys/PyDHAMed/blob/master/pydhamed/glic-ion-channel/glic_ion_channel_permeation.ipynb for reference.
I'd thus run the optimization once for all four umbrella windows, i.e., I'm not sure whether run_dhamed should be executed within the loop over states. Do let me know if I'm missing something here.
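In other words, something like this structure (a sketch with made-up trajectories and a toy stand-in for PyDHAMed's count_matrix, for illustration only):

```python
import numpy as np

def count_matrix(traj, n_states):
    # Toy stand-in for PyDHAMed's count_matrix, for illustration only:
    # count transitions i -> j between consecutive frames.
    c = np.zeros((n_states, n_states))
    for i, j in zip(traj[:-1], traj[1:]):
        c[j, i] += 1.0
    return c

n_states = 4
# Made-up discretized trajectories, one per umbrella window.
states_list = [[0, 0, 1, 1], [1, 2, 2, 1], [2, 3, 3, 3]]

# One count matrix per umbrella window, all collected into a single list ...
c_l = [count_matrix(np.array(t), n_states) for t in states_list]
print(len(c_l))  # one entry per window

# ... and the optimization is then run only once, over the whole list:
# og = run_dhamed(c_l, v_ar)
```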
2. Are you sure you have to take the log of your bias array?
The RNA duplex formation example is a bit misleading in that respect. There the bias returned by the simulations was w_i^a = e^(-beta u_i^a), so I have to take the log to get u_i^a = -k_B T ln(w_i^a). Note to self: I need to highlight this in the RNA example. In the GLIC ion channel example, I can directly input the bias u_i^a = k^a (x_i - x_0^a)^2.
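To spell out the two conventions (a sketch, with beta set to 1 and made-up energies for illustration):

```python
import numpy as np

beta = 1.0  # 1/kBT, set to 1 here for illustration
u = np.array([[0.0, 0.5], [0.5, 0.0]])  # made-up bias energies u_i^a in kBT

# RNA duplex example convention: the simulation reports Boltzmann weights
# w_i^a = exp(-beta * u_i^a), so a log is needed to recover the energies:
w = np.exp(-beta * u)
u_recovered = -np.log(w) / beta

# GLIC example convention: the harmonic bias energies u_i^a are passed to
# run_dhamed directly, with no log.
print(np.allclose(u, u_recovered))  # True
```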
Thanks for your feedback! I ran an umbrella sampling to calculate the free energy of binding for a ligand to a protein. You're right, executing run_dhamed within the loop is wrong. Regarding the unit of v_ar, I wasn't sure which units run_dhamed actually requires. As I mentioned earlier, my v_ar is in kcal/mol.
```python
for i in states_list:
    traj=np.array(i)
    c_ = count_matrix(traj,n_states=n_states)
    c_l.append(c_)

v_ar=bias_mat.reshape(n_states,num_umbrella_windows)
og = run_dhamed(c_l, v_ar, numerical_gradients=False, jit_gradient=True)

n_blocks = 2
bl_l = []
for bl in np.split(traj[1:],n_blocks):
    c_l = [count_matrix(bl,n_states=n_states)]
    bl_g = run_dhamed(c_l, v_ar)
    bl_l.append(np.exp(-bl_g) / np.sum(np.exp(-bl_g)))

bl_ar = np.column_stack((bl_l))
bl_ln_ar = np.log(bl_ar)
```
```
Traceback (most recent call last):
  File "dhamed_block.py", line 25, in <module>
    og = run_dhamed(c_l, v_ar, numerical_gradients=False, jit_gradient=True)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 182, in run_dhamed
    numerical_gradients=numerical_gradients, **kwargs)
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 220, in min_dhamed_bfgs
    fprime=fprime, **kwargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 889, in fmin_bfgs
    res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 964, in _minimize_bfgs
    old_fval, old_old_fval, amin=1e-100, amax=1e100)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 783, in _line_search_wolfe12
    **kwargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 101, in line_search_wolfe1
    c1=c1, c2=c2, amax=amax, amin=amin, xtol=xtol)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 175, in scalar_search_wolfe1
    derphi1 = derphi(stp)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 90, in derphi
    gval[0] = fprime(xk + s*pk, *newargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 95, in grad_dhamed_likelihood_ref_0
    grad = _loop_grad_dhamed_likelihood_0_jit(grad,g, ip, jp, ti, tj, vi, vj, nijp)
ZeroDivisionError: division by zero
```
I think you have to divide your bias array by RT to convert from kcal/mol to kBT. N.B., assuming that your temperature was 300 K, dividing by RT lets run_dhamed finish, but I get nonsensical values.
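i.e., something like the following (T = 300 K is an assumption here, and the bias energies are made-up numbers; use the values from your own simulations):

```python
import numpy as np

R = 0.0019872041   # gas constant in kcal/(mol K)
T = 300.0          # assumed simulation temperature in K
RT = R * T         # ~0.596 kcal/mol

# Made-up bias energies in kcal/mol, converted to dimensionless units of kBT:
bias_kcal = np.array([[2.0, 3.917], [0.523, 1.659]])
bias_kBT = bias_kcal / RT
```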
Do you still get the division by zero error on your full data set?
Comparison to WHAM provides a quick test of whether your DHAMed results are sensible. You'd expect many of the features of the WHAM PMF to be reproduced by the DHAMed PMF.
I am currently running it on the full data set. I'll let you know once it's finished.
Hi Lukas I am still getting the division by zero error on the full data set. Should I send you the complete input?
Yes, this would be great! I'm very keen to get to the bottom of this problem :)
I attached a file (dhamed.txt) that contains: bin_list, n_states, bias_mat, states_list.
Hi Lukas, did the file work for you? Do you prefer another format?
The file is hard to parse for us. Could you please store the arrays using a native numpy function?

```python
np.savez('dhamed.npz', bias_matrix=bias_matrix, bin_list=bin_list)
```

This is much easier for us to read back into a Python process. The number of states is 1000, right?
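For reference, the savez/load round trip looks like this (the array names and values here are just examples):

```python
import numpy as np

# Made-up arrays standing in for the real bias matrix and bin edges.
bias_matrix = np.arange(6.0).reshape(3, 2)
bin_edges = np.linspace(0.0, 1.0, 4)

np.savez('dhamed_example.npz', bias_matrix=bias_matrix, bin_edges=bin_edges)

data = np.load('dhamed_example.npz')
print(sorted(data.files))          # arrays come back under their keyword names
print(data['bias_matrix'].shape)   # (3, 2)
```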
Here it is. I had to label it txt to upload it but the format is npz. Thanks a lot!! dhamed.npz.txt
Thanks a lot! I can open your file and I'm trying to figure out what's going on right now.
I managed to get DHAMed to work for your data. I made a trivial mistake in my code that affects the setup of calculations when there are empty bins/states. I'm fixing this right now and I will try to update the repository later today.
https://github.com/bio-phys/PyDHAMed/commit/d42f2a67a2650a0f8b09a798c2afe3b6d85aab08 should fix the problem- the problem was a trivial mistake in the handling of empty bins.
Have a look at the Jupyter notebook (I had to rename it to .txt) to see whether the result makes sense to you.
Thank you so much for fixing this! I am running it right now and will let you know then.
I'm still getting the same error. I downloaded the master branch yesterday. Did I get the wrong version?
Did you update your installation? Just in case, you can run pip install . --user in the repository to force an update of the installation. Alternatively, a Python interpreter started from the repository directory should also pick up the current checkout.
I'm not sure what it is I am doing wrong but I still get the error. I cloned the master branch and ran pip install . --user as you suggested, but I am still getting the zero division error. I am running this on the same npz file I sent to you.
I'll look into that :) Thanks a lot for letting me know!
Is there a possibility to check whether the correct version is installed and called?
You could use ?? on, for instance, generate_dhamed_input() to see whether you are calling the latest version.
For generate_dhamed_input() you'd expect to see (towards the end):
```python
# Remove excluded counts from the total number of transitions out of state.
nk = check_total_transition_counts(n_out, n_in, paired_ar, n_actual)
```

which is new.
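Outside IPython, you can also check which copy of a module is actually imported and inspect a function's source directly; a sketch using the standard library's json module as a stand-in for pydhamed:

```python
import inspect
import json  # stand-in for pydhamed.prepare_dhamed in this sketch

# __file__ shows which installed copy of the module is actually imported ...
print(json.__file__)

# ... and inspect.getsource() shows the current code of a specific function,
# so you can check whether a particular fix is present in it.
src = inspect.getsource(json.dumps)
print('def dumps' in src)  # True
```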
I checked thoroughly that I have the correct version. Here's the error I get (the "bin excluded" output seems new to me, so I assume it comes from the changes in the code):
bin 0 excluded
bin 1 excluded
bin 2 excluded
bin 3 excluded
bin 4 excluded
bin 5 excluded
bin 6 excluded
bin 7 excluded
bin 8 excluded
bin 9 excluded
bin 10 excluded
bin 11 excluded
bin 12 excluded
bin 13 excluded
bin 14 excluded
bin 15 excluded
bin 16 excluded
bin 17 excluded
bin 18 excluded
bin 19 excluded
bin 20 excluded
bin 21 excluded
bin 22 excluded
bin 23 excluded
bin 24 excluded
bin 25 excluded
bin 26 excluded
bin 27 excluded
bin 28 excluded
bin 29 excluded
bin 31 excluded
bin 34 excluded
bin 975 excluded
bin 976 excluded
bin 977 excluded
bin 979 excluded
bin 980 excluded
bin 981 excluded
bin 982 excluded
bin 983 excluded
bin 984 excluded
bin 985 excluded
bin 986 excluded
bin 987 excluded
bin 988 excluded
bin 989 excluded
bin 990 excluded
bin 991 excluded
bin 992 excluded
bin 993 excluded
bin 994 excluded
bin 995 excluded
bin 996 excluded
bin 997 excluded
bin 998 excluded
bin 999 excluded
```
Number of transition pairs 169579
inf
Traceback (most recent call last):
  File "dhamed.py", line 35, in <module>
    og = run_dhamed(c_l, np.divide(v_ar, 0.00198720358509), numerical_gradients=False, jit_gradient=True)
  File "/home/local/andtos-loc/.local/lib/python2.7/site-packages/pydhamed/optimize_dhamed.py", line 219, in run_dhamed
    numerical_gradients=numerical_gradients, **kwargs)
  File "/home/local/andtos-loc/.local/lib/python2.7/site-packages/pydhamed/optimize_dhamed.py", line 257, in min_dhamed_bfgs
    fprime=fprime, **kwargs)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 889, in fmin_bfgs
    res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 943, in _minimize_bfgs
    gfk = myfprime(x0)
  File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/local/andtos-loc/.local/lib/python2.7/site-packages/pydhamed/optimize_dhamed.py", line 128, in grad_dhamed_likelihood_ref_0
    grad = _loop_grad_dhamed_likelihood_0_jit(grad,g, ip, jp, ti, tj, vi, vj, nijp)
ZeroDivisionError: division by zero
```