espsim
ESP similarity formula intuition
Hi @hesther,
I appreciate your work. I am having difficulty understanding the ESP similarity formula at the end of short_demonstration.ipynb.
It seems to focus on sign alignment over all of space, so why not integrate phi_a * phi_b / (|phi_a| * |phi_b|) instead? Maybe I am getting it wrong. What is the intuition behind the formula?
Hi @PigUnderRoof, there is some research on how best to calculate overlap integrals for electrostatic potential similarities. Different metrics (e.g. Carbo, which is the formula above, Tanimoto, Hodgkin, etc.) weigh contributions of the overlap differently. For example, the articles here and here give some good explanations. Carbo similarities focus more on sign alignment and less on the magnitude of the potential, which I have found beneficial in some applications. You can also use Tanimoto similarities, Overlap(A,B)/(Overlap(A,A)+Overlap(B,B)-Overlap(A,B)), within ESP-Sim if you prefer. Note that the range of each similarity metric is different (Carbo: -1 to 1; Tanimoto: -1/3 to 1), so you cannot directly compare the numbers.
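To make the two formulas concrete, here is a minimal sketch assuming both potentials have already been sampled on a common uniform grid, so each overlap integral becomes a sum times the grid-cell volume dv. This is an illustration only, not how ESP-Sim itself evaluates the integrals, and the 1D toy potential is hypothetical. Flipping the sign of one potential reproduces the lower ends of the stated ranges (-1 for Carbo, -1/3 for Tanimoto):

```python
import numpy as np

def overlap(phi_a, phi_b, dv):
    """Approximate the overlap integral of phi_a * phi_b by a grid sum."""
    return float(np.sum(phi_a * phi_b) * dv)

def carbo(phi_a, phi_b, dv=1.0):
    """Carbo index: O_AB / sqrt(O_AA * O_BB)."""
    o_ab = overlap(phi_a, phi_b, dv)
    o_aa = overlap(phi_a, phi_a, dv)
    o_bb = overlap(phi_b, phi_b, dv)
    return o_ab / np.sqrt(o_aa * o_bb)

def tanimoto(phi_a, phi_b, dv=1.0):
    """Tanimoto index: O_AB / (O_AA + O_BB - O_AB)."""
    o_ab = overlap(phi_a, phi_b, dv)
    o_aa = overlap(phi_a, phi_a, dv)
    o_bb = overlap(phi_b, phi_b, dv)
    return o_ab / (o_aa + o_bb - o_ab)

# Hypothetical 1D toy "potential" on a uniform grid (not a real ESP).
x = np.linspace(-5.0, 5.0, 1001)
dv = x[1] - x[0]
phi = x * np.exp(-x**2)  # dipole-like: negative on one side, positive on the other

print(carbo(phi, phi, dv), tanimoto(phi, phi, dv))    # both ≈ 1.0 (identical potentials)
print(carbo(phi, -phi, dv), tanimoto(phi, -phi, dv))  # ≈ -1.0 and ≈ -0.333 (sign-inverted)
```

Because dv multiplies every overlap term, it cancels in both ratios; it is kept only so that overlap() approximates the actual integral.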
Similarity metrics are very common in cheminformatics and are also used for, e.g., fingerprint comparisons (this website has a nice overview of some of the metrics).
The formula you propose is really the same as Carbo (provided I understand correctly how you defined |phi_a|): you can move the square root (exponent of 1/2) into each integral, $S = \int \phi_a \phi_b \, d\mathbf{r} \big/ \left(\sqrt{\int \phi_a^2 \, d\mathbf{r}} \, \sqrt{\int \phi_b^2 \, d\mathbf{r}}\right)$, so the numerator is the overlap of $\phi_a$ and $\phi_b$ and the denominator is the product of their magnitudes. $\sqrt{\int \phi_a^2 \, d\mathbf{r}}$ is just a way to access the magnitude of $\phi_a$ through its self-overlap (very similar to how you would calculate probabilities from the integral of the squared wave function in quantum mechanics).
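Written as a normalized inner product, the Carbo index is the continuous analogue of cosine similarity. The short derivation below is standard linear algebra (not taken from the ESP-Sim code) and makes explicit both the -1 to 1 range and the insensitivity to a positive rescaling of either potential:

```latex
\begin{aligned}
S_{\mathrm{Carbo}}(\phi_a,\phi_b)
  &= \frac{\int \phi_a \phi_b \, d\mathbf{r}}
          {\sqrt{\int \phi_a^2 \, d\mathbf{r}}\;\sqrt{\int \phi_b^2 \, d\mathbf{r}}}
   = \frac{\langle \phi_a, \phi_b \rangle}{\lVert \phi_a \rVert \, \lVert \phi_b \rVert} \\[4pt]
|\langle \phi_a, \phi_b \rangle| \le \lVert \phi_a \rVert \, \lVert \phi_b \rVert
  &\;\Rightarrow\; -1 \le S_{\mathrm{Carbo}}(\phi_a,\phi_b) \le 1
  \quad \text{(Cauchy-Schwarz)} \\[4pt]
S_{\mathrm{Carbo}}(\phi_a, c\,\phi_b)
  &= \frac{c \, \langle \phi_a, \phi_b \rangle}{\lVert \phi_a \rVert \, c \, \lVert \phi_b \rVert}
   = S_{\mathrm{Carbo}}(\phi_a,\phi_b) \quad \text{for any } c > 0
\end{aligned}
```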
Thank you for all the information.
The overlap in Carbo's paper makes sense to me, because he was dealing with density functions. After replacing the density function with a potential function, I feel it is no longer really about overlap.
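I tried Monte Carlo integration with two different formulas, $S_1 = \int \phi_a \phi_b \, d\mathbf{r} \big/ \sqrt{\int \phi_a^2 \, d\mathbf{r} \int \phi_b^2 \, d\mathbf{r}}$ and $S_2 = \int \phi_a \phi_b / (|\phi_a| \, |\phi_b|) \, d\mathbf{r}$, where |x| means abs(x). S1 and S2 don't agree.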
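One way to see why S1 and S2 disagree: the integrand of S2 is phi_a * phi_b / (|phi_a| * |phi_b|) = sign(phi_a * phi_b), so S2 only counts where the signs agree, while S1 weights every point by the product of the magnitudes. The sketch below uses two hypothetical 1D toy potentials and uniform Monte Carlo sampling on a finite box. Since the integral of a sign function over all space does not converge, it assumes S2 is normalized by the box volume (i.e. taken as a sample mean); that normalization is an assumption, not something stated in the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D toy potentials on the box [-5, 5] (stand-ins for real ESPs).
def phi_a(x):
    return x * np.exp(-x**2)          # strong, localized dipole-like field

def phi_b(x):
    return 0.1 * np.tanh(x - 0.5)     # weak, extended field; sign flip shifted to x = 0.5

# Uniform Monte Carlo samples; the box volume cancels in the S1 ratio.
x = rng.uniform(-5.0, 5.0, size=200_000)
a, b = phi_a(x), phi_b(x)

# S1: Carbo index, each integral estimated by a sample mean.
s1 = np.mean(a * b) / np.sqrt(np.mean(a * a) * np.mean(b * b))

# S2: pointwise-normalized integrand equals sign(a * b); averaged over the box
# (assumed volume normalization so the value stays finite).
s2 = np.mean(np.sign(a * b))

print(f"S1 (Carbo) = {s1:.3f}, S2 (sign agreement) = {s2:.3f}")
```

Here the signs disagree only on the interval (0, 0.5), i.e. 5% of the box, so S2 comes out near 0.9. S1 is much lower because phi_a is concentrated near the origin while phi_b stays large across the whole box, so the two functions are far from proportional even where their signs agree.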
Hi @hesther, bringing this up again because I have a related question.
I see in the codebase that the default value of the metric argument in EmbedAlignScore() is "carbo".
However, in your benchmarks I see you have used metric="tanimoto", e.g. in https://github.com/hesther/espsim/blob/master/benchmarks/benchmark_3_d4-rescore.ipynb.
Is there a particular reason for this? I also want to rescore docked poses from AutoDock Vina, just like in your benchmark notebook, and am wondering whether I should use "carbo" or "tanimoto".
Hi @linminhtoo, it really depends on what kind of comparison you want to make. Carbo is largely insensitive to the magnitude of the function, so as long as the repulsive and attractive parts of the ESP overlap in two molecules, it will yield a good score. Tanimoto is also sensitive to the magnitude of the function, so it will give a lower score if you have the same ESP shape but, for example, smaller partial charges. For instance, two dipoles with partial charges of ±0.2 and ±0.4 on overlapping atoms would have a perfect score with Carbo, but a much lower score with Tanimoto. So, if magnitude matters, use Tanimoto; otherwise use Carbo. In our rescoring studies, we mainly found that both metrics yield good results as long as they are used consistently, because many molecules differ in ESP shape anyway (not just in magnitude).
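As a numerical check of the dipole example, here is a minimal sketch (not ESP-Sim code) that samples the Coulomb potential of two hypothetical point-charge dipoles, with charges ±0.2 and ±0.4 at the same positions and in arbitrary units, at random points in a box. Excluding a small sphere around each charge is an assumption made only to avoid the 1/r singularity. Plain sums stand in for the integrals, since a grid-cell volume would cancel in both ratios.

```python
import numpy as np

# Hypothetical geometry: charges at x = +0.5 and x = -0.5 (arbitrary units).
CHARGE_POS = np.array([[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]])

def dipole_potential(points, q):
    """Coulomb potential (arbitrary units) of +q and -q at CHARGE_POS."""
    charges = np.array([q, -q])
    r = np.linalg.norm(points[:, None, :] - CHARGE_POS[None, :, :], axis=2)  # (n, 2)
    return (charges[None, :] / r).sum(axis=1)

def carbo(a, b):
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))

def tanimoto(a, b):
    ab = np.sum(a * b)
    return ab / (np.sum(a * a) + np.sum(b * b) - ab)

rng = np.random.default_rng(1)
pts = rng.uniform(-4.0, 4.0, size=(50_000, 3))
# Assumption: drop points within 0.5 of either charge to avoid the 1/r singularity.
dist = np.linalg.norm(pts[:, None, :] - CHARGE_POS[None, :, :], axis=2)
pts = pts[dist.min(axis=1) > 0.5]

phi_02 = dipole_potential(pts, 0.2)
phi_04 = dipole_potential(pts, 0.4)
print(carbo(phi_02, phi_04))     # ≈ 1.0: same shape, magnitude ignored
print(tanimoto(phi_02, phi_04))  # ≈ 0.667: 2 / (1 + 4 - 2)
```

Doubling the charges doubles the potential everywhere, so the Carbo score stays at 1 regardless of where the points are sampled, while Tanimoto drops to c / (1 + c² - c) = 2/3 for c = 2.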
In the paper, we ended up using Tanimoto because it is more commonly used for 2D similarity comparisons, so my collaborators preferred it. But as I said, in the rescoring benchmarks there was hardly any difference between the two.