Update SMARTS for HBAcceptor and Ligand, fix issue #68

Open DrrDom opened this issue 1 year ago • 2 comments

I changed the patterns for the H-bond acceptor and Ligand as I suggested in https://github.com/chemosim-lab/ProLIF/issues/68. We can discuss them before settling on the final version.

DrrDom avatar Jul 28 '22 11:07 DrrDom

Hello @DrrDom! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 153:80: E501 line too long (133 > 79 characters)
Line 441:80: E501 line too long (98 > 79 characters)

pep8speaks avatar Jul 28 '22 11:07 pep8speaks

Codecov Report

Merging #73 (3cd68c7) into master (e5a140a) will not change coverage. The diff coverage is n/a.

Note: the current head 3cd68c7 differs from the pull request's most recent head 22777c2. Consider uploading reports for commit 22777c2 to get more accurate results.

@@           Coverage Diff           @@
##           master      #73   +/-   ##
=======================================
  Coverage   95.57%   95.57%           
=======================================
  Files           9        9           
  Lines         995      995           
=======================================
  Hits          951      951           
  Misses         44       44           

| Impacted Files | Coverage Δ |
|---|---|
| prolif/interactions.py | 99.52% <ø> (ø) |

codecov[bot] avatar Jul 29 '22 01:07 codecov[bot]

@DrrDom I've made some very small changes to the HBond acceptor SMARTS, since from my understanding !$(*(~a)~a) is another non-standard way to specify non-aromatic atoms, which O already specifies (aliphatic oxygen), unless I'm missing something.

I've also added tests on SMARTS pattern matching so that it's clearer what is supposed to pass and what is supposed to fail; the relevant ones for HBonds are here.

I'll try to merge this week as it's a great improvement over the current behavior, and we can further discuss other interaction SMARTS in a new PR if needed.

cbouy avatar Oct 02 '22 20:10 cbouy

I suggest making further changes to the ionic features, both positive and negative, and including them in this PR. The issue is that charge delocalization is not recognized, so some ionic interactions are missed.

For negative features these are carboxylate, sulfonate and phosphate groups. An oxygen without an explicit charge is not recognized as a center of negative charge. I hope this can be encoded in a generic way. Here is an extended pattern: "[-{1-},$(O=[*]-[O-])]"
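A quick way to see what the extended anionic pattern changes is to run it with RDKit on a few small ions. This is a minimal sketch, not ProLIF's own test code: the helper `matched_atoms` and the example SMILES are illustrative assumptions, and the bare `[-{1-}]` is used here only as a stand-in for a charge-only pattern.

```python
from rdkit import Chem

CHARGE_ONLY = "[-{1-}]"             # formal negative charge only (stand-in)
EXTENDED = "[-{1-},$(O=[*]-[O-])]"  # extended pattern proposed above


def matched_atoms(smarts, smiles):
    """Return sorted indices of the atoms matched by a single-atom SMARTS."""
    query = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    return sorted(match[0] for match in mol.GetSubstructMatches(query))


# Hypothetical test molecules; atom indices follow the SMILES order.
examples = {
    "acetate": "CC(=O)[O-]",
    "methanesulfonate": "CS(=O)(=O)[O-]",
    "acetic acid": "CC(=O)O",
}

for name, smiles in examples.items():
    print(
        name,
        matched_atoms(CHARGE_ONLY, smiles),
        matched_atoms(EXTENDED, smiles),
    )
```

For acetate, the charge-only pattern should pick up just the charged oxygen, while the extended one should also pick up the carbonyl oxygen. Neutral acetic acid should match neither pattern.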

For positive features these are amidine and guanidine groups, which may be a little tricky. The basic suggestion is "[+{1-},$([NH2]-[*]=[NH2+])]", but it does not take substitution of the nitrogens into account. A more general pattern may be "[+{1-},$([NX3]-[*]=[NX3+])]". I'm not sure the second pattern is good, because it would also match groups like [CH3][C](=[NH2+])[NH][OH], where the NH next to the oxygen would not delocalize the charge efficiently. To improve it, we can construct the pattern "[+{1-},$([NX3]-[*]=[NX3+])&!$([NX3](-O)-[*]=[NX3+])]", which excludes this counterexample. I think this would be reasonable. Other counterexamples can be excluded in the future if needed.
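The three cationic candidates can be compared side by side with RDKit. This is a sketch under stated assumptions: the helper `matched_atoms`, the dictionary labels, and the test molecules are illustrative and not part of ProLIF. The N-hydroxy example is the counterexample from the paragraph above, written as the compact SMILES `CC(=[NH2+])NO`.

```python
from rdkit import Chem

# SMARTS strings taken from the discussion; the labels are hypothetical.
PATTERNS = {
    "NH2 only": "[+{1-},$([NH2]-[*]=[NH2+])]",
    "any NX3": "[+{1-},$([NX3]-[*]=[NX3+])]",
    "NX3, no N-O": "[+{1-},$([NX3]-[*]=[NX3+])&!$([NX3](-O)-[*]=[NX3+])]",
}

# Hypothetical test molecules; atom indices follow the SMILES order.
MOLECULES = {
    "acetamidinium": "CC(=[NH2+])N",
    "N-methylacetamidinium": "CC(=[NH2+])NC",
    "N-hydroxyacetamidinium": "CC(=[NH2+])NO",
}


def matched_atoms(smarts, smiles):
    """Return sorted indices of the atoms matched by a single-atom SMARTS."""
    query = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    return sorted(match[0] for match in mol.GetSubstructMatches(query))


results = {
    (label, name): matched_atoms(smarts, smiles)
    for label, smarts in PATTERNS.items()
    for name, smiles in MOLECULES.items()
}

for (label, name), atoms in results.items():
    print(f"{label:12s} {name:24s} {atoms}")
```

In this sketch the NH2-only pattern should miss the substituted nitrogen of the N-methyl amidinium, the NX3 pattern should recover it, and only the last pattern should drop the nitrogen attached to the hydroxyl group.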

I did not test all these patterns, but they should be valid.

DrrDom avatar Oct 03 '22 12:10 DrrDom

The pattern !$(*(~a)~a) designates an atom that is not connected to two aromatic atoms; the atom itself may be non-aromatic, e.g. the oxygen in Ph-O-Ph. Diphenyl ethers are very weak acceptors, so we may exclude them.
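The difference between the any-bond symbol `~` and the aromatic-bond symbol `:` can be checked directly with RDKit. The sketch below is illustrative only: `[O;...]` is a simplified stand-in rather than ProLIF's actual HBAcceptor SMARTS, and the helper `matched_atoms` and the test molecules are assumptions.

```python
from rdkit import Chem

ANY_BOND = "[O;!$(*(~a)~a)]"       # ~ matches any bond to an aromatic atom
AROMATIC_BOND = "[O;!$(*(:a):a)]"  # : matches aromatic bonds only


def matched_atoms(smarts, smiles):
    """Return sorted indices of the atoms matched by a single-atom SMARTS."""
    query = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    return sorted(match[0] for match in mol.GetSubstructMatches(query))


# Hypothetical test molecules; atom indices follow the SMILES order.
examples = {
    "diphenyl ether": "c1ccccc1Oc1ccccc1",
    "anisole": "COc1ccccc1",
    "diethyl ether": "CCOCC",
}

for name, smiles in examples.items():
    print(
        name,
        matched_atoms(ANY_BOND, smiles),
        matched_atoms(AROMATIC_BOND, smiles),
    )
```

The ether oxygen of diphenyl ether is bonded to two aromatic atoms through single bonds, so only the `~` version should exclude it. Anisole and diethyl ether should keep their oxygen with either version.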

DrrDom avatar Oct 03 '22 12:10 DrrDom

My bad, I mistook ~ for :. Thanks for the clarification.

Yes, agreed. Until I find the time to implement RDKit's resonance structure enumeration for pattern matching this seems like a good workaround.

For the amidine and guanidine, I would have said [+{1-},$([NX3&!$([NX3]-O)]-[*]=[NX3+])]. Do we need the [*] in there, or can we just specify [C]?
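One way to compare the nested form with the earlier flat form, and `[*]` with `[C]`, is to run all three variants on a few cations. This is a sketch only: the constant names, the helper `matched_atoms`, and the test molecules are assumptions made for illustration, and agreement on these examples does not prove the patterns are equivalent in general.

```python
from rdkit import Chem

# SMARTS strings from the discussion; the constant names are hypothetical.
FLAT = "[+{1-},$([NX3]-[*]=[NX3+])&!$([NX3](-O)-[*]=[NX3+])]"
NESTED_ANY = "[+{1-},$([NX3&!$([NX3]-O)]-[*]=[NX3+])]"
NESTED_C = "[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]"

# Hypothetical test molecules; atom indices follow the SMILES order.
MOLECULES = {
    "guanidinium": "NC(N)=[NH2+]",
    "methylguanidinium": "CNC(N)=[NH2+]",
    "N-hydroxyacetamidinium": "CC(=[NH2+])NO",
}


def matched_atoms(smarts, smiles):
    """Return sorted indices of the atoms matched by a single-atom SMARTS."""
    query = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    return sorted(match[0] for match in mol.GetSubstructMatches(query))


for name, smiles in MOLECULES.items():
    hits = [matched_atoms(p, smiles) for p in (FLAT, NESTED_ANY, NESTED_C)]
    print(name, hits)
```

On these molecules all three variants should return the same atoms. Using `[C]` restricts the central atom to an aliphatic carbon, which is what amidine and guanidine groups have.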

cbouy avatar Oct 03 '22 23:10 cbouy

I agree with the current patterns. Yes, we can use [C] in the guanidine pattern. I have no further suggestions.

DrrDom avatar Oct 04 '22 04:10 DrrDom

Thanks a lot @DrrDom !

cbouy avatar Oct 04 '22 21:10 cbouy

Thank you as well. Great work! We use this tool regularly.

DrrDom avatar Oct 04 '22 21:10 DrrDom