ProLIF
Update SMARTS for HBAcceptor and Ligand, fix issue #68
I changed the patterns for H-acceptor and Ligand as I suggested in https://github.com/chemosim-lab/ProLIF/issues/68. We can discuss them before settling on the final version.
Hello @DrrDom! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file `prolif/interactions.py`:
  - Line 153:80: E501 line too long (133 > 79 characters)
  - Line 441:80: E501 line too long (98 > 79 characters)
Codecov Report
Merging #73 (3cd68c7) into master (e5a140a) will not change coverage. The diff coverage is n/a.
Current head 3cd68c7 differs from pull request most recent head 22777c2. Consider uploading reports for the commit 22777c2 to get more accurate results.
@@ Coverage Diff @@
## master #73 +/- ##
=======================================
Coverage 95.57% 95.57%
=======================================
Files 9 9
Lines 995 995
=======================================
Hits 951 951
Misses 44 44
Impacted Files | Coverage Δ | |
---|---|---|
prolif/interactions.py | 99.52% <ø> (ø) |
@DrrDom I've made some very small changes to the HBond acceptor SMARTS, since from my understanding `!$(*(~a)~a)` is another non-standard way to specify non-aromatic atoms, which `O` already specifies (aliphatic oxygen), unless I'm missing something.
I've also added tests on SMARTS pattern matching so that it is clearer what is supposed to pass and what is supposed to fail; the relevant ones for HBonds are here.
I'll try to merge this week as it's a great improvement over the current behavior, and we can further discuss other interaction SMARTS in a new PR if needed.
I suggest making further changes to the ionic features, both positive and negative, and including them in this PR. The issue is that charge delocalization is not recognized, so some ionic interactions are missed.
For negative features, the affected groups are carboxylic, sulfonic and phosphate groups. An oxygen without an explicit charge is not recognized as a center of negative charge. I hope this can be encoded in a generic way. Here is an extended pattern: `[-{1-},$(O=[*]-[O-])]`
For positive features, the affected groups are amidine and guanidine groups. This may be a little trickier. The basic suggestion is `[+{1-},$([NH2]-[*]=[NH2+])]`. It does not take substitution of the nitrogens into account, so a more general pattern may be `[+{1-},$([NX3]-[*]=[NX3+])]`. I'm not sure the second pattern is good, because it would also match groups like `[CH3][C](=[NH2+])[NH][OH]`, where the NH next to the oxygen would not delocalize the charge efficiently. To improve it we can construct the pattern `[+{1-},$([NX3]-[*]=[NX3+])&!$([NX3](-O)-[*]=[NX3+])]`, which excludes this counterexample. I think this would be reasonable. In the future, other counterexamples can be excluded if needed.
I did not test all these patterns, but they should be valid.
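As a quick sanity check of the negative pattern, here is a minimal sketch using RDKit (assumed to be installed). The helper `matched_atoms` and the test molecules are illustrative choices, not part of ProLIF; the two SMARTS strings are the plain charge query and the extended pattern quoted in this comment.

```python
from rdkit import Chem

# Plain charge query versus the extended pattern proposed in this thread.
ANION_EXPLICIT = Chem.MolFromSmarts("[-{1-}]")
ANION_EXTENDED = Chem.MolFromSmarts("[-{1-},$(O=[*]-[O-])]")


def matched_atoms(smiles, pattern):
    """Hypothetical helper: set of atom indices in `smiles` matching `pattern`."""
    mol = Chem.MolFromSmiles(smiles)
    return {idx for match in mol.GetSubstructMatches(pattern) for idx in match}


acetate = "CC(=O)[O-]"            # atoms: C0, C1, O2 (C=O), O3 (O-)
mesylate = "CS(=O)(=O)[O-]"       # atoms: C0, S1, O2, O3, O4 (O-)
acetic_acid = "CC(=O)O"           # neutral acid, no anionic center

for name, smi in [("acetate", acetate), ("mesylate", mesylate),
                  ("acetic acid", acetic_acid)]:
    print(name,
          sorted(matched_atoms(smi, ANION_EXPLICIT)),
          sorted(matched_atoms(smi, ANION_EXTENDED)))
```

With the extended pattern, the formally neutral oxygens that share the delocalized charge are reported alongside the explicitly charged one, while the neutral acid still gives no match.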
The pattern `!$(*(~a)~a)` designates an atom that is not connected to two aromatic atoms; the atom itself need not be aromatic, e.g. the oxygen in Ph-O-Ph. Diphenyl ethers are very weak acceptors, so we may exclude them.
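The difference between the two constraints can be sketched with RDKit (assumed to be installed). The pattern below is a reduced, illustrative one built only from the pieces discussed here (`O` plus the recursive exclusion); it is not the full HBAcceptor SMARTS from the PR, and `is_candidate_acceptor` is a hypothetical helper name.

```python
from rdkit import Chem

# Illustrative only: aliphatic oxygen that is NOT bonded to two aromatic atoms.
PATTERN = Chem.MolFromSmarts("[O;!$(*(~a)~a)]")


def is_candidate_acceptor(smiles):
    """Hypothetical helper: True if any atom in `smiles` matches PATTERN."""
    return Chem.MolFromSmiles(smiles).HasSubstructMatch(PATTERN)


examples = {
    "dimethyl ether": "COC",                  # no aromatic neighbours
    "anisole": "COc1ccccc1",                  # one aromatic neighbour
    "diphenyl ether": "c1ccccc1Oc1ccccc1",    # two aromatic neighbours
    "furan": "c1ccoc1",                       # oxygen itself is aromatic
}

for name, smi in examples.items():
    print(f"{name}: {is_candidate_acceptor(smi)}")
```

Furan is rejected by `O` alone because its oxygen is aromatic, whereas diphenyl ether has an aliphatic oxygen and is rejected only by the recursive `!$(*(~a)~a)` term.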
My bad, I mistook `~` for `:`, thanks for the clarification.
Yes, agreed. Until I find the time to implement RDKit's resonance structure enumeration for pattern matching, this seems like a good workaround.
For the amidine and guanidine, I would have said `[+{1-},$([NX3&!$([NX3]-O)]-[*]=[NX3+])]`. Do we need the `[*]` in there or can we just specify `[C]`?
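To compare the unrestricted amidine/guanidine pattern with the version that excludes N-O nitrogens, here is a small RDKit sketch (RDKit assumed to be installed). The helper `cation_atoms`, the constant names and the test molecules are illustrative, not ProLIF code; the two SMARTS strings are the ones quoted in this thread.

```python
from rdkit import Chem

# Unrestricted pattern versus the one that skips nitrogens bonded to oxygen.
BROAD = Chem.MolFromSmarts("[+{1-},$([NX3]-[*]=[NX3+])]")
REFINED = Chem.MolFromSmarts("[+{1-},$([NX3&!$([NX3]-O)]-[*]=[NX3+])]")


def cation_atoms(smiles, pattern):
    """Hypothetical helper: set of atom indices in `smiles` matching `pattern`."""
    mol = Chem.MolFromSmiles(smiles)
    return {idx for match in mol.GetSubstructMatches(pattern) for idx in match}


acetamidinium = "CC(=[NH2+])N"       # atoms: C0, C1, N2 (charged), N3
guanidinium = "NC(=[NH2+])N"         # atoms: N0, C1, N2 (charged), N3
n_hydroxy = "CC(=[NH2+])NO"          # atoms: C0, C1, N2 (charged), N3, O4

for name, smi in [("acetamidinium", acetamidinium),
                  ("guanidinium", guanidinium),
                  ("N-hydroxyacetamidinium", n_hydroxy)]:
    print(name,
          sorted(cation_atoms(smi, BROAD)),
          sorted(cation_atoms(smi, REFINED)))
```

Both patterns agree on plain amidinium and guanidinium groups; they differ only for the N-hydroxy case, where the refined pattern keeps the explicitly charged nitrogen and drops the nitrogen bonded to oxygen.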
I agree with the current patterns. Yes, we can use `[C]` in the guanidine pattern. I have no further suggestions.
Thanks a lot, @DrrDom!
Thank you as well. Great work! We use this tool regularly.