Automatic Analog Generation With Common R-group Replacements | Practical Cheminformatics

Open utterances-bot opened this issue 3 years ago • 4 comments

Use data from a recently published database to generate close analogs

https://patwalters.github.io/practicalcheminformatics/jupyter/chembl/2021/07/05/replace-rgroups.html

Jul 13 '21 09:07 utterances-bot

test comment

Jul 13 '21 19:07 PatWalters

Hi, this is a great way of generating ideas from a single structure at speed, and I've found it useful in our own work on virtual library creation.

I just wanted to point out an edge case that mattered for the way I used the code. It may not apply in every case, but I think it is worth checking.

I found that when not all of the sidechains had matches in the dictionary, I got no results back at all, not even for the sidechains that did match, which seemed strange. A quick search through my logs showed:

tmp_smiles = [a + b for a, b in replacement_smiles]
TypeError: cannot unpack non-iterable NoneType object

It looks like the get_replacements function has no fallback value for when there's no match, so it returns None and the unpacking fails. For my case I decided to return a pair of lists, each containing just the original sidechain, so that the sidechain is re-added unchanged at the substitution point in each 'layer'. This led to a few duplicates, but since I was taking the unique set as my output, these got filtered out in the process. My revised get_replacements is below:

def get_replacements(smi, replacement_dict):
    cansmi = smi2cansmi(smi)
    # Fall back to the original sidechain when the dictionary has no entry
    return replacement_dict.get(cansmi, [[cansmi], [cansmi]])

I'm not saying this is the most efficient way to deal with the problem by any means, but for the use cases I had this did the trick and ensured that non-matching sidechains were left unchanged during the process.
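
For anyone who wants to reproduce this, here is a minimal sketch of the failure and the fallback on toy data. It rests on a few assumptions: the `smi2cansmi` below is a stand-in that only strips whitespace (the post's version canonicalizes with RDKit), the dictionary contents are made up, and `get_replacements_no_default` is a hypothetical name for the version without a fallback.

```python
def smi2cansmi(smi):
    # Stand-in for the post's RDKit-based canonicalizer (assumption):
    # here it only strips whitespace so the example runs without RDKit.
    return smi.strip()


# Toy dictionary (hypothetical data): one sidechain with a pair of replacement lists.
replacement_dict = {"[*]C": [["[*]CC"], ["[*]F"]]}


def get_replacements_no_default(smi, replacement_dict):
    # Returns None when the sidechain is not in the dictionary.
    return replacement_dict.get(smi2cansmi(smi))


def get_replacements(smi, replacement_dict):
    cansmi = smi2cansmi(smi)
    # Fall back to the original sidechain when there is no entry.
    return replacement_dict.get(cansmi, [[cansmi], [cansmi]])


sidechains = ["[*]C", "[*]Br"]  # the second one has no dictionary entry

# Without a fallback, one missing sidechain breaks the whole comprehension.
try:
    replacement_smiles = [get_replacements_no_default(s, replacement_dict) for s in sidechains]
    tmp_smiles = [a + b for a, b in replacement_smiles]
    error_message = None
except TypeError as err:
    error_message = str(err)

# With the fallback, the unmatched sidechain is carried through unchanged.
replacement_smiles = [get_replacements(s, replacement_dict) for s in sidechains]
tmp_smiles = [a + b for a, b in replacement_smiles]

# The fallback adds the original sidechain twice; a set removes the duplicate.
unique_smiles = [sorted(set(layer)) for layer in tmp_smiles]
```

The duplicate in the second layer is the one mentioned above; taking the unique set removes it.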

As another point, given the XML->JSON conversion within this, it does offer a lot of scope for specifying additional transformations at a later date (I've actually taken the JSON version for my own use, as I have a lot of tools to hand for modifying that as opposed to modifying XML). I wonder if any other readers have considered reaching out to their research teams etc for their own common substitutions to bring into the lists?

Jul 14 '21 09:07 JWallaceEvotec

Thank you James, I updated the post to include your suggested change. I agree that JSON is easier to deal with than XML. I'd be happy to include other suggested functional group replacements.

Jul 19 '21 00:07 PatWalters

This site https://www.freeformatter.com/xml-to-json-converter.html did a pretty credible job of converting the XML to JSON. Attributes were given an @ prefix. Python's json module read the result without complaining, though I have yet to use it.
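
As a rough sketch of what reading such a file might look like, the snippet below loads a small JSON string with Python's json module and strips the @ prefix from attribute keys. The element and attribute names (`group`, `replacement`, `smiles`, `count`) and the values are invented for illustration and are not taken from the actual database file.

```python
import json

# Hypothetical fragment in the style a converter might emit: XML attributes
# become keys with an "@" prefix. Names and values here are invented.
raw = """
{
  "group": {
    "@smiles": "[*]C",
    "replacement": [
      {"@smiles": "[*]CC", "@count": "12"},
      {"@smiles": "[*]F", "@count": "7"}
    ]
  }
}
"""


def strip_at(obj):
    # Recursively drop a single leading "@" from dictionary keys.
    if isinstance(obj, dict):
        return {(k[1:] if k.startswith("@") else k): strip_at(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [strip_at(v) for v in obj]
    return obj


data = strip_at(json.loads(raw))
replacements = [r["smiles"] for r in data["group"]["replacement"]]
```

Attribute values come through as strings, so counts would still need converting with `int()` before any numeric use.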

Aug 29 '22 14:08 DavidACosgrove
