practicalcheminformatics
Automatic Analog Generation With Common R-group Replacements | Practical Cheminformatics
Use data from a recently published database to generate close analogs
https://patwalters.github.io/practicalcheminformatics/jupyter/chembl/2021/07/05/replace-rgroups.html
test comment
Hi, this is a great way of doing ideas generation from a single structure at speed, and I've found it useful with our own endeavours into virtual library creation.
I just wanted to point out an edge case that was significant in the way I used the code. It may not apply in all cases, but I think it's worth checking for.
I've found that when not all of the sidechains had matches in the dictionary, I didn't get any results back at all, which seemed strange: there were no results even for the sidechains that did match. A quick search through my logs showed:
tmp_smiles = [a + b for a, b in replacement_smiles]
TypeError: cannot unpack non-iterable NoneType object
It looks like in the get_replacements function, there's no default failover value when there's no match. For my case I decided to take the route of returning a dictionary just consisting of the original sidechain in each 'layer', so that it would be re-added at the substitution point. This led to a few duplicates, but since I was taking the unique set as my output, these got filtered out in the process. My revised get_replacements is below:
def get_replacements(smi, replacement_dict):
    cansmi = smi2cansmi(smi)
    return replacement_dict.get(cansmi, [[cansmi], [cansmi]])
I'm not saying this is the most efficient way to deal with the problem by any means, but for the use cases I had this did the trick and ensured that non-matching sidechains were left unchanged during the process.
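For anyone who wants to see the failure and the fix side by side, here is a minimal sketch. It assumes each dictionary value is a pair of lists (one per 'layer'), which is what the unpacking in the traceback implies. The smi2cansmi here is only a stand-in that strips whitespace so the snippet runs without a toolkit, and the dictionary entries and the name get_replacements_no_default are made up for illustration.

```python
def smi2cansmi(smi):
    # Stand-in for the post's canonicalizer (assumption): only strips whitespace.
    return smi.strip()


# Toy dictionary with made-up entries: sidechain -> [layer 1, layer 2]
replacement_dict = {
    "*F": [["*Cl", "*Br"], ["*C"]],
}


def get_replacements_no_default(smi, replacement_dict):
    # Hypothetical version without a fallback: returns None when there is no match.
    return replacement_dict.get(smi2cansmi(smi))


def get_replacements(smi, replacement_dict):
    # Revised version: fall back to the original sidechain in each layer.
    cansmi = smi2cansmi(smi)
    return replacement_dict.get(cansmi, [[cansmi], [cansmi]])


sidechains = ["*F", "*OC"]  # "*OC" has no entry in the toy dictionary

# Without a fallback, a single unmatched sidechain breaks the whole comprehension.
try:
    pairs = [get_replacements_no_default(s, replacement_dict) for s in sidechains]
    [a + b for a, b in pairs]
    failed = False
except TypeError:
    failed = True

# With the fallback, the unmatched sidechain is carried through unchanged.
replacement_smiles = [get_replacements(s, replacement_dict) for s in sidechains]
tmp_smiles = [a + b for a, b in replacement_smiles]

# The fallback adds the same sidechain twice; taking the unique set removes it.
unique_smiles = [sorted(set(x)) for x in tmp_smiles]
```

The duplicate comes from returning the original sidechain in both layers, so deduplicating at the end, as described above, is enough to clean it up.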
As another point, the XML->JSON conversion in this post offers a lot of scope for specifying additional transformations at a later date (I've actually taken the JSON version for my own use, as I have a lot more tools to hand for modifying JSON than for modifying XML). I wonder if any other readers have considered reaching out to their research teams, etc., for their own common substitutions to bring into the lists?
Thank you James, I updated the post to include your suggested change. I agree that JSON is easier to deal with than XML. I'd be happy to include other suggested functional group replacements.
This site https://www.freeformatter.com/xml-to-json-converter.html did a pretty credible job of converting the XML to JSON. Attributes were given an @ prefix. Python's json module read the result without complaining, though I have yet to use it.
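As a rough sketch of what reading such a conversion might look like with Python's json module: the element and attribute names below ("group", "replacement", "name", "smiles") are invented for illustration and are not the real schema of the database. The only thing taken from the description above is that XML attributes end up as keys with an @ prefix.

```python
import json

# Made-up fragment in the style of an XML-to-JSON conversion (hypothetical schema).
converted = """
{
  "group": {
    "@name": "example",
    "replacement": [
      {"@smiles": "*F"},
      {"@smiles": "*Cl"}
    ]
  }
}
"""

data = json.loads(converted)

# Former XML attributes are ordinary dictionary keys that start with "@".
group = data["group"]
group_name = group["@name"]
smiles = [entry["@smiles"] for entry in group["replacement"]]
```

Since the @ is just part of the key string, no special handling is needed beyond including it when indexing.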