MDANSE icon indicating copy to clipboard operation
MDANSE copied to clipboard

[ENHANCEMENT] Molecule bonding determination and labelling using rdkit

Open ChiCheng45 opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe. Currently, MDANSE only determines whether something is an molecule or bulk structures based on the input file of the converter e.g. when a PDB file is used. It would be useful to determine whether an atom belongs to a molecule or bulk structure during the conversion process. We could then improve things like the atom selection and the 3D viewer.

Describe the solution you'd like Add some molecule determination to the converter process for input files which do not have bonding information. There is some new rdkit functions that we can use. See https://github.com/rdkit/rdkit/discussions/5759 and https://github.com/rdkit/UGM_2022/blob/main/Notebooks/Landrum_WhatsNew.ipynb. We could label each molecule by its canonical smiles and group them together for atoms selection and other purposes.

Additional context Here is some example code I created from the examples in the links above.

from rdkit import Chem

raw_mol = Chem.MolFromXYZBlock(
"""19

C	 0.0645055554	 1.4843171326	 0.3723315122
C	-0.001915467	 0.0516984051	-0.1729357038
C	-1.4001624807	-0.5304376801	-0.1487075838
C	-1.9909553844	-1.0379429662	 1.1601625215
C	-1.7182271444	-1.9962888369	 0.0152208454
C	-2.905416575	-2.2627721735	-0.8489696604
C	-3.3347497536	-1.1525477782	-1.4661914556
C	-2.4353480328	-0.0114763902	-1.1388331005
O	-2.4853518918	 1.1093228549	-1.595794586
H	 1.0876615149	 1.8699729825	 0.3263064541
H	-0.5836533924	 2.142053447	-0.2101812431
H	-0.2626541721	 1.5210712578	 1.4170099662
H	 0.3710335675	 0.0343653078	-1.2041622283
H	 0.662791197	-0.6008609283	 0.4071642198
H	-1.3142481944	-1.0584396879	 2.0096409673
H	-3.0252533013	-0.8199288984	 1.4038330438
H	-0.9412329872	-2.7436632889	 0.1278385726
H	-3.37329736	-3.2394582332	-0.9072563749
H	-4.2004762777	-1.0463782959	-2.1061677066"""
)

from rdkit.Chem import rdDetermineBonds
bond_moll = Chem.Mol(raw_mol)
rdDetermineBonds.DetermineBonds(bond_moll,charge=0)
print(Chem.CanonSmiles(Chem.MolToSmiles(bond_moll)))

Which gives the output. CC[C@]12C[C@H]1C=CC2=O

ChiCheng45 avatar Mar 13 '24 09:03 ChiCheng45

A solution to #296?

ChiCheng45 avatar Mar 13 '24 10:03 ChiCheng45

This looks better than the solution I tried last time. Should work

MBartkowiakSTFC avatar Mar 14 '24 10:03 MBartkowiakSTFC

This has been implemented in MoleculeFinder in #392

MBartkowiakSTFC avatar Jul 09 '24 12:07 MBartkowiakSTFC