[ENHANCEMENT] Molecule bonding determination and labelling using rdkit
Is your feature request related to a problem? Please describe. Currently, MDANSE only determines whether something is an molecule or bulk structures based on the input file of the converter e.g. when a PDB file is used. It would be useful to determine whether an atom belongs to a molecule or bulk structure during the conversion process. We could then improve things like the atom selection and the 3D viewer.
Describe the solution you'd like Add some molecule determination to the converter process for input files which do not have bonding information. There is some new rdkit functions that we can use. See https://github.com/rdkit/rdkit/discussions/5759 and https://github.com/rdkit/UGM_2022/blob/main/Notebooks/Landrum_WhatsNew.ipynb. We could label each molecule by its canonical smiles and group them together for atoms selection and other purposes.
Additional context Here is some example code I created from the examples in the links above.
from rdkit import Chem
raw_mol = Chem.MolFromXYZBlock(
"""19
C 0.0645055554 1.4843171326 0.3723315122
C -0.001915467 0.0516984051 -0.1729357038
C -1.4001624807 -0.5304376801 -0.1487075838
C -1.9909553844 -1.0379429662 1.1601625215
C -1.7182271444 -1.9962888369 0.0152208454
C -2.905416575 -2.2627721735 -0.8489696604
C -3.3347497536 -1.1525477782 -1.4661914556
C -2.4353480328 -0.0114763902 -1.1388331005
O -2.4853518918 1.1093228549 -1.595794586
H 1.0876615149 1.8699729825 0.3263064541
H -0.5836533924 2.142053447 -0.2101812431
H -0.2626541721 1.5210712578 1.4170099662
H 0.3710335675 0.0343653078 -1.2041622283
H 0.662791197 -0.6008609283 0.4071642198
H -1.3142481944 -1.0584396879 2.0096409673
H -3.0252533013 -0.8199288984 1.4038330438
H -0.9412329872 -2.7436632889 0.1278385726
H -3.37329736 -3.2394582332 -0.9072563749
H -4.2004762777 -1.0463782959 -2.1061677066"""
)
from rdkit.Chem import rdDetermineBonds
bond_moll = Chem.Mol(raw_mol)
rdDetermineBonds.DetermineBonds(bond_moll,charge=0)
print(Chem.CanonSmiles(Chem.MolToSmiles(bond_moll)))
Which gives the output.
CC[C@]12C[C@H]1C=CC2=O
A solution to #296?
This looks better than the solution I tried last time. Should work
This has been implemented in MoleculeFinder in #392