Query about LLM4Chem.utils.smiles_canonicalization.canonicalize_molecule_smiles
Hello,
Thank you for your work on this. I am finding it very useful for my research.
I am curious about the purpose of the repeated conversion to and from an RDKit Mol in your canonicalization routine. Could you explain that function, or at least lines 79-83?
Thank you,
Jay
Hi, thanks for your interest in our work.
We found that a single pass does not guarantee that the same molecule always produces exactly the same output string. We don't know the reason, but it is presumably a consequence of how RDKit is designed. Empirically, repeating the conversion solves the problem, though it makes the routine slower :)
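For readers who want to see the idea in code, here is a minimal sketch of it. This is not the code in lines 79-83 of the repository: the function names `rdkit_round_trip` and `canonicalize_until_stable` and the `max_rounds` parameter are made up for illustration, and instead of a fixed number of repetitions the sketch stops as soon as a pass returns its input unchanged. It assumes RDKit is installed and uses only `Chem.MolFromSmiles` and `Chem.MolToSmiles`.

```python
from typing import Callable, Optional


def rdkit_round_trip(smiles: str) -> Optional[str]:
    """One SMILES -> Mol -> SMILES pass (hypothetical helper, needs RDKit)."""
    # Imported lazily so the generic loop below can be used without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None when the SMILES cannot be parsed
        return None
    return Chem.MolToSmiles(mol)


def canonicalize_until_stable(
    smiles: str,
    one_pass: Callable[[str], Optional[str]] = rdkit_round_trip,
    max_rounds: int = 5,  # illustrative cap, not a value from the repository
) -> Optional[str]:
    """Repeat `one_pass` until its output stops changing or the cap is hit."""
    current = smiles
    for _ in range(max_rounds):
        result = one_pass(current)
        if result is None:
            return None  # propagate parse failures to the caller
        if result == current:
            return result  # fixed point reached: another pass changes nothing
        current = result
    return current  # cap reached; return the latest form
```

A call such as `canonicalize_until_stable("OCC")` would run the RDKit round trip until the string is stable. Stopping at the fixed point avoids the extra passes when one is already enough, which may recover some of the speed mentioned above; whether it is a drop-in replacement for the repository's routine is untested.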
Closing this issue due to no further update. Please feel free to reopen it if needed :)