
Query about LLM4Chem.utils.smiles_canonicalization.canonicalize_molecule_smiles

Open stanleyjs opened this issue 1 year ago • 1 comments

Hello,

Thank you for your work on this. I am finding it very useful for my research.

I am curious what you are intending with the repeated to/from RDKit Mol in your canonicalization routine. Could you explain that function, or explain lines 79-83?

Thank you,
Jay

stanleyjs · Aug 06 '24 16:08

Hi, thanks for your interest in our work.

We repeat the conversion because we found that a single canonicalization pass does not guarantee that the same molecule always yields exactly the same output string. We don't know the reason, but we suspect it comes from how RDKit is designed. Empirically, repeating the SMILES-to-Mol-to-SMILES round trip solves the problem, though it makes the function slower :)
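To illustrate the idea, here is a minimal sketch of repeating the round trip until the string stops changing. It is not the code from lines 79-83 of the repository; the names `rdkit_round_trip`, `canonicalize_until_stable`, and `max_rounds` are hypothetical, and stopping at a fixed point is an assumption about how "repeat" could be done. Only `Chem.MolFromSmiles` and `Chem.MolToSmiles` are real RDKit calls.

```python
from typing import Callable, Optional


def rdkit_round_trip(smiles: str) -> Optional[str]:
    """One SMILES -> Mol -> SMILES pass; returns None if RDKit cannot parse the input."""
    # Imported lazily so the generic helper below can be used without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # MolToSmiles writes canonical SMILES by default.
    return Chem.MolToSmiles(mol)


def canonicalize_until_stable(
    smiles: str,
    round_trip: Callable[[str], Optional[str]] = rdkit_round_trip,
    max_rounds: int = 5,  # hypothetical cap, chosen only for illustration
) -> Optional[str]:
    """Repeat the round trip until the output equals its input or max_rounds is reached."""
    current = smiles
    for _ in range(max_rounds):
        nxt = round_trip(current)
        if nxt is None:
            # The string could not be parsed, so there is nothing to canonicalize.
            return None
        if nxt == current:
            # Fixed point: another pass would not change the string.
            return nxt
        current = nxt
    # No fixed point within max_rounds; return the last string produced.
    return current
```

With RDKit installed, `canonicalize_until_stable("OCC")` runs the default round trip. Passing a different `round_trip` callable makes it possible to check the loop logic on its own. The early exit keeps the extra cost small for molecules whose SMILES is already stable after one pass.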

btyu · Aug 12 '24 18:08

Closing this issue since there have been no further updates. Please feel free to reopen it if needed :)

btyu · Sep 20 '24 21:09