
missed/previously unspotted opportunity to discern connectivity/absolute SMILES

Open nbehrnd opened this issue 5 months ago • 0 comments

@mcs07 While experimenting in a dev branch of my fork of PubChemPy, I noticed that test_compound.py defines benzene (CID241) as reference compound c1. Later in the same test file, PubChem is queried for what the database considers the CanonicalSMILES (since July 2025 ConnectivitySMILES) or the IsomericSMILES (now AbsoluteSMILES). While I agree with the logic of the tests, using benzene here is a missed opportunity to me: benzene has no stereocentre, so both strings are identical, whereas a compound like (S)-alanine would display a prominent difference. However, the substitution implies additional changes in the same file, too (7af97e0). From recollection, a few other tests do not (yet) discern between mere connectivity and 3D/CIP SMILES. In other places it may be better to ask for plain SMILES so as not to impose a constraint too early; querying for an absolute SMILES string when the database only holds the field for a connectivity SMILES would yield nothing.
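To illustrate why benzene cannot tell the two SMILES flavours apart, here is a minimal offline sketch. The SMILES strings are hard-coded for illustration and not fetched from PubChem, and the helper names `has_stereo_markers` and `discerns_smiles_flavours` are hypothetical, not part of PubChemPy.

```python
# Characters that (among others) only appear in a stereo-aware SMILES:
# "@" marks tetrahedral chirality, "/" and "\" mark double-bond geometry.
STEREO_MARKERS = ("@", "/", "\\")

# Hard-coded illustrative strings (assumption: not queried from PubChem).
REFERENCE = {
    "benzene": {
        "connectivity": "C1=CC=CC=C1",
        "absolute": "C1=CC=CC=C1",
    },
    "(S)-alanine": {
        "connectivity": "CC(C(=O)O)N",
        "absolute": "C[C@@H](C(=O)O)N",
    },
}


def has_stereo_markers(smiles: str) -> bool:
    """Return True if the string contains any stereo marker."""
    return any(marker in smiles for marker in STEREO_MARKERS)


def discerns_smiles_flavours(record: dict) -> bool:
    """Return True if connectivity and absolute SMILES differ."""
    return record["connectivity"] != record["absolute"]


if __name__ == "__main__":
    for name, record in REFERENCE.items():
        print(
            name,
            discerns_smiles_flavours(record),
            has_stereo_markers(record["absolute"]),
        )
```

With benzene, a test that accidentally swaps the two fields would still pass; with (S)-alanine it would not.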

So far, the objective in my fork has been to use the new (July 2025) SMILES keywords while retaining some backward compatibility when pubchempy is used as a module in legacy code, and still be able to generate a .whl which can be used with Python 3.10...3.12 on Linux/Windows/Mac. While doing so I saw that reference c2 in test_compound.py is the acetate ion (CID175), though test_single_atom() reads as if it was designed with chloride (CID312) in mind. Since this is not on my list now (and PubChem is scheduled to close over the weekend), I didn't edit this one.
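The backward compatibility mentioned above could look roughly like the following sketch. It is not the code in the fork; the mapping simply follows the keyword names as used in this issue, and `LEGACY_TO_CURRENT` and `current_keyword` are hypothetical names. The exact property names PubChem accepts should be checked against its documentation.

```python
# Assumption: keyword names as written in this issue, not verified
# against the PubChem API.
LEGACY_TO_CURRENT = {
    "CanonicalSMILES": "ConnectivitySMILES",
    "IsomericSMILES": "AbsoluteSMILES",
}


def current_keyword(name: str) -> str:
    """Translate a legacy keyword; pass any other keyword through unchanged."""
    return LEGACY_TO_CURRENT.get(name, name)


if __name__ == "__main__":
    for keyword in ("CanonicalSMILES", "IsomericSMILES", "MolecularFormula"):
        print(keyword, "->", current_keyword(keyword))
```

Legacy code can keep passing the old keywords while new code uses the current ones directly.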

nbehrnd, Jul 25 '25 20:07