cdk icon indicating copy to clipboard operation
cdk copied to clipboard

stereo information unavailable when generating SMILES from IAtomcontainer

Open biotech7 opened this issue 2 years ago • 10 comments

Based on this link https://egonw.github.io/cdkbook/stereo.html#sec:stereo:bond, a simple molecule was constructed. but stereo information unvalaible when converting molecule to smiles. code to reproduce.

IAtomContainer isomer = new AtomContainer();
        isomer.addAtom(new Atom("C"));
        isomer.addAtom(new Atom("Cl"));
        isomer.addAtom(new Atom("Br"));
        isomer.addAtom(new Atom("F"));
        isomer.addAtom(new Atom("I"));
        isomer.addBond(0, 1, IBond.Order.SINGLE);
        isomer.addBond(0, 2, IBond.Order.SINGLE);
        isomer.getBond(1).setStereo(IBond.Stereo.UP);
        isomer.addBond(0, 3, IBond.Order.SINGLE);
        isomer.getBond(2).setStereo(IBond.Stereo.UP);
        isomer.addBond(0, 4, IBond.Order.SINGLE);

        AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(isomer);
        CDKHydrogenAdder adder = CDKHydrogenAdder.getInstance(SilentChemObjectBuilder.getInstance());
        adder.addImplicitHydrogens(isomer);

        SmilesGenerator generator = new SmilesGenerator(SmiFlavor.UniversalSmiles);
        System.out.println(generator.create(isomer));

output: C(Br)(Cl)(F)I

Another trial was conducted as following procedure:stereo-included smiles was introduced and converted to IAtomcontainer, finally "reversed" to smiles string. stereo info showed up. code to reproduce:

        String smi = "F[C@](Cl)(Br)I";
        IChemObjectBuilder builder = new SilentChemObjectBuilder();
        SmilesParser parser = new SmilesParser(builder);
        isomer = parser.parseSmiles(smi);
        System.out.println(generator.create(isomer));

output: C@@(Cl)(F)I

why the stereo info omitted in the first example?

env. information: OS: windows 11 CDK: 2.9 IDE: IDEA 2023.2

biotech7 avatar Sep 10 '23 07:09 biotech7

@johnmay, I guess .setStereo(IBond.Stereo.UP); needs to be converted to another stereo model, which is used by the SmilesGenerator, right? Because SmiFlavor.UniversalSmiles does include stereochemistry, I just checked.

egonw avatar Sep 10 '23 07:09 egonw

Well there are no coordinates so up/down bonds are meaningless, I mean I'm not sure how you would expect it to guess what you meant by giving it two UP bonds?

IAtomContainer isomer = new AtomContainer();
isomer.addAtom(new Atom("C"));
isomer.addAtom(new Atom("Cl"));
isomer.addAtom(new Atom("Br"));
isomer.addAtom(new Atom("F"));
isomer.addAtom(new Atom("I"));
isomer.addBond(0, 1, IBond.Order.SINGLE);
isomer.addBond(0, 2, IBond.Order.SINGLE);
isomer.addBond(0, 3, IBond.Order.SINGLE);
isomer.addBond(0, 4, IBond.Order.SINGLE);
        
// if you have an implicit hydrogen you need to put the 'focus' in the list where you want it
IAtom chiralAtom = isomer.getAtom(0); // the 'focus'
isomer.addStereoElement(
  new TetrahedralChirality(
    chiralAtom,
    isomer.getConnectedAtomsList(chiralAtom).toArray(new IAtom[4]),
    ITetrahedralChirality.Stereo.CLOCKWISE));

Oh you should also use the builder otherwise you'll have a log of pain in future when APIs change, the new Atoms are OK but the builder.newAtomContainer() gives you back an IAtomContainer (which is actually an AtomContainer2). The constructors on the Atom/AtomContainer should be protected to avoid you usage but unfortunately the builder was added later. The AtomContainer2 is expected by a lot of code now and you will get some warnings telling you to use it.

IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
IAtomContainer isomer = builder.newAtomContainer();
isomer.addAtom(new Atom("C")); // builder.newAtom(IAtom.class, "C");
isomer.addAtom(new Atom("Cl"));
isomer.addAtom(new Atom("Br"));
isomer.addAtom(new Atom("F"));
isomer.addAtom(new Atom("I"));
isomer.addBond(0, 1, IBond.Order.SINGLE);
isomer.addBond(0, 2, IBond.Order.SINGLE);
isomer.addBond(0, 3, IBond.Order.SINGLE);
isomer.addBond(0, 4, IBond.Order.SINGLE);

// if you have an implicit hydrogen
IAtom chiralAtom = isomer.getAtom(0); // the 'focus'
isomer.addStereoElement(
  new TetrahedralChirality(
    chiralAtom,
    isomer.getConnectedAtomsList(chiralAtom).toArray(new IAtom[4]),
    ITetrahedralChirality.Stereo.CLOCKWISE));

johnmay avatar Sep 11 '23 11:09 johnmay

Oh on the main branch you can actually do this which is much cleaner:

IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
IAtomContainer isomer = builder.newAtomContainer();
isomer.newAtom(IAtom.C, 0);
isomer.newAtom(IAtom.Cl, 0);
isomer.newAtom(IAtom.Br, 0);
isomer.newAtom(IAtom.I, 0);
isomer.newAtom(IAtom.F, 0);
isomer.addBond(0, 1, IBond.Order.SINGLE);
isomer.addBond(0, 2, IBond.Order.SINGLE);
isomer.addBond(0, 3, IBond.Order.SINGLE);
isomer.addBond(0, 4, IBond.Order.SINGLE);

johnmay avatar Sep 11 '23 11:09 johnmay

Well there are no coordinates so up/down bonds are meaningless,

Oh, duh :/

egonw avatar Sep 11 '23 19:09 egonw

@egonw @johnmay But in most cases, many "focuses" have no saturated bonds during constructing molecules until final implicit hydrogens addition. so could it be possible to set stereochemistry properties of bond/atom in a molecule rigidly before constructing the entire molecule? This idea originated from a dilemma of parsing cdxml format strings, such as <b Z="102" B="13" E="2" Order="1" Display="WedgeBegin"/>, which contains few bond/atom info for setting up stereochemistry properties effectively. and occasionally resulting in NullPointerException: A carrier was undefined when creating new TetrahedralChirality().

biotech7 avatar Sep 12 '23 13:09 biotech7

NullPointerException: A carrier was undefined

Got a link to your code? I feel like with have https://en.wikipedia.org/wiki/XY_problem.

But in most cases, many "focuses" have no saturated bonds during constructing molecules until final implicit hydrogens addition.

From ChemDraw it's only the carbons atoms which don't have a hydrogen count defined. You can handle those much more efficiently than using the CDK's atom typer/hydrogen added. Generally it's preferred if the I/O does this since the valence model depends on the format. For example coming from MOLfile vs SMILES we use a different valence model.

Anyways that is some what redundant. Your error is because you MUST have 4 neighbours for tetrahedral stereochemistry, if there is an implicit hydrogen or lone pair you can set it with. You don't actually need the implicit hydrogen count to know an atom with a wedge only has 3 neighbours.

From CDXML if you have coordinates you can just treat it like a MOLfile (see MDLV2000Reader) and call the StereoElementFactory to assigned stereochemistry from 2D+wedges/3D coordinates. I actually think the more robust way of doing it though is ignoring the wedges an using the Geometry/BondOrdering attribute:

<n
 id="1055"
 p="1157.01 394.05"
 Z="1055"
 Geometry="Tetrahedral"
 AS="R"
 BondOrdering="1063 1064 0 1066"
/>

That's almost a one-to-one translation to the CDK data structure but with bonds rather than atoms. Note the 0 is where the implicit part goes.

johnmay avatar Sep 12 '23 13:09 johnmay

I might need to reconsider my strategy for effective stereo-related molecular building. But fewer choices were found based on JAVA development env..

biotech7 avatar Sep 13 '23 05:09 biotech7

I might need to reconsider my strategy for effective stereo-related molecular building. But fewer choices were found based on JAVA development env..

?

You do may me think maybe I should ad CDX/CDXML support in CDK. I've done a fair amount of work (in Java) on it for work but more the aspect of cleaning stuff up before getting the chemistry out. Extracting the chemical graph to SMILES etc is quite simple.

https://nextmovesoftware.com/posters/May_SketchySketches_Sheffield_201607.pdf https://nextmovesoftware.com/posters/Mayfield_SketchySketchesII_Sheffield_202306.pdf

johnmay avatar Sep 13 '23 06:09 johnmay

The greater the ability, the heavier the task. thanks for your best-practice documents, enriching me much more!

biotech7 avatar Sep 14 '23 02:09 biotech7

OK to close?

johnmay avatar Mar 09 '24 08:03 johnmay