Cml generated does not match CML Schema

Open MikeWilliams-UK opened this issue 4 years ago • 5 comments

The CML Schema requires that the file has a <cml> root element. See Minimal Molecule example.

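using System.IO;   // FileStream, FileMode, FileAccess
using NCDK.IO;     // CMLWriter

// layedOutMol is the molecule to write; it is defined elsewhere.
using (var file = new FileStream("./data/output.xml", FileMode.Create, FileAccess.Write))
{
    using (var writer = new CMLWriter(file))
    {
        writer.Write(layedOutMol);
    }
}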

The resulting file ./data/output.xml is missing the cml root element.
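
One way to confirm which root element a file actually has is to parse it and read the root's name. The following is only a minimal sketch using System.Xml.Linq from the .NET standard library; `CmlRootCheck` and `RootElementName` are hypothetical names made up for this illustration and are not part of NCDK.

```csharp
using System;
using System.Xml.Linq;

public static class CmlRootCheck
{
    // Hypothetical helper (not part of NCDK): returns the local name of the
    // document's root element, ignoring any XML namespace.
    public static string RootElementName(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root.Name.LocalName;
    }

    public static void Main()
    {
        // Inline sample standing in for the written file's contents.
        string sample =
            "<molecule xmlns=\"http://www.xml-cml.org/schema\"><atomArray/></molecule>";
        Console.WriteLine(RootElementName(sample)); // prints "molecule"
    }
}
```

For a file on disk, `File.ReadAllText("./data/output.xml")` could be passed in place of the inline sample.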

MikeWilliams-UK, May 13 '20 13:05

CML does not require cml to be the root element. The Chemistry Development Kit (and NCDK) does not define the XML namespace on the molecule tag. In fact, PerkinElmer's ChemDraw 19.1 (the newest version) also writes molecule as the root element, as shown below.

CML generated by ChemDraw 19.1.

<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
<atomArray>
<atom elementType="C" id="a2" x2="5.34607" y2="-4.77841"/>
<atom elementType="C" id="a4" x2="5.34607" y2="-6.30141"/>
<atom elementType="C" id="a6" x2="6.66503" y2="-7.06291"/>
<atom elementType="C" id="a8" x2="7.98398" y2="-6.30141"/>
<atom elementType="C" id="a10" x2="7.98398" y2="-4.77841"/>
<atom elementType="C" id="a12" x2="6.66503" y2="-4.01691"/>
</atomArray>
<bondArray>
<bond atomRefs2="a2 a4" id="b14" order="2"/>
<bond atomRefs2="a4 a6" id="b15" order="1"/>
<bond atomRefs2="a6 a8" id="b16" order="2"/>
<bond atomRefs2="a8 a10" id="b17" order="1"/>
<bond atomRefs2="a10 a12" id="b18" order="2"/>
<bond atomRefs2="a12 a2" id="b19" order="1"/>
</bondArray>
</molecule>

k-ujihara, Jul 11 '20 14:07

@kazuyaujihara I have consulted Peter Murray-Rust, the originator of the CML standard, and he has confirmed that if the document is standalone, then cml should be the root element, like this:

<?xml version="1.0" encoding="UTF-8"?>
<cml>
    <molecule id="m1">
        <atomArray>
            <atom id="a1" elementType="H" />
        </atomArray>
    </molecule>
</cml>

A CML fragment can also be a child element of another document; therefore, the following is also valid.

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <cml>
    <molecule id="m1">
        <atomArray>
            <atom id="a1" elementType="H" />
        </atomArray>
    </molecule>
  </cml>
</root>

Therefore, neither NCDK nor ChemDraw is compliant with the standard.

Please fix NCDK.
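
As a stopgap on the caller's side, a molecule root could be wrapped in a cml element after writing. This is only a minimal sketch using System.Xml.Linq; `CmlWrapper` and `WrapInCml` are hypothetical names, not NCDK API. It also assumes the wrapper should reuse the namespace of the existing root, which is a guess: the example above shows cml without any namespace.

```csharp
using System;
using System.Xml.Linq;

public static class CmlWrapper
{
    // Hypothetical helper (not part of NCDK): if the root is not <cml>,
    // return a new document whose <cml> root contains a copy of the old root.
    public static XDocument WrapInCml(XDocument doc)
    {
        XElement root = doc.Root;
        if (root.Name.LocalName == "cml")
        {
            return doc; // already has the expected root
        }

        // Assumption: the wrapper reuses the namespace of the existing root.
        XNamespace ns = root.Name.Namespace;

        // Adding an element that already has a parent clones it, so the
        // input document is left unchanged.
        return new XDocument(doc.Declaration, new XElement(ns + "cml", root));
    }

    public static void Main()
    {
        XDocument input = XDocument.Parse(
            "<molecule xmlns=\"http://www.xml-cml.org/schema\" id=\"m1\"/>");
        XDocument wrapped = WrapInCml(input);
        Console.WriteLine(wrapped.Root.Name.LocalName); // prints "cml"
    }
}
```

Returning a new document rather than editing in place keeps the original output available for comparison.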

MikeWilliams-UK, Jul 11 '20 22:07

I will consider it. That said, there are several CML examples at http://www.xml-cml.org/examples/schema3/molecular/ whose root element is not cml, such as http://www.xml-cml.org/examples/schema3/molecular/minimal-molecule-3.html.

<?xml version="1.0" encoding="UTF-8"?>
<molecule xmlns="http://www.xml-cml.org/schema" xmlns:conventions="http://www.xml-cml.org/convention/"
          convention="conventions:molecular" id="m1">
</molecule>

From a quick glance, the schema at http://www.xml-cml.org/schema/schema3/ does not seem to prevent a molecule tag from being the root.

k-ujihara, Jul 12 '20 12:07

OpenBabel also uses a molecule tag as root.

k-ujihara, Jul 13 '20 03:07

I think it's a bit unfair to use those examples as the justification for not including the cml element as the parent of the molecule element, as only 6 of the 38 examples lack it. Hence it may be a mistake in the data which underpins the web site http://www.xml-cml.org/examples/schema3/molecular/

MikeWilliams-UK, Jul 13 '20 11:07