cppagent

XML Schema validation with lxml

Open · rwuthric opened this issue 1 year ago · 5 comments

We are trying to set up an XML file validation process using the official MTConnect schema. For this we use the Python library lxml, like so:

```python
from lxml import etree

# Load the XML file to validate
tree = etree.parse('mtc_file.xml')

# Load the MTConnect XML schema
xmlschema_doc = etree.parse('MTConnectDevices_2.3_1.0.xsd')
xmlschema = etree.XMLSchema(xmlschema_doc)

# Validate the MTConnect XML device model file against the schema
if not xmlschema.validate(tree):
    print('The XML device model file does not follow the MTConnect standard.')
    # Show the validator's messages (line numbers and reasons)
    print(xmlschema.error_log)
```

We tried the MTConnect schemas MTConnectDevices_2.2.xsd, MTConnectDevices_2.3_1.0.xsd, MTConnectDevices_2.3.xsd and MTConnectDevices_2.3_1.0.xsd, and all of them raised errors when we tried to load them with xmlschema = etree.XMLSchema(xmlschema_doc).

For example, for MTConnectDevices_2.2.xsd, lxml reports the following error in the schema:

```
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}any': The attribute 'notNamespace' is not allowed., line 7581
```

I was wondering whether this is an lxml issue or whether something in MTConnectDevices_2.2.xsd is indeed not 100% correct. Does anyone have experience with XML validation using tools other than lxml?
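The error points at a specific construct: the notNamespace attribute (like notQName) on xs:any and xs:anyAttribute was added in XSD 1.1, while libxml2, the library lxml wraps, implements XSD 1.0 only. A minimal sketch for locating such constructs in a schema file with the Python standard library is shown here; the helper name find_xsd11_constructs is hypothetical, and the list of 1.1-only elements and attributes is a partial one, not an exhaustive check:

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XS = "http://www.w3.org/2001/XMLSchema"

# Partial list of schema elements that exist only in XSD 1.1.
XSD11_ELEMENTS = {"assert", "assertion", "alternative", "openContent",
                  "defaultOpenContent", "override"}
# Wildcard attributes (on xs:any / xs:anyAttribute) added in XSD 1.1.
XSD11_WILDCARD_ATTRS = {"notNamespace", "notQName"}


class Xsd11Finder(ContentHandler):
    """Collects (line, description) pairs for XSD 1.1-only constructs."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def startElementNS(self, name, qname, attrs):
        ns, local = name
        if ns != XS:
            return  # not an XML Schema element
        # Line where this start tag begins (set by the SAX locator)
        line = self._locator.getLineNumber()
        if local in XSD11_ELEMENTS:
            self.hits.append((line, f"xs:{local} element"))
        if local in ("any", "anyAttribute"):
            for attr_ns, attr_local in attrs.getNames():
                # Unprefixed attributes have no namespace (None)
                if attr_ns is None and attr_local in XSD11_WILDCARD_ATTRS:
                    self.hits.append((line, f"{attr_local} on xs:{local}"))


def find_xsd11_constructs(source):
    """Scan an XSD given as a file path, a binary file object, or raw bytes."""
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    handler = Xsd11Finder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.hits
```

Run against the 2.2 schema, for example with find_xsd11_constructs('MTConnectDevices_2.2.xsd'), it should list the same line lxml complains about, plus any other occurrences, so the whole set of 1.1 features can be seen in one pass instead of one parse error at a time.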

rwuthric · Mar 15 '24 20:03

The XSD files without the 1.0 suffix use XML Schema 1.1 and rely on features introduced in 1.1. By new I mean 12 years old. Most XSD validators don’t support 1.1 yet. That’s why we also generate the 1.0 schemas. See W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures (w3.org).

wsobel · Mar 15 '24 21:03
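The reply suggests a simple rule for lxml users: load the schema file with the _1.0 suffix rather than the plain one. A minimal sketch of that rule, assuming the naming convention seen in this thread holds for every MTConnect schema file (X.xsd is the XSD 1.1 schema, X_1.0.xsd its XSD 1.0 counterpart); the helper name xsd10_variant is hypothetical, and it only computes a file name without checking that the file exists:

```python
from pathlib import Path


def xsd10_variant(schema_path):
    """Return the file name of the XSD 1.0 variant of an MTConnect schema.

    Assumes 'X.xsd' is the XSD 1.1 schema and 'X_1.0.xsd' the XSD 1.0 one.
    """
    path = Path(schema_path)
    if path.stem.endswith("_1.0"):
        return str(path)  # already the XSD 1.0 variant
    # e.g. MTConnectDevices_2.3.xsd -> MTConnectDevices_2.3_1.0.xsd
    return str(path.with_name(f"{path.stem}_1.0{path.suffix}"))
```

The returned name can then be passed to etree.parse and etree.XMLSchema exactly as in the original snippet, so only the XSD 1.0 schemas ever reach libxml2.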