xmlschema
Group definition with order "choice" is wrongly parsed as complexType "sequence"
Hi Guys,
Thank you for maintaining such a good library. While using it extensively for xsd parsing, I found a bug.
I'm working with the ASAM OpenSCENARIO (see ASAM) schema v1.2, which is publicly available (OpenSCENARIO_1.2.zip).
Consider the following complexType, found in the xsd:
<xsd:complexType name="ParameterValueDistribution">
    <xsd:sequence>
        <xsd:element name="ScenarioFile" type="File"/>
        <xsd:group ref="DistributionDefinition"/>
    </xsd:sequence>
</xsd:complexType>
and the group definition:
<xsd:group name="DistributionDefinition">
    <xsd:choice>
        <xsd:element name="Deterministic" type="Deterministic"/>
        <xsd:element name="Stochastic" type="Stochastic"/>
    </xsd:choice>
</xsd:group>
If I parse the xsd with xmlschema.XMLSchema(schema_path), the group is wrongly converted to a complexType with order indicator sequence:
[image of the resulting SchemaElementNode]
This is obviously wrong, because only one child element is allowed: either Deterministic or Stochastic, but not both!
Hi, I didn't find the cited global type and group in the linked OpenScenario.xsd, so I've created a sample schema and prepared a test for it:
def test_model_group_composition_in_a_sequence__issue_384(self):
    schema = XMLSchema(dedent("""\
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <xs:element name="root" type="type1"/>
            <xs:complexType name="type1">
                <xs:sequence>
                    <xs:element name="elem1" type="xs:string"/>
                    <xs:group ref="group1"/>
                </xs:sequence>
            </xs:complexType>
            <xs:group name="group1">
                <xs:choice>
                    <xs:element name="elem2" type="xs:string"/>
                    <xs:element name="elem3" type="xs:string"/>
                </xs:choice>
            </xs:group>
        </xs:schema>"""))

    # The content of type1 is a sequence: elem1 followed by a group reference
    xsd_type = schema.types['type1']
    self.assertIsInstance(xsd_type.content, XsdGroup)
    self.assertEqual(xsd_type.content.model, 'sequence')
    self.assertEqual(len(xsd_type.content), 2)
    self.assertEqual(xsd_type.content[0].name, 'elem1')
    self.assertIsInstance(xsd_type.content[0], XsdElement)
    self.assertIsInstance(xsd_type.content[1], XsdGroup)
    self.assertEqual(xsd_type.content[1].model, 'choice')

    # The referenced global group keeps its own 'choice' model
    xsd_group = schema.groups['group1']
    self.assertEqual(xsd_group.model, 'choice')
    self.assertIs(xsd_type.content[1].ref, xsd_group)
    self.assertEqual(len(xsd_group), 2)
    self.assertEqual(xsd_group[0].name, 'elem2')
    self.assertIsInstance(xsd_group[0], XsdElement)
    self.assertEqual(xsd_group[1].name, 'elem3')
    self.assertIsInstance(xsd_group[1], XsdElement)

    # Validation: elem1 followed by exactly one of elem2 or elem3
    self.assertTrue(schema.is_valid('<root><elem1>a</elem1><elem2>b</elem2></root>'))
    self.assertTrue(schema.is_valid('<root><elem1>a</elem1><elem3>c</elem3></root>'))
    self.assertFalse(schema.is_valid('<root><elem1>a</elem1></root>'))
    self.assertFalse(schema.is_valid('<root><elem2>b</elem2></root>'))
    self.assertFalse(schema.is_valid('<root><elem3>c</elem3></root>'))
    self.assertFalse(schema.is_valid(
        '<root><elem1>a</elem1><elem2>b</elem2><elem3>c</elem3></root>'
    ))
    self.assertFalse(schema.is_valid(
        '<root><elem1>a</elem1><elem3>c</elem3><elem2>b</elem2></root>'
    ))
According to the test results, the composition of the model groups seems to be correct. The model of the content of the "type1" complexType is 'sequence' and the model of group 'group1' is 'choice'.
In the first image you posted a SchemaElementNode. This is the XPath point of view of the schema, not used for validation but only for XPath selection. So schema_node.children contains all the possible children, not necessarily a valid sequence of children.
Thank you
Hi @brunato,
Thank you for checking it and guiding me in the right direction. Indeed I had to switch to the content of the XsdElement, because this is the only object where the proper order indicator is stored. So it's definitely not a bug.
I have to iterate over the whole schema recursively and select only one or all the children depending on the order indicator (sequence, choice, all). After spending 3 days, I was able to implement it using the content. However, there are two issues I would like to clarify.
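The selection rule described above can be sketched without xmlschema at all, working directly on the XSD source with the standard library. This is only an illustration of the idea, not the code used in this thread: the helper names (`compositor`, `pick_children`) are hypothetical, the schema is the sample from the test above, occurrence constraints (minOccurs/maxOccurs) are ignored, and unprefixed group references are assumed.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
COMPOSITORS = {XS + "sequence", XS + "choice", XS + "all"}
PARTICLES = COMPOSITORS | {XS + "element", XS + "group"}

XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="type1"/>
  <xs:complexType name="type1">
    <xs:sequence>
      <xs:element name="elem1" type="xs:string"/>
      <xs:group ref="group1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:group name="group1">
    <xs:choice>
      <xs:element name="elem2" type="xs:string"/>
      <xs:element name="elem3" type="xs:string"/>
    </xs:choice>
  </xs:group>
</xs:schema>"""


def compositor(parent):
    """Return the first sequence/choice/all child of a type or group, if any."""
    return next((c for c in parent if c.tag in COMPOSITORS), None)


def pick_children(node, groups):
    """Return the element names to emit for one particle.

    sequence/all: every child particle; choice: only the first alternative.
    """
    if node is None:
        return []
    if node.tag == XS + "element":
        return [node.get("name") or node.get("ref")]
    if node.tag == XS + "group":
        # Group reference: continue with the compositor of the global group.
        return pick_children(compositor(groups[node.get("ref")]), groups)
    particles = [c for c in node if c.tag in PARTICLES]
    if node.tag == XS + "choice":
        return pick_children(particles[0], groups) if particles else []
    return [name for p in particles for name in pick_children(p, groups)]


schema_root = ET.fromstring(XSD)
groups = {g.get("name"): g for g in schema_root.findall(XS + "group")}
types = {t.get("name"): t for t in schema_root.findall(XS + "complexType")}

children = pick_children(compositor(types["type1"]), groups)
print(children)
```

For the sample schema this selects elem1 plus the first alternative of the choice, which corresponds to one of the two documents the test above accepts as valid.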
Issue 1
I uploaded the OpenSCENARIO schema, so I can show you the specific cases.
import xmlschema
schema = xmlschema.XMLSchema(path_xsd)
rs = schema.findall(path="//ParameterDeclarations")
It gives me the following list:
/{http://www.w3.org/2001/XMLSchema}schema/OpenScenario/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Route/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Trajectory/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Vehicle/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Pedestrian/ParameterDeclarations
...
But not this one:
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver/ParameterDeclarations/ParameterDeclaration
What point am I missing here? I thought //ParameterDeclarations searches everywhere.
Issue 2
Consider the XsdElement with the following path: /OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver
schema = xmlschema.XMLSchema("resources/schema/open-scenario/OpenSCENARIO_1.2.xsd")
rs = schema.findall("//Storyboard/Story/Act/ManeuverGroup")
man_group = rs[0]
print(f"xpath: {man_group.xpath_node.path}")
maneuver = man_group[-1]
print(f"xpath: {maneuver.xpath_node.path}")
for child in maneuver:
    print(f"xpath: {child.xpath_node.path}")
Output
xpath: /{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup
xpath: /{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver
xpath: /{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Maneuver/ParameterDeclarations
xpath: /{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Maneuver/Event
Why do the children of maneuver have a different xpath? I would have expected:
/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver/ParameterDeclarations
/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver/Event
Hi @brunato,
Any news, or an explanation of why the two issues exist?
About issue 1: each distinct element found by the path expression is a child in the content of a global xs:complexType definition (except in one case, where it belongs to a global xs:group), so it has no ancestors:
>>> for e in set(schema.findall(path="//ParameterDeclarations")):
...     print(e.findall('ancestor-or-self::*'))
...
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
Examining the XSD ancestors:
>>> for e in set(schema.findall(path="//ParameterDeclarations")):
...     print(e.parent, e.parent.parent)
...
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='Environment')
Xsd11Group(model='sequence', occurs=[1, 1]) Xsd11ComplexType(name='Maneuver')
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='Vehicle')
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='MiscObject')
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='Controller')
Xsd11Group(model='sequence', occurs=[1, 1]) Xsd11ComplexType(name='Story')
Xsd11Group(name='ScenarioDefinition', model='sequence', occurs=[1, 1]) None
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='Pedestrian')
Xsd11Group(model='sequence', occurs=[1, 1]) Xsd11ComplexType(name='Route')
Xsd11Group(model='sequence', occurs=[1, 1]) Xsd11ComplexType(name='Trajectory')
These ten are the only elements in the schema that have name='ParameterDeclarations'.
The paths that you report are the paths of the XPath node tree:
>>> for e in set(schema.findall(path="//ParameterDeclarations")):
...     print(e.xpath_node.path)
...
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Environment/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Maneuver/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Vehicle/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/MiscObject/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Controller/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Storyboard/Story/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenScenario/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Pedestrian/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Route/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Trajectory/ParameterDeclarations
But the paths of the node tree are not necessarily the same as the paths of the XSD elements in the schema: the node tree is essentially an extension of the schema structure (particularly in this case, where the elements are all declared in global complex types or groups).
A schema is a complex graph with references, and an expansion with the root descendant operator '//' might not follow the same path that you expect in the XML document. Also, the '//' operator cannot return the same node twice: repetitions are discarded.
Issue 2 is a variant, with the same explanation for the unexpected results.
Hi @brunato,
Thank you for the explanation. I understand that the XSD tree in the schema is different from the XPath node tree. I ended up implementing my own function that finds all possible xpaths where a given XsdElement name shows up.
For example, for Condition it returns (namespace URLs removed):
/OpenSCENARIO/Storyboard/Story/Act/StartTrigger/ConditionGroup/Condition
/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver/Event/StartTrigger/ConditionGroup/Condition
/OpenSCENARIO/Storyboard/Story/Act/StopTrigger/ConditionGroup/Condition
/OpenSCENARIO/Storyboard/StopTrigger/ConditionGroup/Condition
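The function itself was not posted, so the following is only a sketch of one way such a path enumeration could look. It uses the standard library on the raw XSD rather than xmlschema, and everything in it is an assumption made for illustration: the `all_paths` helper, the mini schema (a made-up stand-in where one type is shared by two parents), unprefixed type and group names, and no handling of type extension or occurrence constraints. Recursion is guarded only for named global types.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical mini schema: the 'Maneuver' type is shared by two parents,
# so its single 'Event' declaration is reachable through two document paths.
XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Root" type="Root"/>
  <xs:complexType name="Root">
    <xs:sequence>
      <xs:element name="Catalog" type="Catalog"/>
      <xs:element name="Story" type="Story"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Catalog">
    <xs:sequence><xs:element name="Maneuver" type="Maneuver"/></xs:sequence>
  </xs:complexType>
  <xs:complexType name="Story">
    <xs:sequence><xs:element name="Maneuver" type="Maneuver"/></xs:sequence>
  </xs:complexType>
  <xs:complexType name="Maneuver">
    <xs:sequence><xs:element name="Event" type="xs:string"/></xs:sequence>
  </xs:complexType>
</xs:schema>"""


def all_paths(xsd_text, root_name, target):
    """List every document path from root_name down to elements named target."""
    schema = ET.fromstring(xsd_text)

    def index(tag):
        # Global components are the direct children of xs:schema.
        return {c.get("name"): c for c in schema.findall(XS + tag)}

    types, groups, elements = index("complexType"), index("group"), index("element")

    def child_elements(node):
        # Yield element declarations below node without crossing another element.
        for child in node:
            if child.tag == XS + "element":
                yield child
            elif child.tag == XS + "group" and child.get("ref"):
                yield from child_elements(groups[child.get("ref")])
            else:
                yield from child_elements(child)

    def walk(decl, path, seen):
        if decl.get("ref"):
            decl = elements[decl.get("ref")]
        here = f"{path}/{decl.get('name')}"
        if decl.get("name") == target:
            yield here
        type_name = decl.get("type")
        if type_name in types:
            if type_name in seen:  # stop on recursive named types
                return
            body, seen = types[type_name], seen | {type_name}
        else:
            body = decl  # inline type (or a simple type with no children)
        for child in child_elements(body):
            yield from walk(child, here, seen)

    return list(walk(elements[root_name], "", frozenset()))


paths = all_paths(XSD, "Root", "Event")
for p in paths:
    print(p)
```

The sketch expands shared type definitions once per parent, which is why a name declared a single time in the XSD can still produce several document paths.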