
Group definition with order "choice" is parsed falsely as complexType "sequence"

Open ps-luxoft opened this issue 1 year ago • 4 comments

Hi Guys,

Thank you for maintaining such a good library. While using it extensively for xsd parsing, I found a bug.

I'm working with the ASAM OpenSCENARIO (see ASAM) schema v1.2, which is publicly available (OpenSCENARIO_1.2.zip).

Consider the following complexType, found in the xsd:

<xsd:complexType name="ParameterValueDistribution">
    <xsd:sequence>
	  <xsd:element name="ScenarioFile" type="File"/>
	  <xsd:group ref="DistributionDefinition"/>
    </xsd:sequence>
</xsd:complexType>

and the group definition:

<xsd:group name="DistributionDefinition">
    <xsd:choice>
	<xsd:element name="Deterministic" type="Deterministic"/>
	<xsd:element name="Stochastic" type="Stochastic"/>
    </xsd:choice>
</xsd:group>

If I parse the XSD with xmlschema.XMLSchema(schema_path), the group is wrongly converted to a complexType with the order indicator sequence:

[Screenshots: SchemaElementNode; xsd_type]

This is obviously wrong, because only one child element is allowed: either Deterministic or Stochastic, but not both!

ps-luxoft, Jan 15 '24 12:01

Hi, I didn't find the cited global type and group in the linked OpenScenario.xsd, so I've created a sample schema and prepared a test for it:

    def test_model_group_composition_in_a_sequence__issue_384(self):
        schema = XMLSchema(dedent("""\
            <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
                <xs:element name="root" type="type1"/>
                <xs:complexType name="type1">
                    <xs:sequence>
                      <xs:element name="elem1" type="xs:string"/>
                      <xs:group ref="group1"/>
                    </xs:sequence>
                </xs:complexType>
                <xs:group name="group1">
                    <xs:choice>
                      <xs:element name="elem2" type="xs:string"/>
                      <xs:element name="elem3" type="xs:string"/>
                    </xs:choice>
                </xs:group>
            </xs:schema>"""))

        xsd_type = schema.types['type1']
        self.assertIsInstance(xsd_type.content, XsdGroup)
        self.assertEqual(xsd_type.content.model, 'sequence')
        self.assertEqual(len(xsd_type.content), 2)
        self.assertEqual(xsd_type.content[0].name, 'elem1')
        self.assertIsInstance(xsd_type.content[0], XsdElement)
        self.assertIsInstance(xsd_type.content[1], XsdGroup)
        self.assertEqual(xsd_type.content[1].model, 'choice')

        xsd_group = schema.groups['group1']
        self.assertEqual(xsd_group.model, 'choice')
        self.assertIs(xsd_type.content[1].ref, xsd_group)
        self.assertEqual(len(xsd_group), 2)
        self.assertEqual(xsd_group[0].name, 'elem2')
        self.assertIsInstance(xsd_group[0], XsdElement)
        self.assertEqual(xsd_group[1].name, 'elem3')
        self.assertIsInstance(xsd_group[1], XsdElement)

        self.assertTrue(schema.is_valid('<root><elem1>a</elem1><elem2>b</elem2></root>'))
        self.assertTrue(schema.is_valid('<root><elem1>a</elem1><elem3>c</elem3></root>'))

        self.assertFalse(schema.is_valid('<root><elem1>a</elem1></root>'))
        self.assertFalse(schema.is_valid('<root><elem2>b</elem2></root>'))
        self.assertFalse(schema.is_valid('<root><elem3>c</elem3></root>'))

        self.assertFalse(schema.is_valid(
            '<root><elem1>a</elem1><elem2>b</elem2><elem3>c</elem3></root>'
        ))
        self.assertFalse(schema.is_valid(
            '<root><elem1>a</elem1><elem3>c</elem3><elem2>b</elem2></root>'
        ))

According to the test results, the composition of the model groups seems to be correct. The model of the content of the "type1" complexType is 'sequence' and the model of group 'group1' is 'choice'.

In the first image you posted a SchemaElementNode. This is the XPath point of view of the schema, not used for validation but only for XPath selection. So schema_node.children contains all the possible children, not necessarily a valid sequence of children.
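
The difference between the two views can be sketched with the standard library alone. This is not how xmlschema builds its trees; it is a minimal illustration that reads the schema fragment from the report with xml.etree.ElementTree, assuming unprefixed group references and named global groups. The helper names (model_of, first_model, flatten) are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
MODELS = {f"{XS}sequence", f"{XS}choice", f"{XS}all"}

XSD = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="ParameterValueDistribution">
    <xsd:sequence>
      <xsd:element name="ScenarioFile" type="File"/>
      <xsd:group ref="DistributionDefinition"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:group name="DistributionDefinition">
    <xsd:choice>
      <xsd:element name="Deterministic" type="Deterministic"/>
      <xsd:element name="Stochastic" type="Stochastic"/>
    </xsd:choice>
  </xsd:group>
</xsd:schema>
"""

root = ET.fromstring(XSD)
groups = {g.get("name"): g for g in root.findall(f"{XS}group")}


def first_model(node):
    """Return the first sequence/choice/all child of a type or group."""
    return next(child for child in node if child.tag in MODELS)


def model_of(node):
    """Return (order indicator, particles) for a sequence/choice/all node."""
    particles = []
    for child in node:
        if child.tag == f"{XS}element":
            particles.append(child.get("name"))
        elif child.tag == f"{XS}group":
            # Group reference: descend into the named global group.
            particles.append(model_of(first_model(groups[child.get("ref")])))
        elif child.tag in MODELS:
            particles.append(model_of(child))
    return node.tag[len(XS):], particles


def flatten(model):
    """All element names in a model, ignoring the order indicators."""
    names = []
    for particle in model[1]:
        names.extend(flatten(particle) if isinstance(particle, tuple) else [particle])
    return names


ctype = root.find(f"{XS}complexType[@name='ParameterValueDistribution']")
model = model_of(first_model(ctype))
print(model)           # nested view: the 'choice' is preserved
print(flatten(model))  # flat view: every possible child, order indicators lost
```

The nested result keeps 'choice' inside 'sequence', which is what validation needs; the flat list is comparable to a list of all possible children and says nothing about which combinations are valid.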

Thank you

brunato, Jan 16 '24 12:01

Hi @brunato,

Thank you for checking it and guiding me in the right direction. Indeed I had to switch to the content of the XsdElement because this is the only object where the proper order indicator is stored. So it's definitely not a bug.

I have to iterate over the whole schema recursively and select only one or all the children depending on the order indicator (sequence, choice, all). After spending 3 days, I was able to implement it using the content. However, there are two issues I would like to clarify.
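
The selection rule described here (all children for sequence, exactly one for choice) can be sketched as follows. This is not the implementation mentioned above, which was not posted. It is a minimal sketch over a hypothetical nested-tuple representation, (order indicator, list of particles), and it assumes every particle occurs exactly once: minOccurs/maxOccurs are ignored and 'all' is treated like 'sequence' in declared order.

```python
from itertools import product


def expansions(particle):
    """Yield every child-name list that a particle allows.

    Assumptions: each particle occurs exactly once, and 'all' is
    expanded in declared order only, like 'sequence'.
    """
    if isinstance(particle, str):      # a plain element name
        yield [particle]
        return
    order, particles = particle
    if order == "choice":              # exactly one alternative is taken
        for alternative in particles:
            yield from expansions(alternative)
    else:                              # sequence / all: every particle is taken
        options = [list(expansions(p)) for p in particles]
        for combo in product(*options):
            yield [name for part in combo for name in part]


# Content model of ParameterValueDistribution from the original report.
model = ("sequence", ["ScenarioFile",
                      ("choice", ["Deterministic", "Stochastic"])])

for names in expansions(model):
    print(names)
# ['ScenarioFile', 'Deterministic']
# ['ScenarioFile', 'Stochastic']
```

Occurrence bounds would multiply the alternatives (an optional particle adds the empty list, an unbounded one has no finite expansion), so a real traversal needs a policy for them.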

ps-luxoft, Jan 22 '24 12:01

Issue 1

I uploaded the OpenSCENARIO schema, so I can show you the specific cases.

import xmlschema

schema = xmlschema.XMLSchema(path_xsd)
rs = schema.findall(path="//ParameterDeclarations")

It gives me the following list:

/{http://www.w3.org/2001/XMLSchema}schema/OpenScenario/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Route/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Trajectory/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Vehicle/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Pedestrian/ParameterDeclarations
...

But not this one:

/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver/ParameterDeclarations/ParameterDeclaration

What am I missing here? I thought //ParameterDeclarations searches everywhere.

ps-luxoft, Jan 22 '24 12:01

Issue 2

Consider the XsdElement with the following path: /OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver

schema = xmlschema.XMLSchema("resources/schema/open-scenario/OpenSCENARIO_1.2.xsd")
rs = schema.findall("//Storyboard/Story/Act/ManeuverGroup")
man_group = rs[0]
print(f"xpath: {man_group.xpath_node.path}")
maneuver = man_group[-1]
print(f"xpath: {maneuver.xpath_node.path}")
for child in maneuver:
    print(f"xpath: {child.xpath_node.path}")

Output

xpath: /{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup
xpath: /{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver
xpath: /{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Maneuver/ParameterDeclarations
xpath: /{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Maneuver/Event

Why do the children of maneuver have a different xpath? I would have expected:

/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver/ParameterDeclarations
/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver/Event

ps-luxoft, Jan 22 '24 13:01

Hi @brunato,

Any news, or an explanation of why these two issues occur?

ps-luxoft, Mar 23 '24 20:03

About issue 1: each distinct element found by the path expression is a child in the content of a global xs:complexType definition (except in one case, where it belongs to a global group), so it has no element ancestors:

>>> for e in set(schema.findall(path="//ParameterDeclarations")):
...    print(e.findall('ancestor-or-self::*'))
... 
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]
[Xsd11Element(name='ParameterDeclarations', occurs=[0, 1])]

Examining the XSD ancestors:

>>> for e in set(schema.findall(path="//ParameterDeclarations")):
...    print(e.parent, e.parent.parent)
... 
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='Environment')
Xsd11Group(model='sequence', occurs=[1, 1]) Xsd11ComplexType(name='Maneuver')
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='Vehicle')
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='MiscObject')
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='Controller')
Xsd11Group(model='sequence', occurs=[1, 1]) Xsd11ComplexType(name='Story')
Xsd11Group(name='ScenarioDefinition', model='sequence', occurs=[1, 1]) None
Xsd11Group(model='all', occurs=[1, 1]) Xsd11ComplexType(name='Pedestrian')
Xsd11Group(model='sequence', occurs=[1, 1]) Xsd11ComplexType(name='Route')
Xsd11Group(model='sequence', occurs=[1, 1]) Xsd11ComplexType(name='Trajectory')

These ten are the only elements in the schema that have name='ParameterDeclarations'.
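
The same ownership information can be cross-checked against the XSD file itself, without xmlschema. The following is a minimal sketch using only xml.etree.ElementTree. The schema is a trimmed, hypothetical stand-in (three of the ten declarations, with xs:string as a placeholder type), and owners is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed stand-in for the real schema: same component names and order
# indicators as in the output above, placeholder element types.
XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Story">
    <xs:sequence>
      <xs:element name="ParameterDeclarations" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Vehicle">
    <xs:all>
      <xs:element name="ParameterDeclarations" type="xs:string" minOccurs="0"/>
    </xs:all>
  </xs:complexType>
  <xs:group name="ScenarioDefinition">
    <xs:sequence>
      <xs:element name="ParameterDeclarations" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
</xs:schema>
"""


def owners(schema_root, name):
    """List (global component kind, component name, order indicator)
    for every local element declaration with the given name."""
    found = []
    for component in schema_root:  # global components only
        # ElementTree has no parent pointers, so build a child -> parent map.
        parent_of = {c: p for p in component.iter() for c in p}
        for decl in component.iter(f"{XS}element"):
            if decl is not component and decl.get("name") == name:
                found.append((component.tag[len(XS):],
                              component.get("name"),
                              parent_of[decl].tag[len(XS):]))
    return found


root = ET.fromstring(XSD)
result = owners(root, "ParameterDeclarations")
for row in result:
    print(row)
```

Each declaration is listed once, under the global type or group that contains it, regardless of how many places in an instance document that type can appear.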

The paths that you report are the paths of the XPath node tree:

>>> for e in set(schema.findall(path="//ParameterDeclarations")):
...    print(e.xpath_node.path)
... 
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Environment/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Maneuver/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Vehicle/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/MiscObject/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Controller/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Storyboard/Story/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenScenario/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Pedestrian/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Route/ParameterDeclarations
/{http://www.w3.org/2001/XMLSchema}schema/OpenSCENARIO/Catalog/Trajectory/ParameterDeclarations

However, the paths in the node tree are not necessarily the same as those of the XSD elements in the schema; the node tree is essentially an extension of the schema structure (particularly in this case, where the elements are all declared in global complex types or groups).

A schema is a complex graph with references, and an expansion with the root descendant operator '//' might not follow the path that you expect in the XML document. Also, the '//' operator cannot return the same node twice; repetitions are discarded.

Issue 2 is a variant of issue 1; the same explanation accounts for the unexpected results.

brunato, Mar 24 '24 18:03

Hi @brunato,

Thank you for the explanation. I understand that the XSD tree in the schema is different from the XPath node tree. I ended up implementing my own function that finds all possible XPaths where a given XsdElement name shows up.

For example, for Condition it returns (namespace URLs removed):

/OpenSCENARIO/Storyboard/Story/Act/StartTrigger/ConditionGroup/Condition
/OpenSCENARIO/Storyboard/Story/Act/ManeuverGroup/Maneuver/Event/StartTrigger/ConditionGroup/Condition
/OpenSCENARIO/Storyboard/Story/Act/StopTrigger/ConditionGroup/Condition
/OpenSCENARIO/Storyboard/StopTrigger/ConditionGroup/Condition
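
The function itself was not posted. One way such an enumeration could work is sketched below with the standard library only: start from each global element, follow named type references, and record every path that ends at the target name. The schema and the names Root, Catalog, Story, Maneuver and Event are hypothetical stand-ins. The sketch assumes elements are declared with name and type attributes, that type names are unprefixed, and it ignores element refs, anonymous types and type derivation. Recursive types are cut off with a guard.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Root" type="Root"/>
  <xs:complexType name="Root">
    <xs:sequence>
      <xs:element name="Catalog" type="Catalog"/>
      <xs:element name="Story" type="Story"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Catalog">
    <xs:choice>
      <xs:element name="Maneuver" type="Maneuver"/>
    </xs:choice>
  </xs:complexType>
  <xs:complexType name="Story">
    <xs:sequence>
      <xs:element name="Maneuver" type="Maneuver"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Maneuver">
    <xs:sequence>
      <xs:element name="Event" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def instance_paths(schema_root, target):
    """Return every instance-document path that ends at an element named target."""
    types = {t.get("name"): t for t in schema_root.findall(f"{XS}complexType")}
    groups = {g.get("name"): g for g in schema_root.findall(f"{XS}group")}
    paths = []

    def local_elements(node):
        """Yield named element declarations below node, following group refs."""
        for child in node:
            if child.tag == f"{XS}element":
                if child.get("name"):
                    yield child
            elif child.tag == f"{XS}group" and child.get("ref") in groups:
                yield from local_elements(groups[child.get("ref")])
            else:
                yield from local_elements(child)

    def walk(decl, path, active_types):
        path = f"{path}/{decl.get('name')}"
        if decl.get("name") == target:
            paths.append(path)
        type_name = decl.get("type")
        # Skip a type that is already being expanded on this path (recursion guard).
        if type_name in types and type_name not in active_types:
            for child in local_elements(types[type_name]):
                walk(child, path, active_types | {type_name})

    for decl in schema_root.findall(f"{XS}element"):
        walk(decl, "", frozenset())
    return paths


root = ET.fromstring(XSD)
for p in instance_paths(root, "Event"):
    print(p)
# /Root/Catalog/Maneuver/Event
# /Root/Story/Maneuver/Event
```

Event is declared once, inside the Maneuver type, yet it is reachable through two instance paths. That matches the situation in issue 2, where a single declaration can only be reported under one path in the XPath node tree.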

ps-luxoft, Apr 08 '24 11:04