IfcXML list-of-values serialization for string derived simple types

Open aothms opened this issue 3 years ago • 1 comments

As you're probably aware, in IFC4 we introduced several shortcuts in the step-xml serialization by means of the tagless list-of-values configuration option.

ifc2x3

	<xs:complexType name="IfcCartesianPoint">
		<xs:complexContent>
			<xs:extension base="ifc:IfcPoint">
				<xs:sequence>
					<xs:element name="Coordinates">
						<xs:complexType>
							<xs:sequence>
								<xs:element ref="ifc:IfcLengthMeasure" maxOccurs="3">

ifc4

	<xs:complexType name="IfcCartesianPoint">
		<xs:complexContent>
			<xs:extension base="ifc:IfcPoint">
				<xs:attribute name="Coordinates" use="optional">
					<xs:simpleType>
						<xs:restriction>
							<xs:simpleType>
								<xs:list itemType="ifc:IfcLengthMeasure"/>

It is a small difference in the XSD but a big difference in the XML: the xs:list here is merely a space-delimited attribute value, whereas the xs:sequence gives you a full tag-decorated sequence of child nodes.
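
To make the difference concrete, here is a minimal Python sketch that reads the coordinates from both forms. The two instance fragments are hand-written illustrations, not taken from a real file, and namespaces and the usual id and cType attributes are left out for brevity. The helper names are hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical IFC2x3-style instance: one child element per list item.
IFC2X3_POINT = """
<IfcCartesianPoint>
  <Coordinates>
    <IfcLengthMeasure>1.0</IfcLengthMeasure>
    <IfcLengthMeasure>2.0</IfcLengthMeasure>
    <IfcLengthMeasure>3.0</IfcLengthMeasure>
  </Coordinates>
</IfcCartesianPoint>
"""

# Hypothetical IFC4-style instance: the whole list is a single attribute value.
IFC4_POINT = '<IfcCartesianPoint Coordinates="1.0 2.0 3.0"/>'


def coords_from_sequence(xml_text):
    """Read an xs:sequence encoding: iterate over the child elements."""
    root = ET.fromstring(xml_text.strip())
    return [float(e.text) for e in root.findall("./Coordinates/IfcLengthMeasure")]


def coords_from_list_attribute(xml_text):
    """Read an xs:list encoding: split the attribute value on whitespace."""
    root = ET.fromstring(xml_text.strip())
    return [float(token) for token in root.attrib["Coordinates"].split()]
```

Both functions return the same three floats. For numeric types the split on whitespace is harmless, which is what the Part 28 text quoted further down relies on.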

I'm somewhat surprised, though, to see this also applied to string-based attributes such as MiddleNames on IfcPerson.

  <xs:attribute name="MiddleNames" use="optional">
	  <xs:simpleType>
		  <xs:restriction>
			  <xs:simpleType>
				  <xs:list itemType="ifc:IfcLabel"/>
			  </xs:simpleType>
		  </xs:restriction>
	  </xs:simpleType>
  </xs:attribute>

The part 28 text is of course a bit vague on this:

By definition, all representations of the data types BOOLEAN, INTEGER, LOGICAL, NUMBER, and REAL do not contain whitespace, and the XML list form – tokens separated by whitespace – is an unambiguous representation.

NOTE 2 When the base-type of the EXPRESS aggregation data type is STRING or BINARY, or a defined data type whose fundamental type is STRING or BINARY, 8.2.2.2 generally applies, but this subclause applies when specified by Table 3.

So my summary: don't use this representation form on strings unless specified in Table 3. The note under Table 3 says:

NOTE 4 As specified in 10.2.8, tagless="true" shall not be specified unless the base-type is one of the following:

— a simple type as defined above, — a STRING data type whose values will not contain whitespace,

I don't really understand at what point it is sufficiently established that a STRING data type will not contain whitespace. Maybe whitespace in a middle name is unlikely, but IfcLabel itself does not prevent it, so we end up with an ambiguous serialization.
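
The ambiguity is easy to reproduce. The following is a minimal sketch, assuming a writer that joins list items with a single space and a reader that tokenizes on whitespace, which is how an xs:list value is interpreted. The function names and the sample names are made up for illustration.

```python
def serialize_list_of_values(items):
    # Tagless list-of-values: items are joined with a single space.
    return " ".join(items)


def parse_list_of_values(attr_value):
    # An xs:list value is tokenized on whitespace, so an item that
    # itself contains a space cannot be told apart from two items.
    return attr_value.split()


# Hypothetical data: two middle names, the first containing a space.
middle_names = ["Anne Marie", "Louise"]

attr_value = serialize_list_of_values(middle_names)
round_tripped = parse_list_of_values(attr_value)
```

After the round trip the list has three items instead of two, and nothing in the attribute value allows a reader to recover the original grouping.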

My proposal: never use list-of-values on any string-based type, not even IfcIdentifier.

aothms avatar Mar 26 '22 08:03 aothms

Related to this:

ENTITY IfcSurfaceTexture
    ...
    Parameter : OPTIONAL LIST [1:?] OF IfcIdentifier;
    ...
END_ENTITY;

ENTITY IfcClassification
    ...
    ReferenceTokens : OPTIONAL LIST [1:?] OF IfcIdentifier;
    ...
END_ENTITY;

<xs:attribute name="Parameter" use="optional">
	<xs:simpleType>
		<xs:restriction>
			<xs:simpleType>
				<xs:list itemType="ifc:IfcIdentifier"/>
			</xs:simpleType>
		</xs:restriction>
	</xs:simpleType>
</xs:attribute>

<xs:element name="ReferenceTokens" nillable="true" minOccurs="0">
	<xs:complexType>
		<xs:sequence>
			<xs:element ref="ifc:IfcIdentifier-wrapper" maxOccurs="unbounded"/>
		</xs:sequence>
		<xs:attribute ref="ifc:itemType" fixed="ifc:IfcIdentifier-wrapper"/>
		<xs:attribute ref="ifc:cType" fixed="list"/>
		<xs:attribute ref="ifc:arraySize" use="optional"/>
	</xs:complexType>
</xs:element>

Here are two Express attributes with an identical type that end up with different encodings in the IFC4 XSD. Perhaps this is allowed; I didn't find the configuration settings used to derive the XSD from Express. But I consider it highly undesirable, because it creates a strict dependency on the XSD (or the configuration) for P28 authoring.
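
A rough sketch of what that dependency means for a writer: the Express type alone is not enough, so the encoding has to be looked up per attribute. The ENCODING table and the function below are hypothetical; a real implementation would have to derive the table from the XSD or the P28 configuration, and the itemType and cType attributes are left out to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table. Both attributes are LIST [1:?] OF IfcIdentifier
# in Express, yet the IFC4 XSD encodes them differently.
ENCODING = {
    ("IfcSurfaceTexture", "Parameter"): "list-of-values",
    ("IfcClassification", "ReferenceTokens"): "wrapped-elements",
}


def write_identifier_list(entity, attribute, values):
    """Serialize a list of identifiers the way the table dictates."""
    node = ET.Element(entity)
    if ENCODING[(entity, attribute)] == "list-of-values":
        # Tagless form: one space-delimited XML attribute.
        node.set(attribute, " ".join(values))
    else:
        # Tagged form: one wrapper element per item.
        container = ET.SubElement(node, attribute)
        for value in values:
            ET.SubElement(container, "IfcIdentifier-wrapper").text = value
    return ET.tostring(node, encoding="unicode")


texture_xml = write_identifier_list("IfcSurfaceTexture", "Parameter", ["a", "b"])
classification_xml = write_identifier_list("IfcClassification", "ReferenceTokens", ["a", "b"])
```

The same input list produces an XML attribute in one case and child elements in the other, and the writer cannot know which without the table.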

This issue is solved if we follow my proposal to never use list-of-values on any string-based type.

aothms avatar Mar 26 '22 13:03 aothms