
Update the schemata to v3.1 of the specification

Open s-heppner opened this issue 1 year ago • 3 comments

These schemata are not tested yet and are meant for reference only. They are not the official final version.

s-heppner avatar Feb 21 '24 13:02 s-heppner

I'll try to have a look at this tonight.

mristin avatar Feb 27 '24 10:02 mristin

@s-heppner I think I reproduced the issue in a small schema:

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="something">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="[ -\ud7ff\uf900-\ufdcf\ufdf0-\uffef\U00010000-\U0001fffd])"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>
</xs:schema>

This schema does not validate with https://www.liquid-technologies.com/online-xsd-validator.

I'm looking into how to represent character code points above the Basic Multilingual Plane (BMP, https://en.wikipedia.org/wiki/Plane_(Unicode)) in the pattern.
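To illustrate what "above the Basic Multilingual Plane" means for a UTF-16 based engine, here is a small Python sketch. The helper name `utf16_code_units` is made up for this example and is not part of any project mentioned in this thread. It shows that a code point inside the BMP fits into one 16-bit code unit, while a code point such as U+10000 needs two (a surrogate pair).

```python
from typing import List


def utf16_code_units(character: str) -> List[int]:
    """Return the UTF-16 code units that encode a single character."""
    # Big-endian UTF-16 without a byte order mark: two bytes per code unit.
    encoded = character.encode("utf-16-be")
    return [
        int.from_bytes(encoded[i:i + 2], "big")
        for i in range(0, len(encoded), 2)
    ]


# A code point inside the BMP is a single code unit.
print([hex(unit) for unit in utf16_code_units("\uffef")])

# A code point above the BMP is split into a high and a low surrogate.
print([hex(unit) for unit in utf16_code_units("\U00010000")])
```

A range such as `\U00010000-\U0001fffd` therefore has no single-character equivalent in an engine that sees only 16-bit code units.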

mristin avatar Feb 28 '24 17:02 mristin

After some searching, it seems that this depends heavily on the XSD validator used. Our continuous integration currently uses a C#-based validator, so the patterns are forwarded directly to the C# regex engine.

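As C# strings are UTF-16 and its regex engine works on UTF-16 code units, code points above the BMP cannot be represented as single characters in a pattern. The solution would be to expand the pattern so that only UTF-16 ranges are used, i.e., to spell out ranges above the BMP as surrogate pairs. This is closely related to #362. Whatever the solution in #362, the same patch needs to be applied here as well.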
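A rough sketch of such an expansion in Python, to make the idea concrete. This is not the actual patch for #362 or for aas-core-codegen; the function names `to_surrogates` and `expand_range` are hypothetical, and the `\uXXXX` escape notation in the output simply mirrors the notation used in the snippet above. The sketch rewrites one range of code points above the BMP as an alternation of surrogate-pair ranges.

```python
from typing import List, Optional, Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point above the BMP into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError(f"Not a code point above the BMP: {code_point:#x}")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def _units(first: int, last: int) -> str:
    """Render a single code unit or a character class over code units."""
    if first == last:
        return f"\\u{first:04X}"
    return f"[\\u{first:04X}-\\u{last:04X}]"


def expand_range(start: int, end: int) -> str:
    """Express the range start..end (both above the BMP) as surrogate pairs."""
    if start > end:
        raise ValueError("start must not be greater than end")
    high_start, low_start = to_surrogates(start)
    high_end, low_end = to_surrogates(end)

    # Both ends share the high surrogate: one pair of classes is enough.
    if high_start == high_end:
        return _units(high_start, high_start) + _units(low_start, low_end)

    parts: List[str] = []
    # Partial block at the beginning of the range.
    if low_start != 0xDC00:
        parts.append(_units(high_start, high_start) + _units(low_start, 0xDFFF))
        high_start += 1
    # Partial block at the end of the range; appended last to keep the order.
    tail: Optional[str] = None
    if low_end != 0xDFFF:
        tail = _units(high_end, high_end) + _units(0xDC00, low_end)
        high_end -= 1
    # Full blocks in between cover every low surrogate.
    if high_start <= high_end:
        parts.append(_units(high_start, high_end) + _units(0xDC00, 0xDFFF))
    if tail is not None:
        parts.append(tail)
    return "|".join(parts)


print(expand_range(0x10000, 0x1FFFD))
# prints [\uD800-\uD83E][\uDC00-\uDFFF]|\uD83F[\uDC00-\uDFFD]
```

Note that the result is an alternation of two-unit sequences, so it can no longer live inside the original character class. The BMP part of the class and the expanded part would have to be combined as alternatives in a group, and whether a given validator accepts the resulting pattern still needs to be checked.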

We haven't noticed this problem so far because no code points above the BMP had appeared in the XSD.

Just for future reference: a possible solution is to patch aas-core-codegen to fix patterns in XSD so that they only operate on UTF-16 characters and ranges.

mristin avatar Feb 28 '24 18:02 mristin