dbptk-developer
dbptk-developer copied to clipboard
SIARD validation error on requirement T_6.3-1
Description:
SIARD Version is 2.1, file was produced through SiardGui 2.1.134. Database was MS SQL Server 12.
A SIARD validation with dbptk results in an unexpected error on requirement T_6.3-1 "Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. This restriction is enforced in the definitions of dateType and dateTimeType."
The validation report shows more details:
T_6.3 - Date and timestamp data cells
T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table3/table3.xsd restriction not enforced
T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table6/table6.xsd restriction not enforced
T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table9/table9.xsd restriction not enforced
Date and timestamp data cells [FAILED]
The schemas mentioned define the types as below:
<!-- date type between 0001 and 9999 -->
<xs:simpleType name="dateType">
<xs:restriction base="xs:date">
<xs:minInclusive value="0001-01-01"/>
<xs:maxExclusive value="10000-01-01"/>
<xs:pattern value="\d{4}-\d{2}-\d{2}Z?"/>
</xs:restriction>
</xs:simpleType>
<!-- time type restricted to UTC -->
<xs:simpleType name="timeType">
<xs:restriction base="xs:time">
<xs:pattern value="\d{2}:\d{2}:\d{2}(\.\d+)?Z?"/>
</xs:restriction>
</xs:simpleType>
<!-- dateTime type between 0001 and 9999 restricted to UTC -->
<xs:simpleType name="dateTimeType">
<xs:restriction base="xs:dateTime">
<xs:minInclusive value="0001-01-01T00:00:00.000000000Z"/>
<xs:maxExclusive value="10000-01-01T00:00:00.000000000Z"/>
<xs:pattern value="\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?"/>
</xs:restriction>
</xs:simpleType>
I have questions on that result:
- The restriction seems correct to me. What makes validation fail?
- Why does validation not fail on all the other table schemas? There are 20 tables, all with the same definition. Is there an explanation why it only fails on table3, table6, and table9?
Thanks for helping me out.
It seems to me that the validator is expecting the dateType min and max values to have the GMT timezone defined.
<xs:minInclusive value="0001-01-01Z"/>
<xs:maxExclusive value="10000-01-01Z"/>
https://github.com/keeps/dbptk-developer/blob/fae47ccc31c6a9392910c3769d65e0c1c274d79e/dbptk-modules/dbptk-module-siard/src/main/java/com/databasepreservation/modules/siard/validate/component/tableData/DateAndTimestampDataValidator.java#L38-L39
@luis100 thanks for your quick reply. This seems to explain my first question. Any idea why it fails only on a selection of tables?
This would probably have to be corrected in the SIARD producing software, i.e. SIARD GUI, right?
My comment is only an initial assessment, for a proper analysis of the issue we would need the SIARD that has the issue to check why it fails only in a selection.
In terms of fixing, I am not sure if this is an issue of the SIARD producing software or of the vallidator that is being too strict. We need to go back to the specification to see if timezone should be considered in these cases, but it already seems odd to me that timezone is considered in the dateTimeType restrictions and not on the dateType, as it would affect proper comparison in the same way.
So, I'll leave the issue still open for further discussion. If you can provide the SIARD or a SIARD with mocked data that shows the same behaviour it would help to better assess the issue.
The SIARD is pretty sensitive, so I cannot hand that out. I'll see If there's a way to produce a mockup.