dbptk-developer icon indicating copy to clipboard operation
dbptk-developer copied to clipboard

SIARD validation error on requirement T_6.3-1

Open oschihin opened this issue 3 years ago • 4 comments

Description:

SIARD Version is 2.1, file was produced through SiardGui 2.1.134. Database was MS SQL Server 12.

A SIARD validation with dbptk results in an unexpected error on requirement T_6.3-1 "Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. This restriction is enforced in the definitions of dateType and dateTimeType."

The validation report shows more details:

T_6.3 - Date and timestamp data cells
	T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table3/table3.xsd restriction not enforced
	T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table6/table6.xsd restriction not enforced
	T_6.3-1: [ERROR] - Dates and timestamps must be restricted to the years 0001-9999 according to the SQL:2008 specification. - Error on content/schema0/table9/table9.xsd restriction not enforced
Date and timestamp data cells [FAILED]

The schemas mentioned define the types as below:

  <!-- date type between 0001 and 9999 -->
  <xs:simpleType name="dateType">
    <xs:restriction base="xs:date">
      <xs:minInclusive value="0001-01-01"/>
      <xs:maxExclusive value="10000-01-01"/>
      <xs:pattern value="\d{4}-\d{2}-\d{2}Z?"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- time type restricted to UTC -->
  <xs:simpleType name="timeType">
    <xs:restriction base="xs:time">
      <xs:pattern value="\d{2}:\d{2}:\d{2}(\.\d+)?Z?"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- dateTime type between 0001 and 9999 restricted to UTC -->
  <xs:simpleType name="dateTimeType">
    <xs:restriction base="xs:dateTime">
      <xs:minInclusive value="0001-01-01T00:00:00.000000000Z"/>
      <xs:maxExclusive value="10000-01-01T00:00:00.000000000Z"/>
      <xs:pattern value="\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?"/>
    </xs:restriction>
  </xs:simpleType>

I have questions on that result:

  • The restriction seems correct to me. What makes validation fail?
  • Why does validation not fail on all the other table schemas? There are 20 tables, all with the same definition. Is there an explanation why it only fails on table3, table6, and table9?

Thanks for helping me out.

oschihin avatar Feb 23 '22 09:02 oschihin

It seems to me that the validator is expecting the dateType min and max values to have the GMT timezone defined.

      <xs:minInclusive value="0001-01-01Z"/>
      <xs:maxExclusive value="10000-01-01Z"/>

https://github.com/keeps/dbptk-developer/blob/fae47ccc31c6a9392910c3769d65e0c1c274d79e/dbptk-modules/dbptk-module-siard/src/main/java/com/databasepreservation/modules/siard/validate/component/tableData/DateAndTimestampDataValidator.java#L38-L39

luis100 avatar Feb 23 '22 09:02 luis100

@luis100 thanks for your quick reply. This seems to explain my first question. Any idea why it fails only on a selection of tables?

This would probably have to be corrected in the SIARD producing software, i.e. SIARD GUI, right?

oschihin avatar Feb 23 '22 09:02 oschihin

My comment is only an initial assessment, for a proper analysis of the issue we would need the SIARD that has the issue to check why it fails only in a selection.

In terms of fixing, I am not sure if this is an issue of the SIARD producing software or of the vallidator that is being too strict. We need to go back to the specification to see if timezone should be considered in these cases, but it already seems odd to me that timezone is considered in the dateTimeType restrictions and not on the dateType, as it would affect proper comparison in the same way.

So, I'll leave the issue still open for further discussion. If you can provide the SIARD or a SIARD with mocked data that shows the same behaviour it would help to better assess the issue.

luis100 avatar Feb 23 '22 09:02 luis100

The SIARD is pretty sensitive, so I cannot hand that out. I'll see If there's a way to produce a mockup.

oschihin avatar Feb 23 '22 10:02 oschihin