TG2-VALIDATION_DAY_INRANGE

Open pzermoglio opened this issue 7 years ago • 48 comments

TestField Value
GUID 8d787cb5-73e2-4c39-9cd1-67c7361dc02e
Label VALIDATION_DAY_INRANGE
Description Is the value of dwc:day interpretable as a valid integer between 1 and 28 inclusive or 29, 30 or 31 given the relative month and year?
TestType Validation
Darwin Core Class dwc:Event
Information Elements ActedUpon dwc:day
dwc:month
dwc:year
Information Elements Consulted
Expected Response INTERNAL_PREREQUISITES_NOT_MET if (1) dwc:day is bdq:Empty, or (2) dwc:day is not interpretable as an integer, or (3) dwc:day is interpretable as an integer between 29 and 31 inclusive and dwc:month is not interpretable as an integer between 1 and 12, or (4) dwc:month is interpretable as the integer 2 and dwc:day is interpretable as the integer 29 and dwc:year is not interpretable as a valid ISO 8601 year; COMPLIANT if (1) the value of dwc:day is interpretable as an integer between 1 and 28 inclusive, or (2) dwc:day is interpretable as an integer between 29 and 30 and dwc:month is interpretable as an integer in the set (4,6,9,11), or (3) dwc:day is interpretable as an integer between 29 and 31 and dwc:month is interpretable as an integer in the set (1,3,5,7,8,10,12), or (4) dwc:day is interpretable as the integer 29 and dwc:month is interpretable as the integer 2 and dwc:year is interpretable as a valid leap year (evenly divisible by 400 or (evenly divisible by 4 but not evenly divisible by 100)); otherwise NOT_COMPLIANT.
Data Quality Dimension Conformance
Term-Actions DAY_INRANGE
Parameter(s)
Source Authority
Specification Last Updated 2024-09-16
Examples [dwc:day="15", dwc:month="", dwc:year="": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:day is in range"]
[dwc:day="30", dwc:month="2", dwc:year="1952": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:day is not in range"]
Source TG2-Gainesville
References
  • ISO (2019) ISO 8601-1:2019(en) Date and time — Representations for information interchange — Part 1: Basic rules. https://www.iso.org/obp/ui/
  • Wikipedia (2020) ISO 8601. https://en.wikipedia.org/wiki/ISO_8601
  • Library of Congress (2019) Extended Date/Time Format (EDTF). https://www.loc.gov/standards/datetime/
Example Implementations (Mechanisms) event_date_qc
Link to Specification Source Code event_date_qc DwCEventDQ.validationDayInrange()
Notes This test must take into account the given month and year, if present, to account for leap years. This is part of a group of similar tests: VALIDATION_DAY_INRANGE (8d787cb5-73e2-4c39-9cd1-67c7361dc02e), VALIDATION_STARTDAYOFYEAR_INRANGE (85803c7e-2a5a-42e1-b8d3-299a44cafc46), VALIDATION_ENDDAYOFYEAR_INRANGE (9a39d88c-7eee-46df-b32a-c109f9f81fb8).
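The Expected Response above can be read as a short decision procedure. The following Python is a minimal sketch of that reading, not the event_date_qc implementation: the names `to_int` and `validation_day_inrange` are hypothetical, and treating any digits-only string as an interpretable integer (and as a valid ISO 8601 year) is an assumption made for illustration.

```python
import re
from typing import Optional

THIRTY_DAY_MONTHS = {4, 6, 9, 11}
THIRTY_ONE_DAY_MONTHS = {1, 3, 5, 7, 8, 10, 12}


def to_int(value: Optional[str]) -> Optional[int]:
    """Return the value as an int, or None if it is empty or not digits only.

    Assumption: "interpretable as an integer" means an ASCII digits-only string.
    """
    text = (value or "").strip()
    return int(text) if re.fullmatch(r"[0-9]+", text) else None


def validation_day_inrange(day: str, month: str = "", year: str = "") -> str:
    d, m, y = to_int(day), to_int(month), to_int(year)

    # INTERNAL_PREREQUISITES_NOT_MET (1) and (2): day empty or not an integer.
    if d is None:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # (3): day 29 to 31 needs a month in 1..12 to be evaluated.
    if 29 <= d <= 31 and (m is None or not 1 <= m <= 12):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # (4): 29 February needs a year (assumed here: any digits-only year).
    if m == 2 and d == 29 and y is None:
        return "INTERNAL_PREREQUISITES_NOT_MET"

    # COMPLIANT (1): days 1 to 28 exist in every month.
    if 1 <= d <= 28:
        return "COMPLIANT"
    # COMPLIANT (2) and (3): days 29 to 30 or 29 to 31 in the longer months.
    if 29 <= d <= 30 and m in THIRTY_DAY_MONTHS:
        return "COMPLIANT"
    if 29 <= d <= 31 and m in THIRTY_ONE_DAY_MONTHS:
        return "COMPLIANT"
    # COMPLIANT (4): 29 February in a leap year (y is an int at this point).
    if d == 29 and m == 2 and (y % 400 == 0 or (y % 4 == 0 and y % 100 != 0)):
        return "COMPLIANT"

    return "NOT_COMPLIANT"
```

Under these assumptions the two Examples in the table come out as stated: `validation_day_inrange("15")` is COMPLIANT and `validation_day_inrange("30", "2", "1952")` is NOT_COMPLIANT.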

pzermoglio avatar Jan 18 '18 23:01 pzermoglio

Pass and fail descriptions aren't congruent. Fail description specifies a failure for all days with a value of 29 in a non-leap year. As described in pass description this is equivalent to https://github.com/FilteredPush/event_date_qc/blob/1abbd3f02eb6c28129764defab78f72156972864/src/main/java/org/filteredpush/qc/date/DwCEventDQ.java#L702 DAY_POSSIBLE_FOR_MONTH_YEAR urn:uuid:5618f083-d55a-4ac2-92b5-b9fb227b832f which was termed RECORDED_DATE_MISMATCH at some point in the spreadsheet.

Prerequisites for validations should not include references to amendments - validations are all run in a pre-amendment phase and again in a post-amendment phase.

chicoreus avatar Jan 24 '18 16:01 chicoreus

I also see a problem with the description "OR the value of dwc:day is 29 in a non-leap year": this part must add "and dwc:month=2".

ArthurChapman avatar Jan 24 '18 20:01 ArthurChapman

Check earlier versions of spreadsheet. This appears to be a renamed version of another test which has a guid. See note above. Source is probably ALA or GBIF, but check spreadsheet. Name is confusing, as this is a check for day relative to month and year, not if day is an integer in the range 1 to 31.

chicoreus avatar Jan 30 '18 22:01 chicoreus

@chicoreus See line 18 in the SUPPLEMENTAL spreadsheet. This is another test like https://github.com/tdwg/bdq/issues/127 that we previously had in SUPPLEMENTAL, as it was said that DAY and MONTH weren't CORE fields other than for filling in the eventDate. I would suggest putting it back into SUPPLEMENTAL.

ArthurChapman avatar Jan 30 '18 22:01 ArthurChapman

I would agree with SUPPLEMENTAL

Tasilee avatar Jan 31 '18 05:01 Tasilee

It looks like this one should have the guid: 5618f083-d55a-4ac2-92b5-b9fb227b832f (RECORDED_DATE_MISMATCH). I would propose renaming this test as VALIDATION_DAY_POSSIBLE_FOR_MONTH_YEAR, or something like that to clarify that month and year are terms consulted.

This is not the same as 48aa7d66-36d1-4662-a503-df170f11b03f DAY_INVALID/DAY_IN_RANGE which only evaluated dwc:day.

chicoreus avatar Feb 02 '18 16:02 chicoreus

I see this as a simple test for numeric 1-31 inclusive as #36 covers the temporal logic. If so, I can simplify the descriptions.

Tasilee avatar Aug 14 '18 03:08 Tasilee

Except that this is testing DAY and #36 is testing eventDate. I know arguments were put forward for keeping this test for consistency, but I have my doubts as to whether it should be CORE. Who queries just "Day" and not "eventDate"?

ArthurChapman avatar Aug 14 '18 03:08 ArthurChapman

The issue is beyond querying dwc:day. The ability of tests to flag issues remains important for building evidence of overall 'record utility', either as a simple flag or as a contribution to record summaries as in #31

'Record utility' is not something we have put much effort into, but it must happen. For example, I can readily see the utility of stats or algorithms that cluster/classify records on 'flag' combinations. I'd like to try using (for example) http://www.patn.com.au to classify a collection of records to see if fitness-for-use categories or dataset origins emerged.

Tasilee avatar Aug 14 '18 22:08 Tasilee

Agreed at TDWG 2018 DQIG meeting that this test is correct as written and that another test to validate the day (TG2-VALIDATION_DAY_NOTSTANDARD) is necessary to be able to know if the prerequisite is met. That test specifically tests that the day is an integer. To meet the prerequisites, similar tests must exist for month and year (TG2-VALIDATION_MONTH_NOTSTANDARD, TG2-VALIDATION_YEAR_NOTSTANDARD).

tucotuco avatar Aug 26 '18 00:08 tucotuco

Also from @Tasilee: the Test #127 description states "The value of dwc:day was unambiguously interpreted to be an integer between 1 and 31 inclusive".

Do we get rid of #127, even though it is needed to trigger the Amendment, or do we delete the first sentence of this test's description? Otherwise we appear to have tests testing the same thing (integer in range).

@chicoreus please respond!

ArthurChapman avatar Aug 30 '18 00:08 ArthurChapman

@Tasilee I think your comment about #36 is relevant. It makes sense to have a test for whether day is an integer in the range 1-31 (this one), and to leave the logic of whether the date terms are consistent to other tests. It makes sense to me if this test parallels #126.

chicoreus avatar Aug 30 '18 05:08 chicoreus

Discussion Thursday 5:30 at TDWG 2017.

Separate #147 as covering day as integer in range 1-31, independent of month or year. That issue only takes day as information element.

This test assesses day as integer in range 1-28, with 29-31 dependent on month and year. This test takes day, month, year as information element.

chicoreus avatar Aug 30 '18 05:08 chicoreus

As currently defined, this test doesn't isolate a failure condition in dwc:day: if dwc:month or dwc:year are invalid values, then NOT_COMPLIANT is produced even if day is a sane value (e.g. 1), implying that dwc:day has a problem when it doesn't. The current event_date_qc implementation of this test also produces INTERNAL_PREREQUISITES_NOT_MET if the dwc:month or dwc:year fields contain invalid values.

Suggest changing from "INTERNAL_PREREQUISITES_NOT_MET if any of the fields dwc:day, dwc:month and dwc:year are not present or are EMPTY;" to "INTERNAL_PREREQUISITES_NOT_MET if any of the fields dwc:day, dwc:month and dwc:year are not present or are EMPTY, or if the concatenation dwc:year-dwc:month is not a valid ISO date;"

chicoreus avatar Mar 12 '19 17:03 chicoreus

I think I agree with this, @chicoreus. It would appear that we definitely need a flow chart such that the order is #147, then #127, before this one (#125). And as mentioned earlier, this is really just determining whether the day is correctly 28, 29 (depending on month and year), 30 or 31 (depending on month).

ArthurChapman avatar Mar 12 '19 20:03 ArthurChapman

Thanks @chicoreus and @ArthurChapman. I agree. Sigh. @tucotuco? @pzermoglio? Happy with that change being made?

Tasilee avatar Mar 14 '19 04:03 Tasilee

I don't think the proposed change covers all the cases in which we want the day to be COMPLIANT. For example, the day values 1 through 28 inclusive should always be COMPLIANT, and so should 31 when the month allows it. Getting all the cases is a bit challenging. I tried to capture them in this alternative Expected Response. It needs review for completeness. I added a bunch of examples below to try to test the description.

INTERNAL_PREREQUISITES_NOT_MET if a) dwc:day is not present or is EMPTY, or b) dwc:day can not be cast as an integer, or c) dwc:day can be cast as an integer between 29 and 31 inclusive and dwc:month can not be cast as an integer between 1 and 12, or d) dwc:month can be cast as the integer 2 and dwc:year can not be cast as a valid ISO 8601 year; COMPLIANT if the value of the field dwc:day can be cast as an integer between 1 and 28 inclusive, or if the concatenation dwc:year+'-'+dwc:month+'-'+dwc:day can be interpreted as a valid ISO 8601 date, or if dwc:day can be cast as an integer that is less than or equal to the number of days in the dwc:month cast as an integer, as long as the value of dwc:month cast as an integer is not 2; otherwise NOT_COMPLIANT

  • dwc:day EMPTY: INTERNAL_PREREQUISITES_NOT_MET
  • dwc:day 'x': INTERNAL_PREREQUISITES_NOT_MET
  • dwc:day 29, dwc:month EMPTY: INTERNAL_PREREQUISITES_NOT_MET
  • dwc:day 29, dwc:month X: INTERNAL_PREREQUISITES_NOT_MET
  • dwc:day 29, dwc:month 0: INTERNAL_PREREQUISITES_NOT_MET
  • dwc:day 29, dwc:month 2, dwc:year EMPTY: INTERNAL_PREREQUISITES_NOT_MET
  • dwc:day 29, dwc:month 2, dwc:year XXXX: INTERNAL_PREREQUISITES_NOT_MET
  • dwc:day 1: COMPLIANT
  • dwc:day 28: COMPLIANT
  • dwc:day 31, dwc:month 3: COMPLIANT
  • dwc:day 29, dwc:month 2, dwc:year 2016: COMPLIANT

dwc:day 30, dwc:month dwc:day 30, dwc:month
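The alternative Expected Response in this comment can be checked against its complete examples with a small table-driven sketch. The Python below is only one possible reading of the draft wording: the function names are hypothetical, a valid ISO 8601 year is assumed to be an integer from 1 to 9999 (the range Python's `datetime.date` accepts), and the lower bound of 1 on the "number of days in the month" clause is an assumption, since the draft states only the upper bound.

```python
import calendar
import re
from datetime import date

IPNM = "INTERNAL_PREREQUISITES_NOT_MET"
OK = "COMPLIANT"


def to_int(value):
    """Digits-only strings become ints; anything else (including empty) is None."""
    text = (value or "").strip()
    return int(text) if re.fullmatch(r"[0-9]+", text) else None


def is_valid_date(y, m, d):
    """True if year-month-day names a real calendar date."""
    if y is None or m is None or d is None:
        return False
    try:
        date(y, m, d)
        return True
    except ValueError:
        return False


def day_inrange_draft(day="", month="", year=""):
    d, m, y = to_int(day), to_int(month), to_int(year)
    if d is None:                                            # a) and b)
        return IPNM
    if 29 <= d <= 31 and (m is None or not 1 <= m <= 12):    # c)
        return IPNM
    if m == 2 and (y is None or not 1 <= y <= 9999):         # d)
        return IPNM
    if 1 <= d <= 28:
        return OK
    if is_valid_date(y, m, d):
        return OK
    # Outside February the month length does not depend on the year,
    # so any non-leap reference year (2001 here) gives the right length.
    if m is not None and 1 <= m <= 12 and m != 2:
        if 1 <= d <= calendar.monthrange(2001, m)[1]:
            return OK
    return "NOT_COMPLIANT"


# The complete examples from the comment, as ((day, month, year), expected).
EXAMPLES = [
    (("", "", ""), IPNM), (("x", "", ""), IPNM),
    (("29", "", ""), IPNM), (("29", "X", ""), IPNM), (("29", "0", ""), IPNM),
    (("29", "2", ""), IPNM), (("29", "2", "XXXX"), IPNM),
    (("1", "", ""), OK), (("28", "", ""), OK),
    (("31", "3", ""), OK), (("29", "2", "2016"), OK),
]


def mismatches():
    """Return the examples whose computed response differs from the expected one."""
    return [(args, want) for args, want in EXAMPLES if day_inrange_draft(*args) != want]
```

Read literally, clause d) of this draft also applies to day 1 in month 2 with no year, so that combination yields INTERNAL_PREREQUISITES_NOT_MET in the sketch even though day 1 is valid in every month.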

tucotuco avatar Mar 26 '19 19:03 tucotuco

Well elucidated @tucotuco - but it gives me a headache. Expected Response updated.

Tasilee avatar Mar 27 '19 19:03 Tasilee

After writing the manuscript section on a case study of GBIF and implementing this test in SQL, I think it would be much clearer and easier for implementers to use the following Expected Response:

"INTERNAL_PREREQUISITES_NOT_MET if a) dwc:day is not present or is EMPTY, or b) dwc:day can not be cast as an integer, or c) dwc:day can be cast as an integer between 29 and 31 inclusive and dwc:month can not be cast as an integer between 1 and 12, or d) dwc:month can be cast as the integer 2 and dwc:day can be cast as the integer 29 and dwc:year can not be cast as a valid ISO 8601 year; COMPLIANT if e) the value of the field dwc:day can be cast as an integer between 1 and 28 inclusive, or f) dwc:day can be cast as an integer between 29 and 30 and dwc:month can be cast as one of (4,6,9,11), or g) dwc:day can be cast as an integer between 29 and 31 and dwc:month can be cast as one of (1,3,5,7,8,10,12), or h) dwc:day can be cast as the integer 29 and dwc:month can be cast as the integer 2 and dwc:year is a valid leap year (evenly divisible by 400 or (evenly divisible by 4 but not evenly divisible by 100)); otherwise NOT_COMPLIANT"
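The leap-year rule quoted in clause h) is small enough to check directly. As a sketch, the hypothetical helper `is_leap_year` below writes the rule exactly as worded and compares it with Python's standard `calendar.isleap`; the comparison range of years 1 to 2999 is arbitrary.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Evenly divisible by 400, or evenly divisible by 4 but not by 100."""
    return year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)


def agrees_with_stdlib(first: int = 1, last: int = 2999) -> bool:
    """True if the worded rule matches calendar.isleap for every year in the range."""
    return all(is_leap_year(y) == calendar.isleap(y) for y in range(first, last + 1))
```

With this rule 2000 and 2016 are leap years, while 1900 and 2019 are not, so 29 February is in range only for the first two.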

tucotuco avatar Jun 02 '19 15:06 tucotuco

That all makes sense to me.

ArthurChapman avatar Jun 02 '19 23:06 ArthurChapman

I concur

Tasilee avatar Jun 03 '19 03:06 Tasilee

Following the discussions arising from the event date case study for the BISS paper, I believe that this test should be deprecated in favor of an updated TG2-VALIDATION_DAY_NOTSTANDARD (https://github.com/tdwg/bdq/issues/147).

tucotuco avatar Jun 12 '19 15:06 tucotuco

Agree @tucotuco

ArthurChapman avatar Jun 13 '19 01:06 ArthurChapman

I also agree.

Tasilee avatar Jun 13 '19 04:06 Tasilee

Deprecating.

tucotuco avatar Jun 13 '19 04:06 tucotuco

We need both this and #147. This issue is needed to test whether a value for day exists for the given month and year. It must treat non-integer values as internal prerequisites not met. It goes with the couplet of validation #147 (which should take only dwc:day as a parameter and test whether the value of day is an integer), and the amendment #127 (which can propose the replacement of a non-standard non-integer value with an integer (e.g. 2nd to 2)). This pairing of a validation and an amendment where the validation tests for exactly what the amendment can correct is critical to measuring improvement of quality. The test for the (not able to be corrected from these three fields alone, and thus not paired with an amendment) existence of a day value given the month and year is a distinct test, and we need both.
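The validation and amendment pairing described here can be illustrated with a toy pre-amendment and post-amendment run. Everything in this Python sketch is an assumption for illustration: `validation_day_standard` and `amendment_day_standardized` are hypothetical stand-ins for #147 and #127, not their actual specifications. The 1 to 31 range follows the earlier description of #147 in this thread, and the only correction modelled is stripping an English ordinal suffix, as in the "2nd to 2" example.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: one or two digits followed by an English ordinal suffix.
ORDINAL = re.compile(r"([0-9]{1,2})(st|nd|rd|th)", re.IGNORECASE)


def validation_day_standard(day: str) -> str:
    """Stand-in for #147: is dwc:day an integer from 1 to 31, ignoring month and year?"""
    text = day.strip()
    if not text:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    if re.fullmatch(r"[0-9]+", text) and 1 <= int(text) <= 31:
        return "COMPLIANT"
    return "NOT_COMPLIANT"


def amendment_day_standardized(day: str) -> Optional[str]:
    """Stand-in for #127: propose an integer for values like '2nd', else None."""
    match = ORDINAL.fullmatch(day.strip())
    return match.group(1) if match else None


def compliant_count(days: List[str]) -> int:
    return sum(validation_day_standard(d) == "COMPLIANT" for d in days)


def run_phases(days: List[str]) -> Tuple[int, List[str], int]:
    """Validate, apply proposed amendments, validate again."""
    before = compliant_count(days)
    amended = [amendment_day_standardized(d) or d for d in days]
    after = compliant_count(amended)
    return before, amended, after
```

For the input `["2nd", "15", "x", "31st"]`, `run_phases` returns `(1, ["2", "15", "x", "31"], 3)`: the rise from 1 to 3 compliant values is the improvement that the pairing makes measurable, while `"x"` stays uncorrectable. Whether day 31 actually exists in the given month is left to the separate test in this issue.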

chicoreus avatar Aug 16 '19 02:08 chicoreus

@chicoreus If you resurrect #125, the wording of the Expected Response is now virtually the same as #147. I guess that you are planning to rewrite one or both of these (@Tasilee says good luck as we can't). I have replaced "cast" as in #147.

ArthurChapman avatar Aug 16 '19 04:08 ArthurChapman

Just edited out last "cast" in Expected Response.

Tasilee avatar Aug 16 '19 05:08 Tasilee

I propose changing from:

INTERNAL_PREREQUISITES_NOT_MET if a) dwc:day is not present or is EMPTY, or b) dwc:day can not be unambiguously interpreted as an integer, or c) dwc:day can be interpreted as an integer between 29 and 31 inclusive and dwc:month can not be interpreted as an integer between 1 and 12, or d) dwc:month can be interpreted as the integer 2 and dwc:day can be interpreted as the integer 29 and dwc:year can not be unambiguously interpreted as a valid ISO 8601 year; COMPLIANT if e) the value of the field dwc:day can be interpreted as an integer between 1 and 28 inclusive, or f) dwc:day can be interpreted as an integer between 29 and 30 and dwc:month can be interpreted as one of (4,6,9,11), or g) dwc:day can be interpreted as an integer between 29 and 31 and dwc:month can be interpreted as one of (1,3,5,7,8,10,12), or h) dwc:day can be interpreted as the integer 29 and dwc:month can be interpreted as the integer 2 and dwc:year is a valid leap year (evenly divisible by 400 or (evenly divisible by 4 but not evenly divisible by 100)); otherwise NOT_COMPLIANT

to:

INTERNAL_PREREQUISITES_NOT_MET if (a) dwc:day is EMPTY or is not an integer, or (b) dwc:day is an integer between 29 and 31 inclusive and dwc:month is not an integer between 1 and 12, or (c) dwc:month is the integer 2 and dwc:day is the integer 29 and dwc:year is not a valid ISO 8601 year; COMPLIANT if (a) the value of the field dwc:day is an integer between 1 and 28 inclusive, or (b) dwc:day is an integer between 29 and 30 and dwc:month is an integer in the set (4,6,9,11), or (c) dwc:day is an integer between 29 and 31 and dwc:month is an integer in the set (1,3,5,7,8,10,12), or (d) dwc:day is the integer 29 and dwc:month is the integer 2 and dwc:year is a valid leap year (evenly divisible by 400 or (evenly divisible by 4 but not evenly divisible by 100)); otherwise NOT_COMPLIANT.

chicoreus avatar Aug 16 '19 18:08 chicoreus

That looks a lot better @chicoreus

ArthurChapman avatar Aug 16 '19 21:08 ArthurChapman