TG2-VALIDATION_DATEIDENTIFIED_INRANGE

Open iDigBioBot opened this issue 7 years ago • 92 comments

TestField Value
GUID dc8aae4b-134f-4d75-8a71-c4186239178e
Label VALIDATION_DATEIDENTIFIED_INRANGE
Description Is the value of dwc:dateIdentified within Parameter ranges and does it either overlap or follow dwc:eventDate?
TestType Validation
Darwin Core Class dwc:Identification
Information Elements ActedUpon dwc:dateIdentified
Information Elements Consulted dwc:eventDate
Expected Response INTERNAL_PREREQUISITES_NOT_MET if (1) dwc:dateIdentified is bdq:Empty, or (2) dwc:dateIdentified contains an invalid value according to ISO 8601, or (3) bdq:includeEventDate=true and dwc:eventDate is not a valid ISO 8601 date; COMPLIANT if the value of dwc:dateIdentified is between bdq:earliestValidDate and bdq:latestValidDate inclusive and either (1) dwc:eventDate is bdq:Empty or bdq:includeEventDate=false, or (2) if dwc:eventDate is a valid ISO 8601 date and dwc:dateIdentified overlaps or is later than the dwc:eventDate; otherwise NOT_COMPLIANT
Data Quality Dimension Likeliness
Term-Actions DATEIDENTIFIED_INRANGE
Parameter(s) bdq:earliestValidDate, bdq:latestValidDate, bdq:includeEventDate
Source Authority bdq:earliestValidDate default = "1753-01-01"; bdq:latestValidDate default = "{current day}"; bdq:includeEventDate default = "true"
Specification Last Updated 2024-09-16
Examples [dwc:dateIdentified="1963-03-08T14:07-0600", dwc:eventDate="1962-11-01T10:00-0600": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:dateIdentified is in range"]
[dwc:dateIdentified="1963-03-08T14:07-0600", dwc:eventDate="1964-11-01T10:00-0600": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:dateIdentified before dwc:eventDate"]
Source GBIF, ALA
References
  • ISO (2019) ISO 8601-1:2019(en) Date and time — Representations for information interchange — Part 1: Basic rules. https://www.iso.org/obp/ui/
  • Wikipedia (2020) ISO 8601. https://en.wikipedia.org/wiki/ISO_8601
  • Library of Congress (2019) Extended Date/Time Format (EDTF). https://www.loc.gov/standards/datetime/
Example Implementations (Mechanisms) Kurator:event_date_qc
Link to Specification Source Code https://github.com/FilteredPush/event_date_qc/blob/37d349b79f05a76eeb264bafe2315ce88493ecb7/src/main/java/org/filteredpush/qc/date/DwCOtherDateDQ.java#L181
Notes There may be valid identifications prior to Linnaeus, but this test will flag these under the default value of bdq:earliestValidDate because, for most biodiversity data, pre-Linnaean identification dates are likely to be errors. If bdq:earliestValidDate is not set, the default is 1753-01-01. This test will, by design, flag as problematic cases (such as LTER plots and marine mammal sightings) where a known individual organism is identified by a specialist and then subsequently observed without new taxonomic identifications being made.
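
The Expected Response above can be read as a short decision procedure. The following is a minimal Python sketch of that reading, not the event_date_qc implementation linked above. The names `to_interval`, `is_empty` and `dateidentified_inrange` are hypothetical, and the date parser is a deliberate simplification: it assumes only YYYY, YYYY-MM, YYYY-MM-DD, an optional time part after "T" (ignored, including any time zone offset) and "start/end" ranges. It also assumes the whole dwc:dateIdentified interval must lie within the parameter range.

```python
import calendar
import re
from datetime import date

_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")
PREREQ = ("INTERNAL_PREREQUISITES_NOT_MET", None)


def to_interval(value):
    """Return an inclusive (first_day, last_day) pair, or None if not parseable.

    Simplified assumption: handles YYYY, YYYY-MM, YYYY-MM-DD, an ignored
    time part after 'T', and 'start/end' ranges. Not a full ISO 8601 parser.
    """
    text = value.strip()
    if "/" in text:
        parts = text.split("/")
        if len(parts) != 2:
            return None
        start, end = to_interval(parts[0]), to_interval(parts[1])
        if start is None or end is None or start[0] > end[1]:
            return None
        return start[0], end[1]
    match = _DATE.match(text.split("T")[0])
    if match is None:
        return None
    year, month, day = (int(g) if g else None for g in match.groups())
    try:
        if month is None:
            return date(year, 1, 1), date(year, 12, 31)
        if day is None:
            first = date(year, month, 1)
            return first, date(year, month, calendar.monthrange(year, month)[1])
        return date(year, month, day), date(year, month, day)
    except ValueError:  # e.g. month 13 or day 31 in a 30-day month
        return None


def is_empty(value):
    """Rough stand-in for bdq:Empty: None or whitespace only."""
    return value is None or value.strip() == ""


def dateidentified_inrange(date_identified, event_date,
                           earliest_valid_date="1753-01-01",
                           latest_valid_date=None,
                           include_event_date=True):
    """Return (Response.status, Response.result) for one record."""
    if is_empty(date_identified):
        return PREREQ
    identified = to_interval(date_identified)
    if identified is None:
        return PREREQ
    event = None
    if include_event_date and not is_empty(event_date):
        event = to_interval(event_date)
        if event is None:  # non-empty but unparseable eventDate
            return PREREQ
    lower = to_interval(earliest_valid_date)[0]
    # Default for bdq:latestValidDate is the current day.
    upper = date.today() if latest_valid_date is None else to_interval(latest_valid_date)[1]
    in_range = lower <= identified[0] and identified[1] <= upper
    # "Overlaps or is later than": dateIdentified must not end before eventDate starts.
    not_before_event = event is None or identified[1] >= event[0]
    result = "COMPLIANT" if in_range and not_before_event else "NOT_COMPLIANT"
    return ("RUN_HAS_RESULT", result)
```

Under these assumptions, the two Examples in the table give COMPLIANT and NOT_COMPLIANT respectively, and an empty dwc:eventDate leaves only the parameter range check.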

iDigBioBot avatar Jan 05 '18 15:01 iDigBioBot

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: Another test: DATE_IDENTIFIED_IN_FUTURE (VALIDATION_DATEIDENTIFIED_OUTOFRANGE), already takes care of the future part, either take out future from here or not run the other test.

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Rename VALIDATION_IDENTIFIED_DATE_PLAUSIBLE

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: This would work AFTER eventDate has been interpreted, if need be. Also, might be tricky if one of the two dates is not complete.

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Rename VALIDATION_IDENTIFIED_ATORAFTER_OCCURRENCE

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

Same issue as #36 with ranges - specification should be consistent with that issue.

chicoreus avatar Feb 06 '18 19:02 chicoreus

I agree with @chicoreus and as I have commented on #66, "I would say #36 SHOULD cover a 'not possible' date within a valid time range."

Tasilee avatar Feb 07 '18 23:02 Tasilee

@chicoreus. Unless I am misreading you - this is not the same as #36. It doesn't include the complexity of eventDate - and can't be identified before it is collected or into the future - it has a more definite range than you have been indicating with eventDate surely?

ArthurChapman avatar Feb 10 '18 05:02 ArthurChapman

Agreed at TDWG 2018 DQIG meeting that the test should parallel TG2-VALIDATION_EVENTDATE_OUTOFRANGE in terms of an optional earlier limit.

tucotuco avatar Aug 26 '18 00:08 tucotuco

@ArthurChapman Since both dwc:eventDate and dwc:dateIdentified are expected to contain ISO dates, the complexities of both are the same (they aren't the same as the complexities of the multiple temporal terms in Event). Identifications existing now can't have been made in the future (like collecting/observing events), but under some conditions identifications can be made before occurrence events. For example, long term monitoring of a particular individual organism may begin with the identification of the organism to species, and then a sequence of observations of that organism at different times (and for mobile organisms at different places) may be made after that identification.

chicoreus avatar Aug 07 '19 14:08 chicoreus

I flagged this as Needs Work because I am having problems implementing it from the given specification:

INTERNAL_PREREQUISITES_NOT_MET if there is no default designated date or the field dwc:dateIdentified is either not present, is EMPTY or is not a valid ISO 8601-1:2019 date; COMPLIANT if the value of the field dwc:dateIdentified is not prior to the eventDate, does not extend into the future, or optionally does not extend before a date designated when the test is run (e.g., prior to 1753-01-01); otherwise NOT_COMPLIANT

(1) "Default designated date" is not a defined concept, so I don't know what to do with this. It sounds like it is a reference to the default values for bdq:earliestDate and bdq:latestDate (to put the parameters into a bdq namespace), but these are defined values, so their absence would be a defect in the implementation, not a test failure condition.

I suggest we change the specification by removing "there is no default designated date or" to:

INTERNAL_PREREQUISITES_NOT_MET if the field dwc:dateIdentified is either not present, is EMPTY or is not a valid ISO 8601-1:2019 date; COMPLIANT if the value of the field dwc:dateIdentified is not prior to the eventDate, does not extend into the future, or optionally does not extend before a date designated when the test is run (e.g., prior to 1753-01-01); otherwise NOT_COMPLIANT

chicoreus avatar Aug 07 '19 14:08 chicoreus

Again @chicoreus your reasoning seems sound. The Parameters including default values were added after the Expected Response(s) was written. At the time of writing the Expected Response - we didn't have a defined default. Now that we do (I think in all cases which I believe is important to stop lots of failures because someone forgot to set a default), I think your new wording is good. I am happy with the new wording.

Your argument about identification prior to an event to me is a rather pedantic one. I am not sure that you can call it an identification if you don't (at the time) have something to identify. In the cases you mention - I would regard the identification as being simultaneous to the observation (event). If you are looking for a particular organism, then when you find it and pick it up, or "identify" it through observation, then that is when the identification took place.

ArthurChapman avatar Aug 08 '19 02:08 ArthurChapman

Thanks @chicoreus and @ArthurChapman: Well picked up. I will amend accordingly and would value a check.

Tasilee avatar Aug 08 '19 07:08 Tasilee

Checked @Tasilee. There was another error in the Example, 1573 rather than 1753, which I have fixed.

ArthurChapman avatar Aug 08 '19 23:08 ArthurChapman

Thanks @ArthurChapman

Tasilee avatar Aug 09 '19 00:08 Tasilee

Great. The specification, however, still contains dwc:eventDate, but this term is not listed as an information element.

@ArthurChapman I would still argue that the reference to eventDate should be removed from the specification. Typical examples of this are long term ecological monitoring plots and arboreta, where a specialist makes an identification of an organism at one point in time, it is marked with an identifier (such as a number on a stamped metal plate), and then for many years non-specialists return to observe the state of that organism in other occurrence events without making new identifications. Similar are observations of marine mammals, where observations may be made over time of a particular individual, and at some arbitrary point in time in the sequence of observations, a specialist makes a species identification. Occurrences recording that known individual may precede or follow the identification.

The definition needs to be changed to remove dwc:eventDate, or dwc:eventDate needs to be added to the list of information elements.

chicoreus avatar Aug 09 '19 19:08 chicoreus

@chicoreus I see your arguments. Two options, I guess: 1) we remove eventDate from the Response, or 2) we leave it in and add dwc:eventDate to Information Elements. I guess I would be happy either way. The argument for 2) would be that an identification prior to the event is likely rare. Do we want to flag those cases to make sure they aren't errors, given there aren't a lot of them, or miss the real errors, which my guess is would be more frequent? What do others think?

ArthurChapman avatar Aug 09 '19 22:08 ArthurChapman

I think we want to flag the likely error more than we want to protect the rare case from getting a notification.

tucotuco avatar Aug 09 '19 22:08 tucotuco

My consistent stance would be to err on the side of caution (as long as we have a test that is considered valuable): we flag a POTENTIAL issue (false positive) rather than letting it 'get through' (false negative). So my preference, like @tucotuco, would be @ArthurChapman's (2) - to add dwc:eventDate to the Information Elements.

Tasilee avatar Aug 11 '19 23:08 Tasilee

Consensus seems to be to add dwc:eventDate as an information element, and leave the reference to it in the specification. I've done this. I've also added a comment to the notes on the rare cases of identifications preceding observations, which are flagged by design.

chicoreus avatar Aug 12 '19 13:08 chicoreus

In implementing, I noticed a couple more issues with the specification.

(1) The specification is not consistent:

COMPLIANT if the value of the field dwc:dateIdentified is not prior to the eventDate, does not extend into the future, or optionally does not extend before a date designated when the test is run (e.g., prior to 1753-01-01);

dwc:dateIdentified is not prior to the eventDate

dwc:dateIdentified does not extend into the future

dwc:dateIdentified optionally does not extend before a date designated...

As stated, this means that dwc:dateIdentified can be a range that extends prior to the eventDate, so long as it overlaps the eventDate range, but that it cannot as a range extend into the future or before the earliest date specified.

Is this difference by design (e.g. accommodating dateIdentified known only to year, but eventDate known to day (dateIdentified=1974; eventDate 1974-11-15), probably a common case)?
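
The year-versus-day case can be made concrete by expanding each value to the interval of days it represents. The following is a minimal Python sketch; `expand` is a hypothetical helper that assumes well-formed input in exactly one of the forms YYYY, YYYY-MM or YYYY-MM-DD, and it is not taken from any existing implementation.

```python
import calendar
from datetime import date


def expand(value):
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to an inclusive (start, end) pair."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        last_day = calendar.monthrange(parts[0], parts[1])[1]
        return date(parts[0], parts[1], 1), date(parts[0], parts[1], last_day)
    return date(*parts), date(*parts)


identified = expand("1974")        # the whole of 1974
event = expand("1974-11-15")       # a single day

# The dateIdentified interval begins before the eventDate...
starts_before_event = identified[0] < event[0]
# ...but the two intervals still share at least one day.
overlaps_event = identified[0] <= event[1] and event[0] <= identified[1]
```

Here both `starts_before_event` and `overlaps_event` are true, which is why the wording of "not prior to" matters for this case.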

chicoreus avatar Aug 12 '19 14:08 chicoreus

Added "dwc:eventDate is non-empty but not a valid ISO date" as a condition for internal prerequisites not met, so that this test doesn't flag records as non-compliant simply because of non-conforming values in eventDate (which should be consulted rather than acted upon).

chicoreus avatar Aug 12 '19 14:08 chicoreus

@chicoreus - You may like to check the new wording of the Expected Response, which we have simplified with respect to the Parameter.

ArthurChapman avatar Aug 13 '19 04:08 ArthurChapman

@ArthurChapman the rewording (within) looks like it deals with the conflict. However, the implementation I had in event_date_qc was buggy, and in fixing it, I am still running into the issue with:

dwc:dateIdentified is not prior to the eventDate

We need a clearer definition of prior, as typical values such as dateIdentified=1974; eventDate 1974-11-15 fail, since 1974-01-01 (the start of the interval represented by the value of the dateIdentified) precedes 1974-11-15 (the eventDate). This will be a common case, as dates of identification are often only specified to year, while collecting event dates are more frequently identified to day.

I would suggest:

dwc:dateIdentified overlaps or follows the dwc:eventDate

This would make a dateIdentified which entirely preceded the eventDate non compliant, but would allow the dateIdentified to begin before the eventDate.
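
The difference between the two wordings can be sketched as two predicates over inclusive (start, end) day intervals. This is an illustration only; the function names `not_prior_strict` and `overlaps_or_follows` are hypothetical, and the strict reading shown is the one that compares interval starts.

```python
from datetime import date


def not_prior_strict(identified, event):
    """Strict reading of 'not prior': dateIdentified must not start before eventDate."""
    return identified[0] >= event[0]


def overlaps_or_follows(identified, event):
    """Suggested reading: dateIdentified must not end before eventDate starts."""
    return identified[1] >= event[0]


event = (date(1974, 11, 15), date(1974, 11, 15))
same_year = (date(1974, 1, 1), date(1974, 12, 31))   # dateIdentified=1974
prior_year = (date(1973, 1, 1), date(1973, 12, 31))  # dateIdentified=1973

strict_same = not_prior_strict(same_year, event)        # False: starts 1974-01-01
relaxed_same = overlaps_or_follows(same_year, event)    # True: overlaps the event day
relaxed_prior = overlaps_or_follows(prior_year, event)  # False: entirely before the event
```

Only the second predicate accepts the year-only value while still rejecting an identification interval that ends before the event begins.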

chicoreus avatar Aug 29 '19 14:08 chicoreus

@chicoreus That would be OK by me.

ArthurChapman avatar Aug 29 '19 21:08 ArthurChapman

@chicoreus I also agree: Editing accordingly.

Tasilee avatar Aug 29 '19 22:08 Tasilee

The wording of this test appears wrong to me. It says "INTERNAL_PREREQUISITES_NOT_MET if ..., or if dwc:eventDate is not EMPTY and is not a valid ISO 8601-1:2019 date" but then "COMPLIANT if the value of dwc:dateIdentified overlaps or follows the dwc:eventDate". BUT that means that if the dwc:eventDate has an ISO value, the result is INTERNAL_PREREQUISITES_NOT_MET, so it is not then possible for dwc:dateIdentified to overlap or follow and so be COMPLIANT.

ArthurChapman avatar Feb 07 '22 20:02 ArthurChapman

@ArthurChapman that does read awkwardly. I interpret it as (1) if both eventDate and dateIdentified have valid values, then they can be compared and dateIdentified compared to the earliest/latest date, (2) if dateIdentified has a valid value and eventDate is empty, then dateIdentified is compared to the earliest/latest date. If dateIdentified is empty, or dateIdentified contains an invalid value, or eventDate has an invalid value, then internal prerequisites are not met.

How about changing from:

INTERNAL_PREREQUISITES_NOT_MET if dwc:dateIdentified is EMPTY or is not a valid ISO 8601-1:2019 date, or if dwc:eventDate is not EMPTY and is not a valid ISO 8601-1:2019 date; COMPLIANT if the value of dwc:dateIdentified overlaps or follows the dwc:eventDate, and is within the Parameter range; otherwise NOT_COMPLIANT

To:

INTERNAL_PREREQUISITES_NOT_MET if any of the following three conditions are met (1) dwc:dateIdentified is EMPTY, (2) dwc:dateIdentified is not a valid ISO 8601-1:2019 date, (3) dwc:eventDate is not EMPTY and is not a valid ISO 8601-1:2019 date; COMPLIANT if the value of dwc:dateIdentified is within the parameter ranges and if dwc:eventDate is not EMPTY dwc:dateIdentified overlaps or follows the dwc:eventDate; otherwise NOT_COMPLIANT

chicoreus avatar Feb 07 '22 20:02 chicoreus

I agree with @chicoreus suggestion and have edited the Expected Response. It is far more explicit.

Tasilee avatar Feb 07 '22 21:02 Tasilee

I'd add (just to make it clearer) ... COMPLIANT if the value of dwc:dateIdentified is within the parameter ranges and if dwc:eventDate is not EMPTY **and is a valid ISO 8601-1:2019 date** and dwc:dateIdentified overlaps or follows the dwc:eventDate; ...

ArthurChapman avatar Feb 07 '22 21:02 ArthurChapman

Thanks @ArthurChapman: Agreed and done.

Tasilee avatar Feb 09 '22 20:02 Tasilee