TG2-VALIDATION_DATEIDENTIFIED_INRANGE
TestField | Value |
---|---|
GUID | dc8aae4b-134f-4d75-8a71-c4186239178e |
Label | VALIDATION_DATEIDENTIFIED_INRANGE |
Description | Is the value of dwc:dateIdentified within Parameter ranges and either overlaps or is later than dwc:eventDate? |
TestType | Validation |
Darwin Core Class | dwc:Identification |
Information Elements ActedUpon | dwc:dateIdentified |
Information Elements Consulted | dwc:eventDate |
Expected Response | INTERNAL_PREREQUISITES_NOT_MET if (1) dwc:dateIdentified is bdq:Empty, or (2) dwc:dateIdentified contains an invalid value according to ISO 8601, or (3) bdq:includeEventDate=true and dwc:eventDate is not a valid ISO 8601 date; COMPLIANT if the value of dwc:dateIdentified is between bdq:earliestValidDate and bdq:latestValidDate inclusive and either (1) dwc:eventDate is bdq:Empty or bdq:includeEventDate=false, or (2) if dwc:eventDate is a valid ISO 8601 date and dwc:dateIdentified overlaps or is later than the dwc:eventDate; otherwise NOT_COMPLIANT |
Data Quality Dimension | Likeliness |
Term-Actions | DATEIDENTIFIED_INRANGE |
Parameter(s) | bdq:earliestValidDate; bdq:latestValidDate; bdq:includeEventDate |
Source Authority | bdq:earliestValidDate default = "1753-01-01"; bdq:latestValidDate default = "{current day}"; bdq:includeEventDate default = "true" |
Specification Last Updated | 2024-09-16 |
Examples | [dwc:dateIdentified="1963-03-08T14:07-0600", dwc:eventDate="1962-11-01T10:00-0600": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:dateIdentified is in range"]; [dwc:dateIdentified="1963-03-08T14:07-0600", dwc:eventDate="1964-11-01T10:00-0600": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:dateIdentified before dwc:eventDate"] |
Source | GBIF, ALA |
References | |
Example Implementations (Mechanisms) | Kurator:event_date_qc |
Link to Specification Source Code | https://github.com/FilteredPush/event_date_qc/blob/37d349b79f05a76eeb264bafe2315ce88493ecb7/src/main/java/org/filteredpush/qc/date/DwCOtherDateDQ.java#L181 |
Notes | There may be valid identifications prior to Linnaeus, but this test will flag these under the default value of bdq:earliestValidDate, as for most biodiversity data, pre-Linnaean identification dates are likely to be errors. If a parameter is not set, then the default is 1753-01-01. This test will, by design, flag as problematic cases (such as LTER plots and marine mammal sightings) where a known individual organism is identified by a specialist and then subsequently observed without new taxonomic identifications being made. |
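The Expected Response above can be read as a short decision procedure. The following is a minimal Python sketch of that reading, not the event_date_qc implementation. The function names (`to_interval`, `dateidentified_inrange`) are hypothetical, the parser handles only a small subset of ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD, an optional time part that is ignored, and start/end ranges), and two points are assumptions: that an empty dwc:eventDate skips the comparison even when bdq:includeEventDate=true, and that "between ... inclusive" means the whole dwc:dateIdentified interval must lie inside the parameter range.

```python
import calendar
import re
from datetime import date

# Assumption: only this subset of ISO 8601 is recognised; time and zone are ignored.
_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?(?:T.*)?$")


def to_interval(value):
    """Return an inclusive (start, end) pair of dates, or None if unparseable."""
    parts = value.strip().split("/")
    if len(parts) > 2:
        return None
    bounds = []
    for part in parts:
        m = _DATE_RE.match(part)
        if m is None:
            return None
        year = int(m.group(1))
        month = int(m.group(2)) if m.group(2) else None
        day = int(m.group(3)) if m.group(3) else None
        try:
            if month is None:  # year only: whole year
                bounds.append((date(year, 1, 1), date(year, 12, 31)))
            elif day is None:  # year and month: whole month
                last = calendar.monthrange(year, month)[1]
                bounds.append((date(year, month, 1), date(year, month, last)))
            else:
                bounds.append((date(year, month, day), date(year, month, day)))
        except ValueError:
            return None
    start, end = bounds[0][0], bounds[-1][1]
    return (start, end) if start <= end else None


def is_empty(value):
    return value is None or value.strip() == ""


def dateidentified_inrange(date_identified, event_date=None,
                           earliest="1753-01-01", latest=None,
                           include_event_date=True):
    """Hypothetical sketch of VALIDATION_DATEIDENTIFIED_INRANGE."""
    if is_empty(date_identified):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    identified = to_interval(date_identified)
    if identified is None:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    event = None
    if include_event_date and not is_empty(event_date):
        event = to_interval(event_date)
        if event is None:
            return "INTERNAL_PREREQUISITES_NOT_MET"
    lower = date.fromisoformat(earliest)
    upper = date.fromisoformat(latest) if latest else date.today()
    # Assumption: the whole identification interval must lie within the range.
    if not (lower <= identified[0] and identified[1] <= upper):
        return "NOT_COMPLIANT"
    # "Overlaps or is later than": fail only if it ends before the event starts.
    if event is not None and identified[1] < event[0]:
        return "NOT_COMPLIANT"
    return "COMPLIANT"


print(dateidentified_inrange("1963-03-08T14:07-0600", "1962-11-01T10:00-0600"))  # COMPLIANT
print(dateidentified_inrange("1963-03-08T14:07-0600", "1964-11-01T10:00-0600"))  # NOT_COMPLIANT
```

The two calls at the end reproduce the two rows of the Examples field. Under the whole-interval assumption, a dwc:dateIdentified given only as the current year would be NOT_COMPLIANT against the default bdq:latestValidDate; an implementation that wants to accept such values would need a different reading of "between".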
Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: Another test: DATE_IDENTIFIED_IN_FUTURE (VALIDATION_DATEIDENTIFIED_OUTOFRANGE), already takes care of the future part, either take out future from here or not run the other test.
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Rename VALIDATION_IDENTIFIED_DATE_PLAUSIBLE
Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: This would work AFTER eventDate has been interpreted, if need be. Also, it might be tricky if one of the two dates is not complete.
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: Rename VALIDATION_IDENTIFIED_ATORAFTER_OCCURRENCE
Same issue as #36 with ranges - specification should be consistent with that issue.
I agree with @chicoreus and as I have commented on #66, "I would say #36 SHOULD cover a 'not possible' date within a valid time range."
@chicoreus. Unless I am misreading you - this is not the same as #36. It doesn't include the complexity of eventDate - and can't be identified before it is collected or into the future - it has a more definite range than you have been indicating with eventDate surely?
Agreed at TDWG 2018 DQIG meeting that the test should parallel TG2-VALIDATION_EVENTDATE_OUTOFRANGE in terms of an optional earlier limit.
@ArthurChapman Since both dwc:eventDate and dwc:dateIdentified are expected to contain ISO dates, the complexities of both are the same (they aren't the same as the complexities of the multiple temporal terms in Event). Identifications existing now can't have been made in the future (like collecting/observing events), but under some conditions identifications can be made before occurrence events. For example, long term monitoring of a particular individual organism may begin with the identification of the organism to species, and then a sequence of observations of that organism at different times (and for mobile organisms at different places) may be made after that identification.
I flagged this as Needs Work because I am having problems implementing it from the given specification:
INTERNAL_PREREQUISITES_NOT_MET if there is no default designated date or the field dwc:dateIdentified is either not present, is EMPTY or is not a valid ISO 8601-1:2019 date; COMPLIANT if the value of the field dwc:dateIdentified is not prior to the eventDate, does not extend into the future, or optionally does not extend before a date designated when the test is run (e.g., prior to 1753-01-01); otherwise NOT_COMPLIANT
(1) "Default designated date" is not a defined concept, and I don't know what to do with it. It sounds like a reference to the default values for bdq:earliestDate and bdq:latestDate (to put the parameters into a bdq namespace), but these are defined values, so their absence would be a defect in the implementation, not a test failure condition.
I suggest we change the specification by removing "there is no default designated date or" to:
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:dateIdentified is either not present, is EMPTY or is not a valid ISO 8601-1:2019 date; COMPLIANT if the value of the field dwc:dateIdentified is not prior to the eventDate, does not extend into the future, or optionally does not extend before a date designated when the test is run (e.g., prior to 1753-01-01); otherwise NOT_COMPLIANT
Again @chicoreus your reasoning seems sound. The Parameters including default values were added after the Expected Response(s) was written. At the time of writing the Expected Response - we didn't have a defined default. Now that we do (I think in all cases which I believe is important to stop lots of failures because someone forgot to set a default), I think your new wording is good. I am happy with the new wording.
Your argument about identification prior to an event to me is a rather pedantic one. I am not sure that you can call it an identification if you don't (at the time) have something to identify. In the cases you mention - I would regard the identification as being simultaneous to the observation (event). If you are looking for a particular organism, then when you find it and pick it up, or "identify" it through observation, then that is when the identification took place.
Thanks @chicoreus and @ArthurChapman: Well picked up. I will amend accordingly and would value a check.
Checked @Tasilee. There was another error in the Example, 1573 rather than 1753, which I fixed.
Thanks @ArthurChapman
Great. The specification, however, still contains dwc:eventDate, but this term is not listed as an information element.
@ArthurChapman I would still argue that the reference to eventDate should be removed from the specification, typical examples of this are a long term ecological monitoring plots and arboreta where a specialist makes an identification of an organism at one point in time, it is marked with an identifier (such as a number on a stamped metal plate), and then for many years, non-specialists return to observe the state of that organism in other occurrence events without making new identifications. Similar are observations of marine mammals, where observations may be made over time of a particular individual, and some arbitrary point in time in the sequence of observations, a specialist makes a species identification. Occurrences recording that known individual may precede or follow the identification.
The definition needs to be changed to remove dwc:eventDate, or dwc:eventDate needs to be added to the list of information elements.
@chicoreus I see your arguments. Two options, I guess: 1) we remove eventDate from the Response, or 2) we leave it in and add dwc:eventDate to Information Elements. I guess I would be happy either way. The argument for 2) would be that it is likely a rare event that the identification would be prior to the event. Do we want to flag those to make sure they aren't errors, if there aren't a lot of them, or miss the real errors, which my guess is would be more frequent? What do others think?
I think we want to flag the likely error more than we want to protect the rare case from getting a notification.
My consistent stance would be err on the side of caution (as long as we have a test that is considered valuable) that we flag a POTENTIAL issue (false positive) rather than letting it 'get through' (false negative). So my preference, like @tucotuco, would be @ArthurChapman's (2) - to add dwc:eventDate to the Information Elements.
Consensus seems to be to add dwc:eventDate as an information element, and leave the reference to it in the specification. I've done this. I've also added a comment to the notes on the rare cases of identifications preceding observations, which are flagged by design.
In implementing, I noticed a couple more issues with the specification.
(1) The specification is not consistent:
COMPLIANT if the value of the field dwc:dateIdentified is not prior to the eventDate, does not extend into the future, or optionally does not extend before a date designated when the test is run (e.g., prior to 1753-01-01);
dwc:dateIdentified is not prior to the eventDate
dwc:dateIdentified does not extend into the future
dwc:dateIdentified optionally does not extend before a date designated...
As stated, this means that dwc:dateIdentified can be a range that extends prior to the eventDate, so long as it overlaps the eventDate range, but that it cannot as a range extend into the future or before the earliest date specified.
Is this difference by design (e.g. accommodating dateIdentified known only to year, but eventDate known to day (dateIdentified=1974; eventDate 1974-11-15), probably a common case)?
Added dwc:eventDate is non-empty but not a valid ISO date as a condition for internal prerequisites not met (so that this test doesn't flag as non-compliant simply non-conforming values in eventDate) (which should be consulted rather than acted upon).
@chicoreus - You may like to check the new wording of the Expected Response, which we have simplified with respect to the Parameter.
@ArthurChapman the rewording (within) looks like it deals with the conflict. However, the implementation I had in event_date_qc was buggy, and in fixing it, I am still running into the issue with:
dwc:dateIdentified is not prior to the eventDate
We need a clearer definition of prior, as typical values such as dateIdentified=1974; eventDate 1974-11-15 fail, since 1974-01-01 (the start of the interval represented by the value of the dateIdentified) precedes 1974-11-15 (the eventDate). This will be a common case, as dates of identification are often only specified to year, while collecting event dates are more frequently identified to day.
I would suggest:
dwc:dateIdentified overlaps or follows the dwc:eventDate
This would make a dateIdentified which entirely preceded the eventDate non compliant, but would allow the dateIdentified to begin before the eventDate.
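To make the difference between the two readings concrete, here is a minimal Python sketch, assuming each value is expanded to an inclusive interval of days. The helper names (`year_or_day_interval`, `not_prior`, `overlaps_or_follows`) are hypothetical and only the two value shapes from the example (YYYY and YYYY-MM-DD) are handled.

```python
from datetime import date


def year_or_day_interval(value):
    """Expand 'YYYY' or 'YYYY-MM-DD' into an inclusive (start, end) pair."""
    if len(value) == 4:
        year = int(value)
        return date(year, 1, 1), date(year, 12, 31)
    day = date.fromisoformat(value)
    return day, day


def not_prior(identified, event):
    """Strict reading: the identification must not start before the event starts."""
    return identified[0] >= event[0]


def overlaps_or_follows(identified, event):
    """Suggested reading: fails only if the identification ends before the event starts."""
    return identified[1] >= event[0]


identified = year_or_day_interval("1974")
event = year_or_day_interval("1974-11-15")
print(not_prior(identified, event))            # False
print(overlaps_or_follows(identified, event))  # True
```

A dateIdentified of 1973 against the same eventDate fails under both readings, since the whole of 1973 ends before 1974-11-15.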
@chicoreus That would be OK by me.
@chicoreus I also agree: Editing accordingly.
The wording of this test appears wrong to me. It says "INTERNAL_PREREQUISITES_NOT_MET if ..., or if dwc:eventDate is not EMPTY and is not a valid ISO 8601-1:2019 date" but then "COMPLIANT if the value of dwc:dateIdentified overlaps or follows the dwc:eventDate". That means that if dwc:eventDate has an ISO value, the result is INTERNAL_PREREQUISITES_NOT_MET, so it is then not possible for dwc:dateIdentified to overlap or follow it and so be COMPLIANT.
@ArthurChapman that does read awkwardly. I interpret it as: (1) if both eventDate and dateIdentified have valid values, then they can be compared and dateIdentified compared to the earliest/latest date; (2) if dateIdentified has a valid value and eventDate is empty, then dateIdentified is compared to the earliest/latest date. If dateIdentified is empty, or dateIdentified contains an invalid value, or eventDate has an invalid value, then internal prerequisites are not met.
How about changing from:
INTERNAL_PREREQUISITES_NOT_MET if dwc:dateIdentified is EMPTY or is not a valid ISO 8601-1:2019 date, or if dwc:eventDate is not EMPTY and is not a valid ISO 8601-1:2019 date; COMPLIANT if the value of dwc:dateIdentified overlaps or follows the dwc:eventDate, and is within the Parameter range; otherwise NOT_COMPLIANT
To:
INTERNAL_PREREQUISITES_NOT_MET if any of the following three conditions are met (1) dwc:dateIdentified is EMPTY, (2) dwc:dateIdentified is not a valid ISO 8601-1:2019 date, (3) dwc:eventDate is not EMPTY and is not a valid ISO 8601-1:2019 date; COMPLIANT if the value of dwc:dateIdentified is within the parameter ranges and if dwc:eventDate is not EMPTY dwc:dateIdentified overlaps or follows the dwc:eventDate; otherwise NOT_COMPLIANT
I agree with @chicoreus suggestion and have edited the Expected Response. It is far more explicit.
I'd add (just to make it clearer) ... COMPLIANT if the value of dwc:dateIdentified is within the parameter ranges and if dwc:eventDate is not EMPTY **and is a valid ISO 8601-1:2019 date** and dwc:dateIdentified overlaps or follows the dwc:eventDate; ...
Thanks @ArthurChapman: Agreed and done.