TG2-AMENDMENT_EVENT_FROM_EVENTDATE
TestField | Value |
---|---|
GUID | 710fe118-17e1-440f-b428-88ba3f547d6d |
Label | AMENDMENT_EVENT_FROM_EVENTDATE |
Description | Proposes an amendment to values in any of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear or dwc:endDayOfYear from the content of dwc:eventDate. |
TestType | Amendment |
Darwin Core Class | dwc:Event |
Information Elements ActedUpon | dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear |
Information Elements Consulted | dwc:eventDate |
Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is bdq:Empty or contains an invalid value according to ISO 8601; FILLED_IN if any of the following are filled in: (1) dwc:day from dwc:eventDate, if dwc:day is bdq:Empty and dwc:eventDate has a precision of a day or finer and is within a single day; (2) dwc:month from dwc:eventDate, if dwc:month is bdq:Empty and dwc:eventDate has a precision of a single month or finer and is within a single month; (3) dwc:year from dwc:eventDate, if dwc:year is bdq:Empty and dwc:eventDate has a precision of a single year or finer and is within a single year; (4) dwc:startDayOfYear and dwc:endDayOfYear, if they are bdq:Empty and dwc:eventDate has a precision of a day or finer; otherwise NOT_AMENDED. |
Data Quality Dimension | Completeness |
Term-Actions | EVENT_FROM_EVENTDATE |
Parameter(s) | |
Source Authority | |
Specification Last Updated | 2024-09-16 |
Examples | [dwc:eventDate="2023-01-26", dwc:year="2023", dwc:month="", dwc:day="", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=FILLED_IN, Response.result=dwc:month="1", dwc:day="26", dwc:startDayOfYear="26", dwc:endDayOfYear="26", Response.comment="dwc:month, dwc:day, dwc:startDayOfYear and dwc:endDayOfYear filled in from dwc:eventDate"] [dwc:eventDate="2023", dwc:year="2023", dwc:month="", dwc:day="", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=NOT_AMENDED, Response.result=, Response.comment="No amendments possible"] |
Source | VertNet |
References | |
Example Implementations (Mechanisms) | Kurator:event_date_qc |
Link to Specification Source Code | FilteredPush event_date_qc DwCEventDQ.amendmentEventFromEventdate() unit test in DwcEventDQTest |
Notes | Only fields that are empty will have changes proposed, and only if dwc:eventDate has a valid ISO 8601-1 date. The dwc:eventDate is the canonical form of the event date (it is the first trusted form). If dwc:eventDate does not contain a range, dwc:startDayOfYear = dwc:endDayOfYear. Time (as compared to date) is not deemed a CORE component. Run this amendment after any other amendment that may affect dwc:eventDate (see the sequencing tests section of the standards document). |
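The Expected Response can be sketched as below. This is a minimal, hypothetical Python sketch, not the FilteredPush implementation: it assumes records are plain dictionaries keyed by term name without the dwc: prefix, and it handles only a small subset of ISO 8601-1 (YYYY, YYYY-MM, YYYY-MM-DD, and start/end intervals of these). The names amend_event_from_eventdate and _parse_part are illustrative, and taking an interval's precision as the coarser of its two ends is an assumption.

```python
import calendar
import re
from datetime import date

# Assumed subset of ISO 8601-1: YYYY, YYYY-MM or YYYY-MM-DD, optionally as a
# "start/end" interval of two such values. Times, ordinal and week dates are not handled.
_PART = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")
_RANK = {"year": 0, "month": 1, "day": 2}  # higher rank = finer precision


def _parse_part(text):
    """Return (first_day, last_day, precision) for one date value, or None if invalid."""
    m = _PART.fullmatch(text)
    if not m:
        return None
    year = int(m.group(1))
    try:
        if m.group(2) is None:
            return date(year, 1, 1), date(year, 12, 31), "year"
        month = int(m.group(2))
        if m.group(3) is None:
            last = calendar.monthrange(year, month)[1]
            return date(year, month, 1), date(year, month, last), "month"
        day = date(year, month, int(m.group(3)))
        return day, day, "day"
    except ValueError:  # e.g. year 0000, month 13 or 2023-02-30
        return None


def amend_event_from_eventdate(record):
    """Hypothetical sketch of AMENDMENT_EVENT_FROM_EVENTDATE; returns (status, changes)."""
    event_date = (record.get("eventDate") or "").strip()
    parts = event_date.split("/")
    if not event_date or len(parts) > 2:
        return "INTERNAL_PREREQUISITES_NOT_MET", {}
    parsed = [_parse_part(p) for p in parts]
    if None in parsed:
        return "INTERNAL_PREREQUISITES_NOT_MET", {}
    start, end = parsed[0][0], parsed[-1][1]
    if start > end:  # an interval that ends before it starts is not valid
        return "INTERNAL_PREREQUISITES_NOT_MET", {}
    # Assumption: an interval's precision is the coarser of its two ends.
    rank = min(_RANK[p[2]] for p in parsed)

    def is_empty(term):
        return not (record.get(term) or "").strip()

    changes = {}
    if is_empty("day") and rank >= _RANK["day"] and start == end:
        changes["day"] = str(start.day)
    if (is_empty("month") and rank >= _RANK["month"]
            and (start.year, start.month) == (end.year, end.month)):
        changes["month"] = str(start.month)
    # Every valid value has at least year precision, so only the span matters here.
    if is_empty("year") and start.year == end.year:
        changes["year"] = str(start.year)
    if rank >= _RANK["day"]:
        # Assumption: each day-of-year term is filled independently if it is empty.
        if is_empty("startDayOfYear"):
            changes["startDayOfYear"] = str(start.timetuple().tm_yday)
        if is_empty("endDayOfYear"):
            changes["endDayOfYear"] = str(end.timetuple().tm_yday)
    return ("FILLED_IN", changes) if changes else ("NOT_AMENDED", {})
```

Applied to the two examples in the table, the sketch reproduces the listed responses. For an eventDate of 2015-12-15/2016-01-15 with every other term empty, it proposes only startDayOfYear="349" and endDayOfYear="15", because the range spans two years.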
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: One way of simplifying the core test suite would be to identify a small set of primary fields in cases where Darwin Core allows for multiple representations of the data (Event fields are the clearest example of this), and only propose amendments that work to fill in the primary fields from secondary fields (eventDate from day, month, year, verbatimEventDate, startDayOfYear, endDayOfYear, eventTime), and not include in the core suite amendments that fill in secondary fields from the primary fields.
Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: I agree with Paul here
Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: Agree too. However, you may piss some people off, e.g. those who prefer using YMD instead of eventDate?
@JohnW wanted to be able to back-populate, at least for year, as many people want to extract just the year from the data.
The danger with back-populating is that eventDate is capable of handling richer information than the atomic fields (date ranges, including date ranges that span more than one year). A consumer who simply wants to obtain the year from dwc:year does so at their own peril if dwc:eventDate contains a date range that spans more than one year. That, however, is a let-the-consumer-beware kind of issue. We shouldn't advocate back-populating on the grounds that people may want to use the data, as the result is potentially unfit for their use, but I'm entirely in favor of back-populating in order to make data sets consistent in their presentation, filling in all fields that can be filled in.
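The peril for a consumer can be illustrated with a short, hypothetical example. Both functions are illustrative only and assume an eventDate of the form YYYY-MM-DD or YYYY-MM-DD/YYYY-MM-DD: naive_year reads only the first four characters, while safe_year returns a year only when the single date, or both ends of the range, fall within the same year.

```python
from datetime import date


def naive_year(event_date):
    """Hypothetical: read the year from the first four characters only."""
    return int(event_date[:4])


def safe_year(event_date):
    """Hypothetical: return the year only if the whole value falls within one year."""
    years = {date.fromisoformat(part).year for part in event_date.split("/")}
    return years.pop() if len(years) == 1 else None


for value in ("2019-03-02", "2018-12-20/2019-01-10"):
    print(value, naive_year(value), safe_year(value))
```

For the single date both functions report 2019; for the range, naive_year reports 2018 while safe_year returns None, since no single dwc:year describes that event.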
I propose changing the description to: One or more empty component terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid value in the term dwc:eventDate.
The concept reflected in the note " and only if dwc:eventDate has a valid ISO 8601:2004(E) date" should be reflected in the description. It is an important point for implementers.
There is an inter-amendment workflow dependency to note here - this amendment should run after all other amendments that may affect the value of eventDate. (i.e. fill in the event date from the verbatimEventDate, then fill in year/month/day/startDayOfYear/endDayOfYear from the interpreted eventDate value, etc.). I've added a note to this effect to the prerequisites.
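The workflow dependency can be illustrated with two hypothetical amendments. In this sketch fill_eventdate_from_verbatim stands in for an upstream amendment and assumes a day-first verbatim format such as 26/01/2023, while fill_ymd_from_eventdate stands in for this test but handles only a single YYYY-MM-DD date. Neither is a BDQ or FilteredPush function; the point is only that the order of application matters.

```python
from datetime import date, datetime


def fill_eventdate_from_verbatim(record):
    """Hypothetical upstream amendment: fill an empty eventDate from a DD/MM/YYYY verbatim date."""
    if not record.get("eventDate") and record.get("verbatimEventDate"):
        parsed = datetime.strptime(record["verbatimEventDate"], "%d/%m/%Y").date()
        record["eventDate"] = parsed.isoformat()
    return record


def fill_ymd_from_eventdate(record):
    """Hypothetical downstream amendment: fill empty year/month/day from a single-day eventDate."""
    try:
        parsed = date.fromisoformat(record.get("eventDate") or "")
    except ValueError:
        return record  # not a single-day date; nothing to fill in this sketch
    for term, value in (("year", parsed.year), ("month", parsed.month), ("day", parsed.day)):
        if not record.get(term):
            record[term] = str(value)
    return record


def run(amendments, record):
    """Apply the amendments, in the given order, to a copy of the record."""
    record = dict(record)
    for amend in amendments:
        record = amend(record)
    return record


raw = {"verbatimEventDate": "26/01/2023"}
in_order = run([fill_eventdate_from_verbatim, fill_ymd_from_eventdate], raw)
out_of_order = run([fill_ymd_from_eventdate, fill_eventdate_from_verbatim], raw)
print(in_order.get("year"), out_of_order.get("year"))
```

Run in the recommended order, the record gains eventDate 2023-01-26 and year, month and day; run in the reverse order, eventDate is still filled in but year, month and day stay empty.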
We should note that time is not included here as it is not considered core. In working with an implementation of this, I've found extracting time from eventDate to be fraught with all sorts of concerns (which aren't going to be core concerns), including handling time zones and handling times on eventDates that involve ranges. Appropriate behaviors are non-trivial to specify, and those aren't core. We are much safer not having this particular test propose to fill in eventTime.
We do need to specify if endDayOfYear is expected to be filled in if the eventDate represents a single day.
@chicoreus - wrt workflow, I have added "After #33, #49, #86, #93, #132" for test #52 - see the circulated workflow document
@chicoreus - fully agree wrt note on time.
I have reviewed the parameters and notes (all good) and believe we have a useable outcome.
See comment in #131 about whether this test should specify not filling in endDayOfYear in cases where eventDate is a range of years.
@chicoreus. In discussion in Gainesville, I think I said I saw less value in filling all these other fields from eventDate than in going the other way and making sure eventDate was filled in wherever possible. So I see filling in endDayOfYear etc. as being of lower value, and I would be happy with your suggestion. However, those who advocated for this test would be better placed to comment on it than I.
Reading your comment on #131, I agree with your logic @chicoreus. How would you word the Expected Response?
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid unambiguously interpretable value in dwc:eventDate, unless dwc:eventDate spans years, in which case dwc:endDayOfYear is not FILLED_IN; otherwise NOT_CHANGED?
@Tasilee. We don't have a term for "NOT_FILLED_IN", so I would say:
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid unambiguously interpretable value in dwc:eventDate and eventDate is wholly within the one year; otherwise, or if dwc:eventDate spans more than one year, NOT_CHANGED?
@ArthurChapman: Much better. Editing.
Or, perhaps even simpler, putting it into the positive and being consistent:
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid unambiguously interpretable value in dwc:eventDate and eventDate is wholly within the one year; otherwise NOT_CHANGED
@ArthurChapman: Yep
Dealing with timezones is a real issue faced when integrating and accessing data. Yes, it is difficult, but it would be helpful to clarify expected behavior when (a) data span timezones, (b) datasets hold date-times both with and without a timezone, and (c) the timezone of the consumer is not known. It might result in recommendations to store a localDateTime version as well as a UTC-normalized version.
This should also take into consideration the expectations of simple human observations (e.g. a naturalist record from 07:30 local time on 1st January 2019 in New Zealand being returned in a 2019 search and not a 2018 search) as well as those of machine recorders where high-frequency samples are taken and stored in UTC regardless of location even though the moving organism is crossing timezones.
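The New Zealand case can be worked through with Python's standard datetime module. The sketch assumes a fixed UTC+13:00 offset (New Zealand Daylight Time, which is in effect on 1 January); zoneinfo.ZoneInfo("Pacific/Auckland") could be used instead where the IANA time zone database is available.

```python
from datetime import datetime, timedelta, timezone

# Assumption: New Zealand Daylight Time (UTC+13:00) applies on 1 January 2019.
NZDT = timezone(timedelta(hours=13), "NZDT")

local = datetime(2019, 1, 1, 7, 30, tzinfo=NZDT)  # the naturalist's local observation time
utc = local.astimezone(timezone.utc)              # the same instant, UTC-normalized

print(local.isoformat())  # 2019-01-01T07:30:00+13:00
print(utc.isoformat())    # 2018-12-31T18:30:00+00:00
print(local.year, utc.year)
```

The same instant falls in 2019 locally but in 2018 once normalized to UTC, so a year search run only against the UTC-normalized value would not return this record for 2019.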
@timrobertson100 discussion in TG2 call today: (1) time was judged out of scope for the TG2 tests early on, and we'd have to add in all the complexities of time to address the timezone concern. (2) The tests, in particular this one, are asking about the representation of date in a single record, and don't involve comparisons between dates in different records. If dwc:eventTime is included in the New Zealand record you discuss, a consumer of the data is able to interpret which year to place that record into for purposes of search; we see this as a question independent of the internal consistency of the terms in the record itself.
Discussion in the call raised @tucotuco's observation that Darwin Core is vague on whether dwc:endDayOfYear is tied to the end of a date range, and the thought that we could put a stake in the ground on expectations for this case, with endDayOfYear meaning the day of the year of the end of a range expressed in eventDate. Thus, @Tasilee, here's a proposal for changing the specification of this test:
INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid unambiguously interpretable value in dwc:eventDate; otherwise NOT_CHANGED
And Note: Only fields that are empty will be amended, and only if dwc:eventDate has a valid ISO 8601-1:2019 date. The dwc:eventDate is the canonical form of the event date (it is the first trusted form). If event date does not contain a range, dwc:startDayOfYear = dwc:endDayOfYear. Time (as compared to date) is not deemed a CORE component. NB Run this amendment after any other amendment which may affect dwc:eventDate. If eventDate contains a date range, dwc:startDayOfYear is to be interpreted as the day of year of the start of the date range, and dwc:endDayOfYear is to be interpreted as the day of the year of the end of the date range; thus dwc:endDayOfYear could be smaller than dwc:startDayOfYear, as in 2015-12-15/2016-01-15.
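The day-of-year interpretation proposed in this note can be checked with a short sketch. The helper day_of_year_range is hypothetical and accepts only YYYY-MM-DD or YYYY-MM-DD/YYYY-MM-DD values; it takes the day of the year (tm_yday) from Python's datetime.date.timetuple().

```python
from datetime import date


def day_of_year_range(event_date):
    """Hypothetical: return (startDayOfYear, endDayOfYear) for a day or a day-to-day range."""
    parts = [date.fromisoformat(p) for p in event_date.split("/")]
    start, end = parts[0], parts[-1]  # a single date is both start and end
    return start.timetuple().tm_yday, end.timetuple().tm_yday


print(day_of_year_range("2015-12-15/2016-01-15"))  # (349, 15)
print(day_of_year_range("2016-03-01"))             # (61, 61)
```

For 2015-12-15/2016-01-15 the result is (349, 15), so dwc:endDayOfYear is indeed smaller than dwc:startDayOfYear; for a single day such as 2016-03-01 (a leap year) both values are 61, matching dwc:startDayOfYear = dwc:endDayOfYear.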
Thanks @chicoreus. It is simpler but certainly needs the Note for clarification. Is everyone else happy with this before I amend?
With regard to ISO 8601, we can reference it in the specs as discussed, but I am tempted to add https://en.wikipedia.org/wiki/ISO_8601 to the references of every test that requires it. What do you think? Reasoning: we all agreed on the benefits of having our table as self-contained as possible (accepting a move from non-canonical to Notes), and access to the details of ISO standards is neither a simple end-point nor free.
Also worth referencing EDTF https://www.loc.gov/standards/datetime/, which is incorporated into ISO 8601-2:2019, and looking for somewhere with a good summary of 8601/8601-2 to point people at.
Agreed. I have added these refs here, but assume they need to go everywhere we have a TIME tab?
Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16.
Given recent discussions, I have changed
INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or contains an invalid value according to bdq:sourceAuthority; FILLED_IN one or more EMPTY terms dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear if they can be unambiguously interpreted from values in dwc:eventDate, and dwc:eventDate is wholly within one year; otherwise NOT_AMENDED
to
INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or contains an invalid ISO 8601-1 date; FILLED_IN one or more EMPTY terms dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear if they can be unambiguously interpreted from values in dwc:eventDate, and dwc:eventDate is wholly within one year; otherwise NOT_AMENDED
...and I have removed the ref to bdq:sourceAuthority.
This may be too picky, but I would reword to:
INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or contains a value that is not valid according to ISO 8601-1; FILLED_IN one or more EMPTY terms dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear if they can be unambiguously interpreted from values in dwc:eventDate, and dwc:eventDate is wholly within one year; otherwise NOT_AMENDED
I like that wording @tucotuco
Given comments from @tucotuco, then maybe
INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or contains a value that is not valid according to ISO 8601-1:2019; FILLED_IN one or more EMPTY terms dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear if they can be unambiguously interpreted from values in dwc:eventDate, and dwc:eventDate is wholly within one year; otherwise NOT_AMENDED
I have changed it to this for now and will align the other occurrences if there is agreement.