
TG2-AMENDMENT_EVENT_FROM_EVENTDATE

Open iDigBioBot opened this issue 7 years ago • 57 comments

TestField Value
GUID 710fe118-17e1-440f-b428-88ba3f547d6d
Label AMENDMENT_EVENT_FROM_EVENTDATE
Description Proposes an amendment to values in any of dwc:year, dwc:month, dwc:day, dwc:startDayOfYear or dwc:endDayOfYear from the content of dwc:eventDate.
TestType Amendment
Darwin Core Class dwc:Event
Information Elements ActedUpon dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear
Information Elements Consulted dwc:eventDate
Expected Response INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is bdq:Empty or contains an invalid value according to ISO 8601; FILLED_IN if any of (1) dwc:day from dwc:eventDate if dwc:day is bdq:Empty and dwc:eventDate has a precision of a day or finer and is within a single day, (2) dwc:month from dwc:eventDate if dwc:month is bdq:Empty and dwc:eventDate has a precision of a single month or finer and is within a single month, (3) dwc:year from dwc:eventDate if dwc:year is bdq:Empty and dwc:eventDate has a precision of a single year or finer and is within a single year, (4) dwc:startDayOfYear and dwc:endDayOfYear if they are bdq:Empty and dwc:eventDate has a precision of a day or better; otherwise NOT_AMENDED.
Data Quality Dimension Completeness
Term-Actions EVENT_FROM_EVENTDATE
Parameter(s)
Source Authority
Specification Last Updated 2024-09-16
Examples [dwc:eventDate="2023-01-26", dwc:year="2023", dwc:month="", dwc:day="", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=FILLED_IN, Response.result=dwc:month="1", dwc:day="26", dwc:startDayOfYear="26", dwc:endDayOfYear="26", Response.comment="dwc:month, dwc:day, dwc:startDayOfyear and dwc:endDayOfYear filled in from dwc:eventDate"]
[dwc:eventDate="2023", dwc:year="2023", dwc:month="", dwc:day="", dwc:startDayOfYear="", dwc:endDayOfYear="": Response.status=NOT_AMENDED, Response.result=, Response.comment="No amendments possible"]
Source VertNet
References
  • ISO (2019) ISO 8601-1:2019(en) Date and time — Representations for information interchange — Part 1: Basic rules. https://www.iso.org/obp/ui/
  • Wikipedia (2020) ISO 8601. https://en.wikipedia.org/wiki/ISO_8601
  • Library of Congress (2019) Extended Date/Time Format (EDTF). https://www.loc.gov/standards/datetime/
Example Implementations (Mechanisms) Kurator:event_date_qc
Link to Specification Source Code FilteredPush event_date_qc DwCEventDQ.amendmentEventFromEventdate() unit test in DwcEventDQTest
Notes Only fields that are empty will have changes proposed, and only if dwc:eventDate has a valid ISO 8601-1 date. The dwc:eventDate is the canonical form of the event date (it is the first trusted form). If event date does not contain a range, dwc:startDayOfYear = dwc:endDayOfYear. Time (as compared to date) is not deemed a CORE component. Note: see the sequencing tests section of the standards document; run this amendment after any other amendment that may affect dwc:eventDate.
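
The Expected Response above can be illustrated with a minimal Python sketch. It is not the FilteredPush event_date_qc implementation: the function name `amend_event_from_eventdate`, the dict-based record, and the returned dict shape are assumptions made for illustration. It only recognises YYYY, YYYY-MM and YYYY-MM-DD values and "/"-separated intervals of them, which is a small subset of ISO 8601-1. Filling dwc:startDayOfYear and dwc:endDayOfYear independently of each other, and rejecting intervals that end before they start, are one reading of the specification.

```python
import calendar
import re
from datetime import date

# Assumption: only YYYY, YYYY-MM and YYYY-MM-DD are recognised in this sketch.
_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def _bounds(text):
    """Return (first_day, last_day, day_precision) for one date, or None if invalid."""
    match = _DATE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(g) if g else None for g in match.groups())
    try:
        if day is not None:
            exact = date(year, month, day)
            return exact, exact, True
        if month is not None:
            last = calendar.monthrange(year, month)[1]
            return date(year, month, 1), date(year, month, last), False
        return date(year, 1, 1), date(year, 12, 31), False
    except ValueError:  # e.g. month 13 or 30 February
        return None


def amend_event_from_eventdate(record):
    """Propose values for empty year/month/day/day-of-year terms from dwc:eventDate."""
    def is_empty(term):
        return not (record.get(term) or "").strip()

    not_met = {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": {}}
    if is_empty("dwc:eventDate"):
        return not_met
    parts = [_bounds(p) for p in record["dwc:eventDate"].strip().split("/")]
    if len(parts) > 2 or any(p is None for p in parts):
        return not_met

    start, end = parts[0][0], parts[-1][1]
    day_precision = parts[0][2] and parts[-1][2]
    if end < start:  # assumption: a reversed interval counts as invalid
        return not_met

    candidates = {}
    if start.year == end.year:  # within a single year
        candidates["dwc:year"] = start.year
    if (start.year, start.month) == (end.year, end.month):  # within a single month
        candidates["dwc:month"] = start.month
    if start == end:  # within a single day
        candidates["dwc:day"] = start.day
    if day_precision:
        candidates["dwc:startDayOfYear"] = start.timetuple().tm_yday
        candidates["dwc:endDayOfYear"] = end.timetuple().tm_yday

    # Only terms that are currently empty receive a proposed value.
    result = {term: str(v) for term, v in candidates.items() if is_empty(term)}
    return {"status": "FILLED_IN" if result else "NOT_AMENDED", "result": result}
```

Run against the two Examples in the table, this sketch proposes month, day, startDayOfYear and endDayOfYear for dwc:eventDate="2023-01-26", and proposes nothing for dwc:eventDate="2023" when dwc:year is already "2023".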

iDigBioBot avatar Jan 05 '18 15:01 iDigBioBot

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: One way of simplifying the core test suite would be to identify a small set of primary fields in cases where Darwin Core allows for multiple representations of the data (Event fields are the clearest example of this), and only propose amendments that work to fill in the primary fields from secondary fields (eventDate from day, month, year, verbatimEventDate, startDayOfYear, endDayOfYear, eventTime), and not include in the core suite amendments that fill in secondary fields from the primary fields.

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: I agree with Paul here

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: Agree too. However, you may piss some people off, eg those that prefer using YMD instead of eventDate?

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

@JohnW wanted to be able to back populate - at least for year - as many people want to extract just the year from the data.

ArthurChapman avatar Jan 28 '18 22:01 ArthurChapman

The danger with backpopulating is that eventDate is capable of handling richer information than the atomic fields (date ranges and date ranges which span more than one year). A consumer who wants to simply obtain the year from dwc:year does so at their own peril if dwc:eventDate contains a date range which spans more than one year. That, however, is a let the consumer beware kind of issue. We shouldn't advocate back populating because people may want to use the data, as it potentially is unfit for their use, but I'm entirely in favor of back populating in order to make data sets consistent in their presentation - filling in all fields that can be filled in.

chicoreus avatar Feb 01 '18 20:02 chicoreus

I propose changing the description to: One or more empty component terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid value in the term dwc:eventDate.

The concept reflected in the note "and only if dwc:eventDate has a valid ISO 8601:2004(E) date" should be reflected in the description. It is an important point for implementers.

chicoreus avatar Feb 01 '18 20:02 chicoreus

There is an inter-amendment workflow dependency to note here - this amendment should run after all other amendments that may affect the value of eventDate. (i.e. fill in the event date from the verbatimEventDate, then fill in year/month/day/startDayOfYear/endDayOfYear from the interpreted eventDate value, etc.). I've added a note to this effect to the prerequisites.
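
To make the ordering dependency concrete, here is a small Python sketch. Both amendment functions are hypothetical stand-ins rather than the real TG2 tests, and `run_amendments` is an assumed helper that applies each amendment's proposals before calling the next one.

```python
import re


def fill_eventdate_from_verbatim(record):
    """Hypothetical upstream amendment: copy a verbatim value that is already YYYY-MM-DD."""
    verbatim = record.get("dwc:verbatimEventDate", "")
    if not record.get("dwc:eventDate") and re.fullmatch(r"\d{4}-\d{2}-\d{2}", verbatim):
        return {"dwc:eventDate": verbatim}
    return {}


def fill_year_from_eventdate(record):
    """Hypothetical downstream amendment: fill an empty dwc:year from a simple eventDate."""
    event_date = record.get("dwc:eventDate", "")
    if not record.get("dwc:year") and re.fullmatch(r"\d{4}(-\d{2}){0,2}", event_date):
        return {"dwc:year": str(int(event_date[:4]))}
    return {}


def run_amendments(record, amendments):
    """Apply amendments in the given order, feeding each one the already amended record."""
    amended = dict(record)
    for amendment in amendments:
        amended.update(amendment(amended))
    return amended


record = {"dwc:verbatimEventDate": "2023-01-26", "dwc:eventDate": "", "dwc:year": ""}

# eventDate is filled first, so the year can be derived from it.
in_order = run_amendments(record, [fill_eventdate_from_verbatim, fill_year_from_eventdate])

# Reversed order: eventDate is still empty when the year amendment runs.
reversed_order = run_amendments(record, [fill_year_from_eventdate, fill_eventdate_from_verbatim])
```

With the amendments in the intended order, dwc:year ends up filled in; in the reversed order it stays empty even though dwc:eventDate is eventually populated.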

chicoreus avatar Feb 01 '18 20:02 chicoreus

We should note that time is not included here as it is not considered core. In working with an implementation of this, I've found extracting time from eventDate to be fraught with all sorts of concerns (which aren't going to be core concerns), including handling time zone and handling times on eventDates which involve ranges. Non-trivial to specify appropriate behaviors, and those aren't core. We are much safer not having this particular test propose to fill in eventTime.

chicoreus avatar Feb 01 '18 20:02 chicoreus

We do need to specify if endDayOfYear is expected to be filled in if the eventDate represents a single day.

chicoreus avatar Feb 01 '18 21:02 chicoreus

@chicoreus - wrt workflow - I have added "After #33, #49, #86, #93, #132" for test #52 - see the circulated workflow document

ArthurChapman avatar Feb 01 '18 22:02 ArthurChapman

@chicoreus - fully agree wrt note on time.

ArthurChapman avatar Feb 01 '18 22:02 ArthurChapman

I have reviewed the parameters and notes (all good) and believe we have a useable outcome.

Tasilee avatar Feb 02 '18 22:02 Tasilee

See comment in #131 about whether this test should specify not filling in endDayOfYear in cases where eventDate is a range of years.

chicoreus avatar Aug 29 '19 16:08 chicoreus

@chicoreus. In discussion in Gainesville, I think I said I saw less value in filling all these other fields from eventDate than going the other way and making sure eventDate was filled in wherever possible. So I see filling in endDayOfYear etc. as being of lower value, and I would be happy with your suggestion. However, those who advocated for this test would be better placed to comment on it than I.

ArthurChapman avatar Aug 29 '19 21:08 ArthurChapman

Reading your comment on #131, I agree with your logic @chicoreus. How would you work the Expected Response?

INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid unambiguously interpretable value in dwc:eventDate, unless dwc:eventDate spans years in which case dwc:endDayOfYear is not FILLED_IN; otherwise NOT_CHANGED?

Tasilee avatar Aug 29 '19 22:08 Tasilee

@Tasilee. We don't have a term for "NOT_FILLED_IN", so I would say

INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid unambiguously interpretable value in dwc:eventDate and eventDate is wholly within the one year; otherwise, or if dwc:eventDate spans more than one year, NOT_CHANGED?

ArthurChapman avatar Aug 29 '19 23:08 ArthurChapman

@ArthurChapman: Much better. Editing.

Tasilee avatar Aug 29 '19 23:08 Tasilee

Or, perhaps even simpler - putting it into the positive and being consistent

INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid unambiguously interpretable value in dwc:eventDate and eventDate is wholly within the one year; otherwise NOT_CHANGED

ArthurChapman avatar Aug 29 '19 23:08 ArthurChapman

@ArthurChapman: Yep

Tasilee avatar Aug 29 '19 23:08 Tasilee

Dealing with timezones is a real issue faced when integrating and accessing data. Yes, it is difficult but clarifying expected behavior when a) data spans timezones, b) dealing with datasets holding date times with and without timezone and c) when the timezone of the consumer is not known would be helpful. It might result in recommendations to store a localDateTime version as well as a UTC-normalized version.

This should also take into consideration the expectations of simple human observations (e.g. a naturalist record from 07:30 local time on 1st January 2019 in New Zealand being returned in a 2019 search and not a 2018 search) as well as those of machine recorders where high-frequency samples are taken and stored in UTC regardless of location even though the moving organism is crossing timezones.
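
The New Zealand case can be checked with a short Python sketch. The fixed UTC+13 offset is an assumption standing in for New Zealand daylight time in January; a production system would more likely use a real time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +13:00 offset approximates New Zealand daylight time in January.
NZDT = timezone(timedelta(hours=13), "NZDT")

# The naturalist's observation: 07:30 local time on 1 January 2019.
local_observation = datetime(2019, 1, 1, 7, 30, tzinfo=NZDT)

# The same instant normalised to UTC falls on the previous calendar day and year.
utc_observation = local_observation.astimezone(timezone.utc)

print(local_observation.isoformat())  # 2019-01-01T07:30:00+13:00
print(utc_observation.isoformat())    # 2018-12-31T18:30:00+00:00
```

A year extracted from the UTC-normalised value would be 2018, while the local value gives 2019, which is why keeping a local date-time alongside any UTC version matters for year-based searches.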

timrobertson100 avatar Oct 29 '19 13:10 timrobertson100

@timrobertson100 discussion in TG2 call today: (1) time was decided to be out of scope for the TG2 tests early on, and we'd have to add in all the complexities of time to address the timezone concern. (2) The tests, in particular this one, ask about the representation of date in a single record and don't involve comparisons between dates in different records. If dwc:eventTime is included in the New Zealand record you discuss, a consumer of the data is able to interpret which year to place that record into for purposes of search; we see this as a question independent of the internal consistency of the terms in the record itself.

chicoreus avatar Mar 31 '20 22:03 chicoreus

From discussion in the call: @tucotuco observed that Darwin Core is vague on whether dwc:endDayOfYear is tied to the end of a date range, and we thought we could put a stake in the ground on expectations for this case, with endDayOfYear meaning the day of the year of the end of a range expressed in eventDate. Thus, @Tasilee, here's a proposal for changing the specification of this test:

INTERNAL_PREREQUISITES_NOT_MET if the field dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; AMENDED if one or more EMPTY terms of the dwc:Event class (dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear) have been filled in from a valid unambiguously interpretable value in dwc:eventDate; otherwise NOT_CHANGED

And Note: Only fields that are empty will be amended, and only if dwc:eventDate has a valid ISO 8601-1:2019 date. The dwc:eventDate is the canonical form of the event date (it is the first trusted form). If event date does not contain a range, dwc:startDayOfYear = dwc:endDayOfYear. Time (as compared to date) is not deemed a CORE component. NB Run this amendment after any other amendment which may affect dwc:eventDate. If eventDate contains a date range, dwc:startDayOfYear is to be interpreted as the day of year of the start of the date range, and dwc:endDayOfYear is to be interpreted as the day of the year of the end of the date range, thus endDayOfYear could be smaller than startDayOfYear as in 2015-12-15/2016-01-15.
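
The range interpretation in this note can be sketched in a few lines of Python. The helper name `day_of_year_bounds` is hypothetical, and the sketch assumes both ends of the range are full YYYY-MM-DD dates.

```python
from datetime import date


def day_of_year_bounds(event_date):
    """Return (startDayOfYear, endDayOfYear) for a YYYY-MM-DD date or a start/end range."""
    start_text, _, end_text = event_date.partition("/")
    start = date.fromisoformat(start_text)
    # Without a range, the end of the event is the same day as the start.
    end = date.fromisoformat(end_text) if end_text else start
    return start.timetuple().tm_yday, end.timetuple().tm_yday


print(day_of_year_bounds("2015-12-15/2016-01-15"))  # (349, 15)
print(day_of_year_bounds("2023-01-26"))             # (26, 26)
```

For the range in the note, the end day of year (15) is smaller than the start day of year (349) because the range crosses a year boundary.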

chicoreus avatar Mar 31 '20 22:03 chicoreus

Thanks @chicoreus. It is simpler but certainly needs the Note for clarification. Is everyone else happy with this before I amend?

Regarding ISO 8601, we can reference it in the specs as discussed, but I am tempted to add https://en.wikipedia.org/wiki/ISO_8601 to the references everywhere (in each test) that requires it. What do you think? Reasoning: We all agreed on the benefits of having our table as self-contained as possible (accepting a move from non-canonical to Notes), and access to details of ISO standards is neither a simple end-point nor free.

Tasilee avatar Apr 01 '20 00:04 Tasilee

Also worth referencing EDTF https://www.loc.gov/standards/datetime/ which is incorporated into ISO 8601-2:2019, and looking for somewhere with a good summary of 8601/8601-2 to point people at.

chicoreus avatar Apr 01 '20 00:04 chicoreus

Agreed. I have added these refs here, but assume they need to go everywhere we have a TIME tab?

Tasilee avatar Apr 01 '20 05:04 Tasilee

Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16.

Tasilee avatar Apr 18 '22 22:04 Tasilee

Given recent discussions, I have changed

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or contains an invalid value according to bdq:sourceAuthority; FILLED_IN one or more EMPTY terms dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear if they can be unambiguously interpreted from values in dwc:eventDate, and dwc:eventDate is wholly within one year; otherwise NOT_AMENDED

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or contains an invalid ISO 8601-1 date; FILLED_IN one or more EMPTY terms dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear if they can be unambiguously interpreted from values in dwc:eventDate, and dwc:eventDate is wholly within one year; otherwise NOT_AMENDED

...and I have removed the ref to bdq:sourceAuthority.

Tasilee avatar Mar 10 '23 01:03 Tasilee

This may be too picky, but I would reword to:

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or contains a value that is not valid according to ISO 8601-1; FILLED_IN one or more EMPTY terms dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear if they can be unambiguously interpreted from values in dwc:eventDate, and dwc:eventDate is wholly within one year; otherwise NOT_AMENDED

tucotuco avatar Mar 10 '23 18:03 tucotuco

I like that wording @tucotuco

ArthurChapman avatar Mar 10 '23 20:03 ArthurChapman

Given comments from @tucotuco, then maybe

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or contains a value that is not valid according to ISO 8601-1:2019; FILLED_IN one or more EMPTY terms dwc:year, dwc:month, dwc:day, dwc:startDayOfYear, dwc:endDayOfYear if they can be unambiguously interpreted from values in dwc:eventDate, and dwc:eventDate is wholly within one year; otherwise NOT_AMENDED

I have changed it to this for now and will align the other occurrences if there is agreement.

Tasilee avatar Mar 11 '23 01:03 Tasilee