TG2-MEASURE_EVENTDATE_DURATIONINSECONDS
| TestField | Value |
| --- | --- |
| GUID | 56b6c695-adf1-418e-95d2-da04cad7be53 |
| Label | MEASURE_EVENTDATE_DURATIONINSECONDS |
| Description | What is the duration of dwc:eventDate in seconds? |
| TestType | Measure |
| Darwin Core Class | dwc:Event |
| Information Elements ActedUpon | dwc:eventDate |
| Information Elements Consulted | |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is bdq:Empty or if the value of dwc:eventDate is not a valid ISO 8601 date; otherwise RUN_HAS_RESULT with the result being the duration (sensu ISO 8601) expressed in the dwc:eventDate, in seconds. |
| Data Quality Dimension | Resolution |
| Term-Actions | EVENTDATE_DURATIONINSECONDS |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2024-09-16 |
| Examples | [dwc:eventDate="1880-05-08/10": Response.status=RUN_HAS_RESULT, Response.result="259200", Response.comment="dwc:eventDate duration is 3 days = 259,200 seconds"] [dwc:eventDate="95": Response.status=INTERNAL_PREREQUISITES_NOT_MET, Response.result=NOT_REPORTED, Response.comment="dwc:eventDate does not contain a valid ISO 8601-1:2019 date"] |
| Source | Alex Thompson |
| References | |
| Example Implementations (Mechanisms) | Kurator/FilteredPush event_date_qc Library 10.5281/zenodo.596795 |
| Link to Specification Source Code | event_date_qc v3.0.5 DwCEventDQ.measureEventdateDurationinseconds() |
| Notes | The duration of a day is 86400 seconds. Implementations should treat all days as 86400 seconds, but should include leap days (but not leap seconds) in durations that encompass them. Consumers should treat assertions about event date duration as approximations, see: https://xkcd.com/2867/ |
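The following sketch illustrates how a duration in seconds might be computed under these Notes. It is not the event_date_qc implementation; the class and method names are invented for illustration, and only a few common ISO 8601 forms are handled (single years, year-months, dates, and explicit start/end date intervals; abbreviated intervals such as "1880-05-08/10" and date-time values are not).

```java
// Illustrative sketch only, NOT DwCEventDQ.measureEventdateDurationinseconds().
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

public class EventDateDurationSketch {

    private static final long SECONDS_PER_DAY = 86400L; // every day treated as 86400 s

    /** Inclusive start and exclusive end dates of the period expressed by one ISO 8601 date. */
    private static LocalDate[] bounds(String isoDate) {
        if (isoDate.matches("\\d{4}")) {                        // e.g. "1911"
            Year y = Year.parse(isoDate);
            return new LocalDate[] { y.atDay(1), y.plusYears(1).atDay(1) };
        } else if (isoDate.matches("\\d{4}-\\d{2}")) {          // e.g. "1880-05"
            YearMonth ym = YearMonth.parse(isoDate);
            return new LocalDate[] { ym.atDay(1), ym.plusMonths(1).atDay(1) };
        } else {                                                // e.g. "1880-05-08"
            LocalDate d = LocalDate.parse(isoDate);
            return new LocalDate[] { d, d.plusDays(1) };
        }
    }

    /** Duration in seconds of a single date or an explicit start/end interval. */
    public static long durationInSeconds(String eventDate) {
        String[] parts = eventDate.split("/");
        LocalDate start = bounds(parts[0])[0];
        LocalDate endExclusive = bounds(parts[parts.length - 1])[1];
        // Counting whole calendar days includes leap days and ignores leap seconds.
        return ChronoUnit.DAYS.between(start, endExclusive) * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(durationInSeconds("1880-05-08"));            // 86400
        System.out.println(durationInSeconds("1880-05-08/1880-05-10")); // 259200
        System.out.println(durationInSeconds("1911"));                  // 31536000
    }
}
```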
Isn't this a duplicate of #124?
Information elements should only include eventDate.
Remove the phrase "expected to run this test after the amendments to populate dwc:eventDate." All measures are expected to be run in both a pre-amendment phase and a post-amendment phase.
Should this one be better named MEASURE_EVENTDATE_PRECISIONINSECONDS?
Yes indeed. Changed in label and table, where Term-Actions was also incorrect.
This test has no path to a response state of NOT_RUN, so the last clause shouldn't be included in the specification. Similarly, "REPORTED" is confusing; it means Result.State=RUN_HAS_RESULT, Result.Value={duration in seconds, long integer}.
Suggest changing: "INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; REPORT on the length of the period expressed in the dwc:eventDate in seconds; otherwise NOT_REPORTED" to:
"INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; otherwise RUN_HAS_RESULT with the result value being the length of the period expressed in the dwc:eventDate in seconds"
@chicoreus - the suggested change looks OK to me.
@ArthurChapman great. I've updated accordingly.
I've also updated the TESTs dump worksheet accordingly and I think I have already done the test data examples correctly.
We probably need to add guidance about leap seconds to the notes. For the purposes of CORE, I would suggest that the guidance be to ignore leap seconds when calculating duration. Standard time libraries that implementers would use tend to ignore leap seconds (e.g. POSIX seconds from the epoch (affecting Python's time module), Joda-Time (which excludes leap seconds), and java.time (which treats all days as 86400 seconds)). Including leap seconds is likely to confuse end users, for example by causing event dates known to a day that includes a leap second to be excluded from use by a filter on <=86400 seconds. CORE uses (from the TG3 results, and as discussed here when we considered including dwc:eventTime and decided not to include it in CORE) don't include time, so it feels safer for CORE uses to be explicit in the notes about not including leap seconds.
I propose changing the notes from:
The length of a day is 86400 seconds. Leap days and leap seconds affect the duration of some months and some years.
to:
The length of a day is 86400 seconds. Leap days and leap seconds affect the duration of some months and some years. Implementations should not include leap seconds, and should treat all days as 86400 seconds, but should include leap days.
I am happy with that.
I don't understand it. Could it be reworded for simpletons like me? It needs to be self-contained or reference definitions.
Would this suffice?
The length of a day is 86400 seconds. Implementations should treat all days as 86400 seconds, but should include leap days.
Thanks @tucotuco. Better, but we have two elephants in the room!
- We have no dwc:coordinatePrecision test so why this one? I do wonder.
- Are we REALLY talking about a RANGE or a PRECISION? The expected response reads
..."RUN_HAS_RESULT with the result value being the length of the period expressed in the dwc:eventDate in seconds"
This implies a RANGE, yet the example
"dwc:eventDate="1880-05-08" has a precision of 86400 seconds. dwc:eventDate="1880-05-08/10" has a precision of 86400 seconds."
IS a PRECISION!
If this test is about PRECISION, and we think it is CORE, then surely the Expected Response should be more like
INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is EMPTY or does not contain a valid ISO 8601-1:2019 date; otherwise RUN_HAS_RESULT with the value in seconds of the coarsest temporal unit of dwc:eventDate.
For example, dwc:eventDate="1949-01-15T12:34/1949-01-20" would have a result of 86400 seconds.
Leap anything is irrelevant?
Good point @Tasilee
I'd like to question why we have this test as a Core Test. It is unlike any other test we have. It is a measure unlike the other measures. I can't see how it would be valuable in testing the quality of a dataset, record, etc. I am having trouble envisioning where a user would want to run this and what value they would get out of it with respect to data quality.
Very astute @Tasilee . That's thinking inside the box. ;-) I agree that precision is not particularly useful, but the uncertainty is, and that is what this measure is really about ("the result value being the length of the period expressed in the dwc:eventDate in seconds"). It would let people filter out records that were less specific than a day (> 86400), which is otherwise not trivial to do.
To fix this test, I would rename it to something like TG2-MEASURE_EVENTDATE_DURATIONINSECONDS and amend the incorrect example,
dwc:eventDate="1880-05-08/10" has a precision of 86400 seconds.
to
dwc:eventDate="1880-05-08/10" has a precision of 259200 seconds.
@tucotuco yes, the duration of time expressed by the eventDate, providing users the ability to filter results to an arbitrary resolution of eventDate, was exactly what Alex originally proposed. Instead of a set of validations for eventDate within a day, within a week, within a month, within a year, within a decade, etc., a single measure that reports the duration of the event date in seconds enables filtering for a use (and a large set of core uses are concerned with the resolution of the eventDate) to the particular needs of that use. The event_date_qc implementation does this.
Renaming to MEASURE_EVENTDATE_DURATIONINSECONDS definitely clarifies.
I'd missed the 08/10 example, yes that should be 259200.
I'd suggest we use the word duration in the example as well:
dwc:eventDate="1880-05-08" has a duration of 86400 seconds. dwc:eventDate="1880-05-08/10" has a duration of 259200 seconds.
If we included leap seconds, and users filtered on a resolution of one day or better with a threshold of 86400 seconds to a day, they would exclude any occurrences with eventDate values falling on days that contain leap seconds. Taking the position that all days have 86400 seconds simplifies filtering for them. The warning about leap days is then a warning that eventDate=1980 is 86400 seconds longer than eventDate=1981, and users filtering to the resolution of one year need to take the potential presence of leap days into account. If we implemented one of the proposed validations that this measure superseded (e.g. a validation that the eventDate duration is one year or less), then we would have to explicitly handle leap days in the definition of that validation (or any other similar validation). Using this measure lets us pass off the decision on how to handle that to users in a way that best fits their own needs.
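As a quick, purely illustrative arithmetic check of the leap-day point (under the convention that every day is 86400 seconds and leap seconds are ignored):

```java
// Illustrative arithmetic only; not part of the specification.
public class LeapDayArithmetic {
    public static void main(String[] args) {
        final long day = 86400L;           // every day counted as 86400 s
        long y1980 = 366L * day;           // 31,622,400 s: 1980 contains a leap day
        long y1981 = 365L * day;           // 31,536,000 s
        System.out.println(y1980 - y1981); // 86400, i.e. exactly one extra day
    }
}
```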
@Tasilee we should add a MEASURE_COORDINATEUNCERTAINTYMETERS as a similar measure to formalize filtering on coordinate uncertainty within the framework. It should be a lot easier: just report the value of dwc:coordinateUncertaintyInMeters if it is a number and if dwc:decimalLatitude and dwc:decimalLongitude are non-empty for a SingleRecord. Consumers of data quality test results would then have a formal description of a filtering attribute. It is trivial enough to describe that we missed it, but I think you are right, we need it.
I remain to be convinced that we are adding useful 'quality' information about a record based on the duration of an event. Duration is more likely to reflect the survey/observation type. While I am no field ecologist, I reckon I could make up a range of scenarios that had a range of durations that had no relationship with 'quality' of the record.
It is obvious to me that we could make a case for spatial precision and accuracy contributing to an aspect of 'data quality' but I can't fathom (sic) what contribution a spatial range would offer.
There may be a case for temporal precision.
@Tasilee we aren't adding quality information, we are making it much easier for users to filter records to fit their data quality needs. Phenology and related studies likely need dates resolvable to a duration of a day, coarser grained studies of seasonal changes may be satisfied with resolution to within a day, fine grained global change studies may be satisfied with data available to a resolution of a year, and coarser grained ones perhaps with resolution to a decade. A measure that provides a standard form for the duration of an event date (in seconds) allows users to easily select temporal data fit to their purpose, without having to deal with parsing all of the potential variants of ISO dates and date times that may be present and valid in dwc:eventDate, and without us having to guess at what thresholds of resolution will be important (other than saying that, for core purposes, resolution to less than a second is not important).
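For example, a consumer of the measure's results could filter to an arbitrary resolution with something like the following sketch (the record type, field names, and thresholds are illustrative assumptions, not part of any bdq API):

```java
import java.util.List;

public class ResolutionFilterSketch {

    // Hypothetical pairing of an occurrence identifier with the measure's result.
    record MeasuredOccurrence(String occurrenceID, long eventDateDurationSeconds) {}

    /** Keep only occurrences whose eventDate spans at most maxSeconds. */
    static List<MeasuredOccurrence> fitForUse(List<MeasuredOccurrence> occurrences, long maxSeconds) {
        return occurrences.stream()
                .filter(o -> o.eventDateDurationSeconds() <= maxSeconds)
                .toList();
    }

    public static void main(String[] args) {
        List<MeasuredOccurrence> occurrences = List.of(
                new MeasuredOccurrence("occ-1", 86400L),      // a single day
                new MeasuredOccurrence("occ-2", 259200L),     // a three-day range
                new MeasuredOccurrence("occ-3", 31536000L));  // a whole year
        System.out.println(fitForUse(occurrences, 86400L));    // keeps only occ-1 (day resolution)
        System.out.println(fitForUse(occurrences, 31536000L)); // keeps all three (year resolution)
    }
}
```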
If that is the reasoning @chicoreus (and I fully understand and agree with the reasoning), then wouldn't it be better to add another Darwin Core term for temporalPrecision, parallel to what we have for dwc:coordinatePrecision?
@chicoreus - you still seem to be mixing the concept of resolution (coarse/fine) with duration. They are independent. As I said, I'd support a test of temporal resolution (and I agree with @ArthurChapman about a Darwin Core term for this), but duration makes far less sense to me. Let's discuss on our Zoom meeting.
@Tasilee I agree we have to discuss it because I have exactly the opposite feeling.
I sincerely doubt that a bid to add a term for temporal precision would work for Darwin Core. That information is actually in the eventDate, unlike coordinatePrecision, which is not unambiguously detectable, or coordinateUncertainty, which is not calculable from the coordinates.
Email discussion about use of bdq:precisionInSeconds and other terms/structure used in the test data. Any conclusions agreed?
The only real decision was that the output of a MEASURE is either a number, COMPLETE or NOT_COMPLETE. In all our tests the output is always a number. (COMPLETE and NOT_COMPLETE refer to MEASURES for multiple records only) Thus in the test data where we have bdq:precisionInSeconds="86400" we should just have "86400" etc.
I can't see anywhere else we need to make a change. "bdq:precisionInSeconds" is not mentioned in the test anywhere, nor in the Vocabulary.
On Wed, 09 Mar 2022 20:30:11 -0800, Arthur Chapman wrote:
The only real decision was that the output of a MEASURE is either a number, COMPLETE or NOT_COMPLETE. In all our tests the output is always a number.
Exactly.
(COMPLETE and NOT_COMPLETE refer to MEASURES for multiple records only)
Under the framework, it is possible to formulate a Measure on a SingleRecord that returns COMPLETE or NOT_COMPLETE; we've been phrasing these as Validations instead (which is an entirely legitimate alternative). For purposes of definition, we should be clear that it is possible to phrase a Measure in the form:
MEASURE_CORE_COMPLETENESS, Measure Completeness SingleRecord for primary space/time/name core information. InformationElements: dwc:occurrenceID, dwc:taxonID, dwc:eventDate, dwc:decimalLatitude, dwc:decimalLongitude, dwc:geodeticDatum, dwc:coordinateUncertaintyInMeters. Specification: COMPLETE if each of dwc:occurrenceID, dwc:taxonID, dwc:eventDate, dwc:decimalLatitude, dwc:decimalLongitude, dwc:geodeticDatum, and dwc:coordinateUncertaintyInMeters contains a valid value; otherwise NOT_COMPLETE.
We'd probably express that as:
VALIDATION_CORETERMS_COMPLETE, Validation Completeness SingleRecord for primary space/time/name core information. InformationElements: dwc:occurrenceID, dwc:taxonID, dwc:eventDate, dwc:decimalLatitude, dwc:decimalLongitude, dwc:geodeticDatum, dwc:coordinateUncertaintyInMeters. Specification: COMPLIANT if each of dwc:occurrenceID, dwc:taxonID, dwc:eventDate, dwc:decimalLatitude, dwc:decimalLongitude, dwc:geodeticDatum, and dwc:coordinateUncertaintyInMeters contains a valid value; otherwise NOT_COMPLIANT.
But, under the framework, both are valid ways to express the same concept, and for simplicity for end users, the way we are doing things, with measures expressing only numeric values for SingleRecords and validations testing for completeness of information, is probably the better (and easier to understand for end users) choice.
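Purely as an illustration of the completeness phrasing above, a minimal sketch (the "valid value" check is simplified here to "non-empty", which is an assumption of the sketch, not the framework's definition):

```java
import java.util.List;
import java.util.Map;

public class CoreCompletenessSketch {

    static final List<String> CORE_TERMS = List.of(
            "dwc:occurrenceID", "dwc:taxonID", "dwc:eventDate",
            "dwc:decimalLatitude", "dwc:decimalLongitude",
            "dwc:geodeticDatum", "dwc:coordinateUncertaintyInMeters");

    /** COMPLIANT if every core term has a non-empty value, otherwise NOT_COMPLIANT. */
    static String validateCoreTermsComplete(Map<String, String> termValues) {
        boolean allPresent = CORE_TERMS.stream()
                .allMatch(term -> termValues.get(term) != null && !termValues.get(term).isBlank());
        return allPresent ? "COMPLIANT" : "NOT_COMPLIANT";
    }

    public static void main(String[] args) {
        Map<String, String> sparseRecord = Map.of("dwc:eventDate", "1880-05-08");
        System.out.println(validateCoreTermsComplete(sparseRecord)); // NOT_COMPLIANT
    }
}
```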
I've applied @tucotuco's "The length of a day is 86400 seconds. Implementations should treat all days as 86400 seconds, but should include leap days." to the Notes.
@Tasilee @ArthurChapman also, from various emails and discussion, as a generalization: in natural science collections data, the eventDate often represents a period of time within which the collecting event occurred somewhere, while for observational data, the eventDate may well represent a one second period where the observation covered the entire duration of that period, so precision, duration, period, uncertainty, and the other terms we apply get muddled. In general, dwc:eventDate="1911" means that the occurrence occurred sometime within the year 1911; for dwc:eventDate="1924-12-11", the occurrence occurred sometime within the day 1924-12-11; and for dwc:eventDate="2004-06-03T04:32:16Z", the occurrence covered the specified second on 2004-06-03.
It is important for some use cases for eventDate data to be known to within a single day, thus 1911 is not fit for their purposes, but 1924-12-11 and 2004-06-03T04:32:16Z are. Other use cases can tolerate more uncertainty and 1911 may be fit, while 1910-01-01/1919-12-31 may not be. The measure of the number of seconds represented by the date/time value in dwc:eventDate provides for easy arbitrary filters for the acceptable length of time that can be present in an eventDate for a particular use case.
Now, we need to settle on what word to call this MEASURE_EVENTDATE_**********INSECONDS, and whether Resolution is the correct data quality dimension for it.
(stepping in quietly here. I'd also wonder how often the date-with-time-stamp is a data export error in data type and not a "real" timestamp). I've seen use cases where the timestamp part really had no meaning but was simply an error either in the database, or in the export options that added a timestamp.
@debpaul yes, that is a good one, difficult to detect (except perhaps by seeing a difference between a time value present in dwc:eventDate and a non-empty value in dwc:eventTime), and fortunately largely outside of the data quality needs expressed in the TG3 user scenarios, where for the most part knowing the date of the occurrence to within either a day or a year was fine. A case where this might be of importance is where a time value appended to a date includes a time zone that would cause other software to shift the date into the previous or next day, and the user's data quality needs are, for example, for a question around phenology where data that is accurate to the day is important.
The TG3 analysis of user needs indicated that knowledge/accuracy/precision/etc. of time more granular than a day isn't a core data quality need, thus TG2 has framed the CORE tests around years, months, and days in dwc:eventDate, with none of the tests including dwc:eventTime as an information element. This rationale contributed to the guidance in this Measure to ignore leap seconds; they aren't important for CORE purposes (but someone else is free to define a Measure that includes them, if that serves some data quality need they have). Conversely, when @godfoder first proposed this measure, we were discussing a series of different tests of whether an event date fell into a single day, or a few days, or a week, or a month, or a year, or a decade, all looking at some specific quality need expressed in the TG3 user stories (and our experiences), and Alex suggested a single simple measure of how much time is represented by the dwc:eventDate as a generalization, fit for all of these and for finer grained needs. Thus this test, of all the CORE tests, stepped into the domain of time (for good engineering reasons).
@debpaul please do step in at any time. It is good to get more views into our tests and I am sure the four of us miss a lot of nuances. It is great to get more comments
Thanks @ArthurChapman thanks for the encouragement. I know sometimes I might not quite grok so I hesitate. Being invited and being welcome is wonderful - and of course, finding out my perception or example is useful - so much the better.