
TG2-ISSUE_OUTLIER_DETECTED

Open • ArthurChapman opened this issue 1 year ago • 12 comments

TestField Value
GUID b638bde2-5de4-4046-8a60-57bd306cd2cc
Label ISSUE_OUTLIER_DETECTED
Description Is the record an outlier when compared with one or more environmental variables using all available records of that taxon?
TestType Issue
Darwin Core Class Occurrence
Information Elements ActedUpon dwc:scientificName
dwc:decimalLatitude
dwc:decimalLongitude
Information Elements Consulted
Expected Response [TO BE DETERMINED.]
Data Quality Dimension Conformance
Term-Actions OUTLIER_DETECTED
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = [TO BE DETERMINED]
Specification Last Updated 2024-02-11
Examples [dwc:scientificName="Eucalyptus globulus", dwc:decimalLatitude="-20.55", dwc:decimalLongitude="125.64": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="The record is an outlier when compared with one or more environmental variables using all available records of that taxon - mean annual temperature is 27.5 °C, which is 6.8 °C higher than the maximum observed for the taxon"]
[dwc:scientificName="Eucalyptus globulus", dwc:decimalLatitude="-36.9593", dwc:decimalLongitude="146.5138": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="The record is not an outlier when compared with one or more environmental variables using all available records of that taxon"]
Source CRIA, ALA
References
  • Chapman, A.D. (1992). Quality control and validation of environmental resource data, pp. 1-16 in Quality control and validation of environmental resource data. Canberra: Commonwealth Land Information Forum. [Also published electronically at: https://www.researchgate.net/publication/332537824]
  • Chapman, A.D. (1999). Quality control and validation of point-sourced environmental resource data, pp. 409-418 in Lowell, K. and Jaton, A. (eds). Spatial Accuracy Assessment: Land Information Uncertainty in Natural Resources. Chelsea, Michigan: Ann Arbor Press. 455pp.
  • Chapman, A.D., Hijmans, R., Marino, A., De Giovanni, R. and de Souza, S. (2006). Using the concept of “Outlierness” to identify suspect records in Primary Species Occurrence Data, p. 39 in The Road to Productive Partnerships. The 21st Annual Meeting of the Society for the Preservation of Natural History Collections and the Natural Science Collections Alliance 2006 Annual Meeting. Program & Abstracts. Albuquerque, New Mexico, 23-27 May 2006. https://www.researchgate.net/publication/333198103_Using_the_concept_of_Outlierness_to_identify_suspect_records_in_Primary_Species_Occurrence_Data
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes Outliers can be detected by a range of methods. One method is to use multiple records and a tool such as Reverse Jackknifing, which would place this test in bdq:DO NOT IMPLEMENT. Another method is to identify whether an occurrence lies outside an 'expert spatial distribution' (see #292).

ArthurChapman • Feb 10 '24 21:02
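The example Responses in the specification compare one environmental value at the record's location with the range observed for the taxon. A minimal Python sketch of that comparison follows, assuming the environmental values have already been extracted for the record and for the taxon's other records. The function name, its parameters, and the status returned for missing inputs are hypothetical, since the Expected Response is still TO BE DETERMINED. This is a plain min/max range check, not Reverse Jackknifing or any other method named in the Notes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Response:
    status: str            # bdq Response.status, e.g. "RUN_HAS_RESULT"
    result: Optional[str]  # "POTENTIAL_ISSUE" or "NOT_ISSUE"; None if not run
    comment: str


def outlier_by_observed_range(
    value: Optional[float],
    taxon_values: Sequence[float],
    variable: str = "mean annual temperature",
    unit: str = "°C",
) -> Response:
    """Hypothetical check: is `value` outside the range observed for the taxon?

    `value` is the environmental variable at the record's coordinates and
    `taxon_values` holds the same variable for the taxon's *other* records.
    Looking these up in an environmental layer is assumed, not shown.
    """
    if value is None or not taxon_values:
        # Assumed status for missing inputs; the Expected Response is TBD.
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "No environmental value or no comparison records")
    lowest, highest = min(taxon_values), max(taxon_values)
    if value > highest:
        excess = value - highest
        return Response("RUN_HAS_RESULT", "POTENTIAL_ISSUE",
                        f"{variable} is {value} {unit}, which is {excess:.1f} "
                        f"{unit} higher than the maximum observed for the taxon")
    if value < lowest:
        shortfall = lowest - value
        return Response("RUN_HAS_RESULT", "POTENTIAL_ISSUE",
                        f"{variable} is {value} {unit}, which is {shortfall:.1f} "
                        f"{unit} lower than the minimum observed for the taxon")
    return Response("RUN_HAS_RESULT", "NOT_ISSUE",
                    f"{variable} is within the range observed for the taxon")


if __name__ == "__main__":
    # Illustrative values only, not real data for any taxon.
    print(outlier_by_observed_range(27.5, [12.1, 16.4, 20.7]))
```

Comparing against the taxon's other records (leave-one-out) matters here: if the record's own value were included, it could never fall outside the observed range.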

Could this be an ISSUE test rather than a VALIDATION?

ArthurChapman • Feb 10 '24 21:02

Changed test from a VALIDATION test to an ISSUE test.

ArthurChapman • Feb 10 '24 21:02

I've always found this one interesting. It isn't a multi-record test. As it stands, 'outlier' could refer to SPACE or TIME or, less likely, NAME. It was originally suggested for SPACE and I'd still push that, but still as Supplementary. With increasing species observations, 'expert distributions' can be built and used to validate an Occurrence. The ALA does this. Admittedly, depending on 'mobility', climate change may need consideration.

Tasilee • Feb 10 '24 23:02
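The expert-distribution approach described in Tasilee's comment reduces, in its simplest form, to asking whether the record's coordinates fall inside a distribution polygon. A minimal Python sketch follows, assuming the expert distribution is available as a single polygon of (longitude, latitude) vertices and using a standard ray-casting point-in-polygon test. The function names and polygon representation are hypothetical, and nothing here is claimed to reflect how the ALA does it. Coordinates are treated as planar; real distributions would usually need a GIS library that handles multipolygons, holes, and the antimeridian.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (decimalLongitude, decimalLatitude)


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: a point is inside if a ray cast eastward from it crosses
    the polygon boundary an odd number of times. Coordinates are treated as
    planar; holes, multipolygons and the antimeridian are not handled."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge meets the point's latitude
            # (y2 != y1 is guaranteed by the straddle test above).
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def outside_expert_distribution(lon: float, lat: float,
                                distribution: Sequence[Point]) -> str:
    """Hypothetical Issue result: flag records falling outside the polygon."""
    if point_in_polygon(lon, lat, distribution):
        return "NOT_ISSUE"
    return "POTENTIAL_ISSUE"


if __name__ == "__main__":
    # Illustrative rectangle only, not a real distribution of any taxon.
    box = [(140.0, -40.0), (150.0, -40.0), (150.0, -30.0), (140.0, -30.0)]
    print(outside_expert_distribution(146.5138, -36.9593, box))  # NOT_ISSUE
    print(outside_expert_distribution(125.64, -20.55, box))      # POTENTIAL_ISSUE
```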

I've edited the Notes.

Tasilee • Feb 11 '24 21:02

Added a few references.

ArthurChapman • Feb 11 '24 23:02

In the absence of an example implementation, I propose deletion of this issue. There are too many abstract external variables, and spelling out the specification would very likely require significant work.

As we consider dwc:scientificNameID the key term for the Taxon class, this test should use dwc:scientificNameID, not dwc:scientificName, as an Information Element.

Implementation needs to consider georeference metadata; latitude and longitude alone are insufficient for assessment.

Implementors will very likely wish to parameterize this test to identify what counts as an outlier.

This test would very likely need to use the extension point in the response for representing uncertainty, given the variability of certainty about the inferred species distribution for different taxa.

chicoreus • Feb 12 '24 19:02
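The comment above expects implementors to parameterize what counts as an outlier. A minimal Python sketch of such parameterization follows, assuming the taxon's other records have already been grouped by dwc:scientificNameID and their environmental values extracted. It swaps in a simple z-score rule (distance from the taxon mean in standard deviations), which is not a method named in this issue. The parameters `z_threshold` and `min_records` and the status used for sparse taxa are hypothetical. It does not model georeference metadata or the uncertainty extension point.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class Response:
    status: str
    result: Optional[str]
    comment: str


def outlier_zscore(value: float, taxon_values: Sequence[float],
                   z_threshold: float = 3.0, min_records: int = 10) -> Response:
    """Hypothetical parameterized check using a z-score.

    `taxon_values` are the environmental values of the taxon's *other*
    records, assumed to be gathered by dwc:scientificNameID.
    `z_threshold` sets what counts as an outlier; `min_records` declines to
    judge taxa with too few records for a meaningful distribution.
    """
    needed = max(min_records, 2)  # stdev() needs at least two values
    if len(taxon_values) < needed:
        # Assumed status; the Expected Response is still TBD.
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        f"Fewer than {needed} comparison records")
    m, s = mean(taxon_values), stdev(taxon_values)
    if s == 0:
        # All comparison values identical: any difference is extreme.
        z = 0.0 if value == m else float("inf")
    else:
        z = abs(value - m) / s
    if z > z_threshold:
        return Response("RUN_HAS_RESULT", "POTENTIAL_ISSUE",
                        f"Value is {z:.1f} standard deviations from the taxon mean")
    return Response("RUN_HAS_RESULT", "NOT_ISSUE",
                    f"Value is within {z_threshold} standard deviations of the taxon mean")


if __name__ == "__main__":
    # Illustrative values only.
    vals = [10, 11, 12, 13, 14, 10, 11, 12, 13, 14]
    print(outlier_zscore(20, vals).result)                   # POTENTIAL_ISSUE
    print(outlier_zscore(20, vals, z_threshold=6.0).result)  # NOT_ISSUE
```

Raising `z_threshold` makes the test more tolerant, which illustrates why a single fixed definition of "outlier" is unlikely to suit every taxon or environmental variable.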

This is something that could have real utility if well defined. It should not go into DO NOT IMPLEMENT. It is premature to define this test; too many variables need very careful consideration.

chicoreus • Feb 12 '24 19:02

@chicoreus - very difficult to define, but could be a valuable test further down the line. I suggested DO NOT IMPLEMENT at this stage as we don't have an Expected Response, but I have no objection to Supplementary. I'm not prepared to put time into writing an Expected Response at this stage, but that doesn't stop someone doing it down the line.

ArthurChapman • Feb 12 '24 20:02

@ArthurChapman DO NOT IMPLEMENT is our marker for tests that should not be implemented because we found problems that make them impossible to implement or not useful. This one is useful, but would need very substantive work to define the test in a way that is useful for others. Supplemental doesn't fit either: Supplementary tests are those for which we believe we are providing reasonably mature specifications (which makes a lot of sense for the many recent Supplementary tests that assess emptiness, as for the most part these are straightforward). That fits poorly for this test and others where we think the test might be useful, but very substantial work is needed, likely involving thought, implementation, development of validation data, assessment of the implementation against the validation data, rethinking the test, and refining the specification. For this set of tests I am proposing we simply delete the issues, though we've now got enough discussion to merit keeping them, marking them in some way as immature specifications (not including them in Supplementary, but leaving them as open immature issues).

chicoreus • Feb 12 '24 21:02

Agree @chicoreus - I suggested DO NOT IMPLEMENT because we have problems with implementation, as we don't currently have an Expected Response. At a later date, if someone writes an Expected Response, it could be changed to Supplementary. Otherwise, I guess it could be labelled Supplementary with NEEDS WORK.

ArthurChapman • Feb 12 '24 21:02

Thanks @arthur, but I think Immature/Incomplete is equivalent to NEEDS WORK. We have been using NEEDS WORK where we (TG2) need to get to a conclusion.

Tasilee • Feb 18 '24 22:02

NEEDS WORK can also apply to tests that aren't Immature/Incomplete (and to other Issues that aren't tests) for a number of reasons, but ALL Immature/Incomplete tests do NEED WORK.

ArthurChapman • Feb 18 '24 22:02