bdq TG2-ISSUE_OUTLIER

TestField Value

GUID b638bde2-5de4-4046-8a60-57bd306cd2cc

Label ISSUE_OUTLIER_DETECTED

Description Is the record an outlier when compared with one or more environmental variables, using all available records of that taxon?

TestType Issue

Darwin Core Class Occurrence

Information Elements ActedUpon dwc:scientificName

dwc:decimalLatitude

dwc:decimalLongitude

Information Elements Consulted

Expected Response [TO BE DETERMINED.]

Data Quality Dimension Conformance

Term-Actions OUTLIER_DETECTED

Parameter(s) bdq:sourceAuthority

Source Authority bdq:sourceAuthority default = [TO BE DETERMINED]

Specification Last Updated 2024-02-11

Examples [dwc:scientificName="Eucalyptus globulus", dwc:decimalLatitude="-20.55", dwc:decimalLongitude="125.64": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="The record is an outlier when compared with one or more environmental variables using all available records of that taxon - mean annual temperature is 27.5c which is 6.8c higher than maximum observed for taxon"]

[dwc:scientificName="Eucalyptus globulus", dwc:decimalLatitude="-36.9593", dwc:decimalLongitude="146.5138": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="The record is not an outlier when compared with one or more environmental variables using all available records of that taxon"]

Source CRIA, ALA

References
Chapman, A.D. (1992). Quality control and validation of environmental resource data in Quality control and validation of environmental resource data. Canberra: Commonwealth Land Information Forum. pp. 1-16 [also published electronically at: https://www.researchgate.net/publication/332537824]

Chapman, A.D. (1999). Quality control and validation of point-sourced environmental resource data pp. 409-418 in Lowell, K. and Jaton, A. (eds). Spatial Accuracy Assessment: Land Information Uncertainty in Natural Resources. Chelsea, Michigan: Ann Arbor Press. 455pp.

Chapman, A.D., Hijmans, R., Marino, A., De Giovanni, R. and de Souza, S. (2006). Using the concept of “Outlierness” to identify suspect records in Primary Species Occurrence Data p. 39 in The Road to Productive Partnerships. The 21st Annual Meeting of the Society for the Preservation of Natural History Collections and the Natural Science Collections Alliance 2006 Annual Meeting. Program & Abstracts. Albuquerque, New Mexico, 23-27 May 2006. https://www.researchgate.net/publication/333198103_Using_the_concept_of_Outlierness_to_identify_suspect_records_in_Primary_Species_Occurrence_Data

Example Implementations (Mechanisms)

Link to Specification Source Code

Notes Outliers can be detected by a range of methods. One method is to use multiple records and a tool such as Reverse Jackknifing, which would place this test as bdq:DO NOT IMPLEMENT. Another method is to identify whether an occurrence lies outside an 'expert spatial distribution' (see #292).
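
The first example above reports how far the record's value exceeds the maximum observed for the taxon. A minimal sketch of that kind of range comparison follows, in Python. The Expected Response is still to be determined, so everything here is an assumption rather than part of the specification: the function name `check_environmental_outlier`, the `Response` class, the `tolerance` parameter and the reference temperatures are all hypothetical, and the environmental value at each coordinate is assumed to have been looked up already from whatever source authority is eventually chosen.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Response:
    """Hypothetical container mirroring the status/result/comment triple."""
    status: str
    result: str
    comment: str


def check_environmental_outlier(
    record_value: float,
    taxon_values: Sequence[float],
    variable: str = "mean annual temperature",
    tolerance: float = 0.0,
) -> Response:
    """Flag record_value if it falls outside the observed range for the taxon.

    taxon_values holds the same environmental variable sampled at all other
    available records of the taxon. tolerance (an assumed parameter) widens
    the accepted range on both sides.
    """
    if not taxon_values:
        raise ValueError("no reference values available for the taxon")

    low, high = min(taxon_values), max(taxon_values)

    if record_value > high + tolerance:
        detail = (f"{variable} is {record_value} which is "
                  f"{record_value - high:.1f} higher than maximum observed for taxon")
    elif record_value < low - tolerance:
        detail = (f"{variable} is {record_value} which is "
                  f"{low - record_value:.1f} lower than minimum observed for taxon")
    else:
        return Response("RUN_HAS_RESULT", "NOT_ISSUE",
                        "The record is not an outlier for " + variable)

    return Response("RUN_HAS_RESULT", "POTENTIAL_ISSUE",
                    "The record is an outlier - " + detail)


# Invented reference temperatures (degrees C) for illustration only.
reference = [9.8, 12.4, 15.1, 20.7]
print(check_environmental_outlier(27.5, reference))
print(check_environmental_outlier(14.0, reference))
```

A plain min/max comparison is the simplest possible choice and is sensitive to errors in the reference records themselves. It is not Reverse Jackknifing, which would need its own specification, and it does not address georeference uncertainty.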


Feb 10 '24 21:02 ArthurChapman

This could be an ISSUE test rather than a VALIDATION?

Feb 10 '24 21:02 ArthurChapman

Changes test from a VALIDATION test to an ISSUE test

Feb 10 '24 21:02 ArthurChapman

I've always found this one interesting. It isn't a multi record test. As it stands, 'outlier' could refer to SPACE or TIME or, less likely, NAME. It was originally suggested for SPACE and I'd still push that, but still Supplementary. With increasing species observations, 'expert distributions' can be built and used to validate an Occurrence. The ALA does this. Admittedly, depending on 'mobility', climate change may need consideration.

Feb 10 '24 23:02 Tasilee

I've edited the Notes.

Feb 11 '24 21:02 Tasilee

Added a few references.

Feb 11 '24 23:02 ArthurChapman

In the absence of an example implementation I propose deletion of this issue. Too many abstract external variables, and very likely significant work required to spell out the specification.

As we consider dwc:scientificNameID as the key term for the Taxon class, this test should use scientificNameID not scientificName as an information element.

Implementation needs to consider georeference metadata; latitude and longitude alone are insufficient for assessment.

Implementors will very likely wish to parameterize this test to identify what counts as an outlier.

This test would very likely need to use the extension point in the response for representing uncertainty, given the variability of certainty about the inferred species distribution for different taxa.

Feb 12 '24 19:02 chicoreus

This is something that could have real utility if well defined. It should not go into do not implement. It is premature to define this test, too many variables need very careful consideration.

Feb 12 '24 19:02 chicoreus

@chicoreus - very difficult to define - but could be a valuable test further down the line. I suggested DO NOT IMPLEMENT at this stage as we don't have an Expected Response, but don't have objection to Supplementary. Not prepared to put time into writing an Expected Response at this stage, but doesn't stop someone doing it down the line.

Feb 12 '24 20:02 ArthurChapman

@ArthurChapman DO NOT IMPLEMENT is our marker for tests that should not be implemented because we found problems that make them impossible to implement or not useful. This one is useful, but would need very substantive work to define the test in a useful way for others. Supplemental doesn't fit either, as those are tests for which we are providing what we believe are reasonably mature specifications (which makes a lot of sense for the many recent supplementary tests that assess emptiness, as for the most part these are straightforward). That fits poorly with this test and others where we think the test might be useful but where very substantial work is needed, likely involving thought, implementation, development of validation data, assessment of the implementation against the validation data, rethinking the test, and refining the specification. For this set of tests I am proposing we simply delete the issues, though we've now got enough discussion to merit keeping them but marking them in some way as immature specifications (not including them in supplemental, but leaving them as open immature issues).

Feb 12 '24 21:02 chicoreus

Agree @chicoreus - I suggested DO NOT IMPLEMENT because we have problems with implementation because we don't currently have an Expected Response. At a later date, if someone writes an Expected Response, it could be changed to Supplementary. Otherwise, I guess it could be labelled Supplementary with NEEDS WORK.

Feb 12 '24 21:02 ArthurChapman

Thanks @arthur, but I think Immature/Incomplete is equivalent to NEEDS WORK. We have been using NEEDS WORK where we (TG2) need to get to a conclusion.

Feb 18 '24 22:02 Tasilee

NEEDS WORK can also apply to tests that aren't Immature/Incomplete (and other Issues that aren't tests) for a number of reasons, but ALL Immature/Incomplete do NEED WORK.

Feb 18 '24 22:02 ArthurChapman
