TG2-ISSUE_OUTLIER_DETECTED
TestField | Value |
---|---|
GUID | b638bde2-5de4-4046-8a60-57bd306cd2cc |
Label | ISSUE_OUTLIER_DETECTED |
Description | Is the record an outlier when compared with one or more environmental variables using all available records of that taxon? |
TestType | Issue |
Darwin Core Class | Occurrence |
Information Elements ActedUpon | dwc:scientificName, dwc:decimalLatitude, dwc:decimalLongitude |
Information Elements Consulted | |
Expected Response | [TO BE DETERMINED.] |
Data Quality Dimension | Conformance |
Term-Actions | OUTLIER_DETECTED |
Parameter(s) | bdq:sourceAuthority |
Source Authority | bdq:sourceAuthority default = [TO BE DETERMINED] |
Specification Last Updated | 2024-02-11 |
Examples | [dwc:scientificName="Eucalyptus globulus", dwc:decimalLatitude="-20.55", dwc:decimalLongitude="125.64": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="The record is an outlier when compared with one or more environmental variables using all available records of that taxon - mean annual temperature is 27.5c which is 6.8c higher than maximum observed for taxon"] |
 | [dwc:scientificName="Eucalyptus globulus", dwc:decimalLatitude="-36.9593", dwc:decimalLongitude="146.5138": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="The record is not an outlier when compared with one or more environmental variables using all available records of that taxon"] |
Source | CRIA, ALA |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | Outliers can be detected by a range of methods. One method uses multiple records and a tool such as Reverse Jackknifing, which would place this test as bdq:DO NOT IMPLEMENT. Another method is to identify whether an occurrence falls outside an 'expert spatial distribution' (see #292). |
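As a companion to the Notes above, here is a minimal sketch of a jackknife-style (leave-one-out) check on a single environmental variable. This is not the Reverse Jackknife procedure as implemented by the ALA; the function name, the threshold parameter, and the temperature values are illustrative assumptions only.

```python
from statistics import mean, stdev

def environmental_outliers(values, threshold=3.0):
    """Flag values that are outliers relative to the rest of the sample.

    Simplified leave-one-out (jackknife-style) check: each value is
    compared against the mean and standard deviation of the *other*
    values, and flagged when it lies more than `threshold` standard
    deviations away.  Sketch only, not the ALA's Reverse Jackknife.
    """
    flagged = []
    for i, v in enumerate(values):
        rest = values[:i] + values[i + 1:]
        if len(rest) < 2:
            continue  # not enough records to assess this taxon
        mu, sigma = mean(rest), stdev(rest)
        if sigma > 0 and abs(v - mu) / sigma > threshold:
            flagged.append((i, v))
    return flagged

# Hypothetical mean annual temperatures (deg C) sampled at the
# coordinates of all available records of one taxon.
temps = [13.2, 12.8, 14.0, 13.5, 12.9, 27.5]
print(environmental_outliers(temps))  # -> [(5, 27.5)]
```

In practice the environmental values would be sampled from layers supplied by bdq:sourceAuthority at the coordinates of all available records of the taxon, and the choice of variables and threshold would itself need to be specified.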
Could this be an ISSUE test rather than a VALIDATION?
Changed the test from a VALIDATION test to an ISSUE test.
I've always found this one interesting. It isn't a multi-record test. As it stands, 'outlier' could refer to SPACE or TIME or, less likely, NAME. It was originally suggested for SPACE and I'd still push for that, but still Supplementary. With increasing species observations, 'expert distributions' can be built and used to validate an Occurrence. The ALA does this. Admittedly, depending on 'mobility', climate change may need consideration.
I've edited the Notes.
Added a few references.
In the absence of an example implementation I propose deleting this issue. There are too many abstract external variables, and very likely significant work would be required to spell out the specification.
As we consider dwc:scientificNameID the key term for the Taxon class, this test should use dwc:scientificNameID rather than dwc:scientificName as an information element.
Implementation needs to consider georeference metadata; latitude and longitude alone are insufficient for assessment.
Implementors will very likely wish to parameterize this test to identify what counts as an outlier.
This test would very likely need to use the extension point in the response for representing uncertainty, given the variability of certainty about the inferred species distribution for different taxa.
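Tying together the three comments above (georeference metadata, parameterization of what counts as an outlier, and an uncertainty extension point in the response), here is a hedged sketch of how those concerns might surface in an implementation. The Response shape, the parameter names, and the confidence extension are assumptions for illustration only, not part of any agreed specification.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Response:
    # Shape loosely modelled on the bdq Response vocabulary; the
    # 'extension' dict is a hypothetical placeholder for the extension
    # point discussed above, not a defined part of the framework.
    status: str
    result: Optional[str] = None
    comment: str = ""
    extension: dict = field(default_factory=dict)

def issue_outlier_detected(record: dict,
                           outlier_threshold: float = 3.0,
                           max_coordinate_uncertainty_m: float = 10000.0) -> Response:
    """Hypothetical sketch: parameterizes what counts as an outlier and
    gates the assessment on georeference metadata."""
    unc = record.get("dwc:coordinateUncertaintyInMeters")
    if unc is None or float(unc) > max_coordinate_uncertainty_m:
        # Which status/result to report when the georeference is too
        # imprecise to assess is itself an open specification question.
        return Response("RUN_HAS_RESULT", "NOT_ISSUE",
                        "Georeference too imprecise to assess against "
                        "environmental layers.")
    # ... a real implementation would sample environmental layers from
    # bdq:sourceAuthority at the record's coordinates, run an outlier
    # check (such as the jackknife sketch above) using outlier_threshold,
    # and report a confidence value via the extension point ...
    return Response("RUN_HAS_RESULT", "POTENTIAL_ISSUE",
                    "Record is an environmental outlier for the taxon.",
                    extension={"confidence": 0.72})

print(issue_outlier_detected({"dwc:decimalLatitude": "-20.55",
                              "dwc:decimalLongitude": "125.64",
                              "dwc:coordinateUncertaintyInMeters": "30"}))
```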
This is something that could have real utility if well defined. It should not go into DO NOT IMPLEMENT. It is premature to define this test; too many variables need very careful consideration.
@chicoreus - very difficult to define - but could be a valuable test further down the line. I suggested DO NOT IMPLEMENT at this stage as we don't have an Expected Response, but I have no objection to Supplementary. I'm not prepared to put time into writing an Expected Response at this stage, but that doesn't stop someone doing it down the line.
@ArthurChapman DO NOT IMPLEMENT is our marker for tests that should not be implemented because we found problems that make them impossible to implement or not useful. This one is useful, but would need very substantive work to define in a way that is useful for others. Supplementary doesn't fit either: those are tests for which we are providing what we believe are reasonably mature specifications (which makes a lot of sense for the many recent supplementary tests that assess emptiness, as those are for the most part straightforward). That fits poorly with this test and others where we think the test might be useful but very substantial work is needed, likely involving thought, implementation, development of validation data, assessment of the implementation against the validation data, rethinking the test, and refining the specification. For this set of tests I was proposing we simply delete the issues, though we've now got enough discussion to merit keeping them and marking them in some way as immature specifications (not including them in Supplementary, but leaving them as open immature issues).
Agree @chicoreus - I suggested DO NOT IMPLEMENT because we have problems with implementation: we don't currently have an Expected Response. At a later date, if someone writes an Expected Response, it could be changed to Supplementary. Otherwise, I guess it could be labelled Supplementary with NEEDS WORK.
Thanks @arthur, but I think Immature/Incomplete is equivalent to NEEDS WORK. We have been using NEEDS WORK where we (TG2) need to get to a conclusion.
NEEDS WORK can also apply to tests that aren't Immature/Incomplete (and other Issues that aren't tests) for a number of reasons, but ALL Immature/Incomplete issues do NEED WORK.