TG2-VALIDATION_COUNTRYSTATEPROVINCE_UNAMBIGUOUS

Open Tasilee opened this issue 2 years ago • 15 comments

TestField Value
GUID d257eb98-27cb-48e5-8d3c-ab9fca4edd11
Label VALIDATION_COUNTRYSTATEPROVINCE_UNAMBIGUOUS
Description Is the combination of the values of the terms dwc:country, dwc:stateProvince unique in the bdq:sourceAuthority?
TestType Validation
Darwin Core Class dcterms:Location
Information Elements ActedUpon dwc:country
dwc:stateProvince
Information Elements Consulted
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if the terms dwc:country and dwc:stateProvince are bdq:Empty; COMPLIANT if the combination of values of dwc:country and dwc:stateProvince are unambiguously resolved to a single result with a child-parent relationship in the bdq:sourceAuthority and the entity matching the value of dwc:country in the bdq:sourceAuthority is an ISO 3166 country-like administrative entity in the bdq:sourceAuthority; otherwise NOT_COMPLIANT
Data Quality Dimension Conformance
Term-Actions COUNTRYSTATEPROVINCE_UNAMBIGUOUS
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = "The Getty Thesaurus of Geographic Names (TGN)" {[https://www.getty.edu/research/tools/vocabularies/tgn/index.html]}
Specification Last Updated 2024-09-18
Examples [dwc:country="Argentina", dwc:stateProvince="Rio Negro": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:country and dwc:stateProvince are unambiguous"]
[dwc:country="", dwc:stateProvince="WA": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:country and dwc:stateProvince are ambiguous. Matches Western Australia, Washington State (US)"]
Source VertNet, Kurator
References
  • Chapman AD and Wieczorek JR (2020) Georeferencing Best Practices. Copenhagen: GBIF Secretariat. https://doi.org/10.15468/doc-gg7h-s853
  • Vertnet (2022) DwC Vocabs. https://github.com/VertNet/DwCVocabs/tree/master/vocabs
  • Getty Research Institute (2017) Getty Thesaurus of Geographic Names Online. https://www.getty.edu/research/tools/vocabularies/tgn/index.html
  • ISO (n.d.) ISO 3166 Country Codes. https://www.iso.org/iso-3166-country-codes.html
  • ISO (n.d.) 3166-1 alpha-2. https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
Example Implementations (Mechanisms) Kurator
Link to Specification Source Code https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/workflows/dwca_geography_assessor.yaml
Notes See table https://github.com/tdwg/bdq/issues/95#issuecomment-1226450014. A fail condition may arise from the content being internally inconsistent (not all of the information can be true at the same time), or from the vocabulary being incapable of uniquely resolving the combination of term values. This test specifically does not consider the content of dwc:higherGeography. If dwc:country contains a value and dwc:stateProvince does not, this test will return NOT_COMPLIANT. Use cases where knowledge to the level of country is adequate for the fitness of the data should not include this test. @tucotuco: "Of #200 and #201, #201 is the strongest test. If it passes for a record, #200 must necessarily also pass and doesn't tell you anything. If #201 fails, #200 could still pass and that would tell you that there are multiple matches on the dwc:country/dwc:stateProvince combo: It would tell you the nature of the problem. Along with #42 (dwc:country not empty), #200 would tell you whether there was an ambiguous combination of country (not empty) and dwc:stateProvince, such as would happen with Argentina/Buenos Aires. While if country is empty, then the ambiguity is purely at the dwc:stateProvince level".
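
The Expected Response above can be sketched roughly as follows. This is a minimal illustration only, not the Kurator implementation, and it does not query TGN: `Place`, `TOY_AUTHORITY`, and `countrystateprovince_unambiguous` are hypothetical names, and the in-memory tuple is an assumed stand-in for bdq:sourceAuthority. The two "Buenos Aires" entries are modelled purely to illustrate the Argentina/Buenos Aires ambiguity mentioned in the Notes. Values are stripped before matching, on the assumption that leading/trailing whitespace should not block matches.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Place:
    ident: int                    # hypothetical record identifier
    name: str
    kind: str                     # "country" or "stateProvince" (simplified)
    parent: Optional[str] = None  # parent country name; None for countries


# Toy stand-in for bdq:sourceAuthority. Not real TGN data, not the TGN API.
TOY_AUTHORITY = (
    Place(1, "Argentina", "country"),
    Place(2, "Australia", "country"),
    Place(3, "Rio Negro", "stateProvince", "Argentina"),
    Place(4, "Buenos Aires", "stateProvince", "Argentina"),
    Place(5, "Buenos Aires", "stateProvince", "Argentina"),
    Place(6, "Western Australia", "stateProvince", "Australia"),
)


def is_empty(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as empty (an assumption)."""
    return value is None or value.strip() == ""


def countrystateprovince_unambiguous(
    country: Optional[str],
    state_province: Optional[str],
    authority: Optional[Sequence[Place]] = TOY_AUTHORITY,
) -> Tuple[str, Optional[str]]:
    """Return (Response.status, Response.result) for one record."""
    if authority is None:
        # The source authority is not available.
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    if is_empty(country) and is_empty(state_province):
        # Both terms are empty.
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)

    c = (country or "").strip()
    s = (state_province or "").strip()
    countries = {p.name for p in authority if p.kind == "country"}

    # Keep stateProvince entries whose name matches and whose parent is the
    # given country, where that parent is itself a country-like entity.
    matches = [
        p for p in authority
        if p.kind == "stateProvince"
        and p.name == s
        and p.parent == c
        and p.parent in countries
    ]
    result = "COMPLIANT" if len(matches) == 1 else "NOT_COMPLIANT"
    return ("RUN_HAS_RESULT", result)
```

With this toy data, `("Argentina", "Rio Negro")` gives COMPLIANT, `("Argentina", "Buenos Aires")` gives NOT_COMPLIANT because two entries share that name and parent, and swapped values such as `("Rio Negro", "Argentina")` also give NOT_COMPLIANT because no child-parent match exists.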

Tasilee avatar Aug 28 '22 23:08 Tasilee

Suggest modifying the Expected Response (changes in italics)

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if either of the terms dwc:country and dwc:stateProvince are EMPTY; COMPLIANT if the combination of values of dwc:country and dwc:stateProvince are unambiguously resolved in the bdq:sourceAuthority; otherwise NOT_COMPLIANT

ArthurChapman avatar Aug 28 '22 23:08 ArthurChapman

I don't think that is right. As per @tucotuco examples with #95, we are testing for ambiguity and one of the terms can be empty.

Tasilee avatar Aug 29 '22 00:08 Tasilee

> I don't think that is right. As per @tucotuco examples with #95, we are testing for ambiguity and one of the terms can be empty.

I agree, it is correct as "INTERNAL_PREREQUISITES_NOT_MET if the terms dwc:country and dwc:stateProvince are EMPTY".

tucotuco avatar Sep 04 '22 14:09 tucotuco

How about:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if the terms dwc:country and dwc:stateProvince are EMPTY; COMPLIANT if the combination of values of dwc:country and dwc:stateProvince are unambiguously resolved to a single result with a child-parent relationship in the bdq:sourceAuthority and the entity matching the value of dwc:country in the bdq:sourceAuthority is an ISO country-like entity in the bdq:sourceAuthority; otherwise NOT_COMPLIANT

chicoreus avatar Sep 04 '22 21:09 chicoreus

This phrasing avoids a compliant result from mismapping of dwc:county onto stateProvince and stateProvince onto country, or instances where dwc:country and dwc:stateProvince are switched.

chicoreus avatar Sep 04 '22 21:09 chicoreus

Done

Tasilee avatar Sep 09 '22 01:09 Tasilee

Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters."

Tasilee avatar Sep 12 '22 02:09 Tasilee

In the Notes, the reference to "See table #95 (comment)" (i.e. "See table https://github.com/tdwg/bdq/issues/95#issuecomment-1226450014") will need to be updated, but I'm not sure how we can reference the comment.

#95 can be changed to "VALIDATION_GEOGRAPHY_CONSISTENT (78640f09-8353-411a-800e-9b6d498fb1c9)" but the comment and table won't appear there without us putting it somewhere we can reference it.

ArthurChapman avatar Jun 13 '23 01:06 ArthurChapman

Updated Parameter(s) value to align with other tests

Tasilee avatar Jul 03 '23 23:07 Tasilee

Post Zoom 11/7/2023, I have aligned the Source Authority with the suggested syntax:

bdq:sourceAuthority default = "The Getty Thesaurus of Geographic Names (TGN)" [https://www.getty.edu/research/tools/vocabularies/tgn/index.html]

to

bdq:sourceAuthority default = "The Getty Thesaurus of Geographic Names (TGN)" {[https://www.getty.edu/research/tools/vocabularies/tgn/index.html]}

Tasilee avatar Jul 11 '23 02:07 Tasilee

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

Tasilee avatar Sep 18 '23 05:09 Tasilee

Removed inapplicable "fail" text from note. This is covered by "unambiguous" in the specification, and leading/trailing whitespace should not block matches.

chicoreus avatar Feb 23 '24 20:02 chicoreus