TG2-VALIDATION_TAXONRANK_STANDARD

Open ArthurChapman opened this issue 6 years ago • 35 comments

| TestField | Value |
|---|---|
| GUID | 7bdb13a4-8a51-4ee5-be7f-20693fdb183e |
| Label | VALIDATION_TAXONRANK_STANDARD |
| Description | Does the value of dwc:taxonRank occur in the bdq:sourceAuthority? |
| TestType | Validation |
| Darwin Core Class | dwc:Taxon |
| Information Elements ActedUpon | dwc:taxonRank |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonRank is bdq:Empty; COMPLIANT if the value of dwc:taxonRank is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT. |
| Data Quality Dimension | Conformance |
| Term-Actions | TAXONRANK_STANDARD |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "GBIF TaxonRank Vocabulary" {[https://api.gbif.org/v1/vocabularies/TaxonRank]} {"dwc:taxonRank vocabulary API" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} |
| Specification Last Updated | 2023-09-18 |
| Examples | [dwc:taxonRank="kingdom": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:taxonRank has an equivalent in the bdq:sourceAuthority"] [dwc:taxonRank="sp.": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:taxonRank does not have an equivalent in the bdq:sourceAuthority"] |
| Source | TDWG2018 |
| References | GBIF Registry (2023) GBIF Vocabulary: Taxonomic Rank. https://registry.gbif.org/vocabulary/TaxonRank/concepts |
| Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library |
| Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L2165 |
| Notes | This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. |
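
The Expected Response above can be read as a short decision sequence. The following is a minimal Python sketch of that sequence, not the sci_name_qc implementation. Several things here are assumptions made for illustration: the `Response` class, the function name, the treatment of bdq:Empty as `None` or a whitespace-only string, the use of `None` to signal an unavailable bdq:sourceAuthority, and the small `SAMPLE_VOCABULARY` set, which stands in for the real GBIF vocabulary and is not a copy of it.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Response:
    """Hypothetical container for Response.status, Response.result, Response.comment."""
    status: str
    result: Optional[str]
    comment: str


# Illustrative stand-in only; the real terms come from the bdq:sourceAuthority.
SAMPLE_VOCABULARY: Set[str] = {"kingdom", "species", "subspecies"}


def is_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty is approximated as None or a whitespace-only string.
    return value is None or value.strip() == ""


def validation_taxonrank_standard(
    taxon_rank: Optional[str], vocabulary: Optional[Set[str]]
) -> Response:
    # Assumption: None means the source authority could not be reached.
    if vocabulary is None:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None,
                        "bdq:sourceAuthority is not available")
    if is_empty(taxon_rank):
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dwc:taxonRank is bdq:Empty")
    # Exact string match, so leading or trailing whitespace is NOT_COMPLIANT,
    # as required by the Notes field.
    if taxon_rank in vocabulary:
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:taxonRank has an equivalent in the bdq:sourceAuthority")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:taxonRank does not have an equivalent in the bdq:sourceAuthority")
```

With this sketch, `validation_taxonrank_standard("kingdom", SAMPLE_VOCABULARY)` gives COMPLIANT, while `"sp."` and `" kingdom"` (with a leading space) both give NOT_COMPLIANT.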

ArthurChapman avatar Sep 03 '18 04:09 ArthurChapman

Added guid.

chicoreus avatar Sep 07 '18 15:09 chicoreus

Discussion in call: Parameters: bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml). When a default is not yet available at a stable location, the entry could be either:

Parameters = bdq:sourceAuthority

or

Parameters = bdq:sourceAuthority (default=[GBIF rank vocabulary])

In the first case, the fact that there is a vocabulary to use, but that it is not yet available at a stable IRI, would go into the notes. In the second case this is implicit in the square brackets.

chicoreus avatar Mar 31 '20 21:03 chicoreus

And, when the vocabulary is at a stable location (e.g. at a DOI), use:

Parameter = bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml)

chicoreus avatar Mar 31 '20 21:03 chicoreus

Discussion, suggestion from @tucotuco: use Parameter = bdq:sourceAuthority, move defaults into notes (non-normative, for us to keep them in one place to work on them), and add, as an example table, a suite of default parameters:

VALIDATION_TAXON_RANK_NOTSTANDARD, bdq:sourceAuthority default = http://rs.gbif.org/vocabulary/gbif/rank.xml.

Test definitions remain simple, normative, encapsulated.

Different implementors can easily use a set of default parameters as a separate document.

chicoreus avatar Mar 31 '20 22:03 chicoreus

@tucotuco suggests: TG2 Parameter Default Value Recommendation document, distinct from the tests.

chicoreus avatar Mar 31 '20 22:03 chicoreus

Suggested name of the separate document is "Test and Assertion Parameters"

ArthurChapman avatar Mar 31 '20 22:03 ArthurChapman

Or "Test Parameters"

ArthurChapman avatar Mar 31 '20 22:03 ArthurChapman

I think we need affiliation as in "TDWG DQ Test Parameters". I don't think it applies to the assertions as such?

Tasilee avatar Mar 31 '20 23:03 Tasilee

In doing the test data set for this one, I am not sure whether it would work through synonymy. Looking at it now, ssp., subsp. etc. would be NOT_COMPLIANT, as they do not seem to be among the accepted terms in https://rs.gbif.org/vocabulary/gbif/rank.xml. There are several accepted terms in different languages (subspecies, subespecie, etc.).

ArthurChapman avatar Oct 06 '20 04:10 ArthurChapman

The abbreviations would be rightfully NOT_COMPLIANT.

Tasilee avatar Oct 06 '20 21:10 Tasilee

That may be something that GBIF fixes later to include synonymy - which would then be fixed under an AMENDMENT

ArthurChapman avatar Oct 06 '20 21:10 ArthurChapman

That type of issue aligns with my conclusion about assumed, increasingly 'smarter' bdq:sourceAuthorities in handling variants - and this is what I remember @tucotuco suggesting regarding vocabs on values on one of our issues.

Tasilee avatar Oct 06 '20 23:10 Tasilee

This is my second definition - looking at a validation

| Description | Validation of the value in Taxon Rank for conformity with a value obtained from a Parameterized Source Authority. If no parameter is set, the source authority defaults to the latest Taxonomic Rank GBIF Vocabulary. |

ArthurChapman avatar Mar 20 '22 22:03 ArthurChapman

I'd suggest shorter:

| Definition | Validation that a provided value of Taxon Rank in a single record conforms with a specified controlled vocabulary |

chicoreus avatar Mar 20 '22 23:03 chicoreus

I would also try to stick with “bdq:sourceAuthority” everywhere. In this case a) it is Parameterized and b) is a vocabulary

Tasilee avatar Mar 21 '22 00:03 Tasilee

Following comments in #163 and #112, I suggest:

| Description | A test that checks if the value of dwc:taxonRank unambiguously conforms to the corresponding value provided from a specified bdq:sourceAuthority. |

I agree with @tucotuco that for these tests we do not need to add "single record". That can be covered in the introductory text, as all the validations and amendments cover only single records. If there are any that don't, then that is the time to mention "multiple records".

ArthurChapman avatar Mar 21 '22 21:03 ArthurChapman

Curious if you plan to provide any concrete examples? (in Example Mechanisms?). I'm thinking like:

  1. As a curator, I sent this taxonomic name with dwc:taxonRank == [given rank empty]
  2. I get error of INTERNAL_PREREQUISITES_NOT_MET since dwc:taxonRank is EMPTY;
  3. Now what do I do?

OR

  1. In the dwc:taxonRank I used a "non-standard" value like {sp., spp., subsp., fam, etc}.
  2. The test returns
    • EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available;
    • otherwise NOT_COMPLIANT.
  3. Now what do I do?

OR

  1. I provide a valid value for dwc:taxonRank and get
  2. COMPLIANT if the value of dwc:taxonRank is in the bdq:sourceAuthority;
  3. in which case as a data provider I get a gold star?

debpaul avatar Mar 21 '22 21:03 debpaul

@ArthurChapman @tucotuco I will reiterate, single record is an essential part of these descriptions. We cannot omit it. We cannot include it in introductory text. Remember that for every validation that is single record, we must also generate a (trivially generated, which is why we haven't talked about them for some years, but they are required) measure that is multi record that allows users doing quality assurance to assert what constitutes quality in a multi record. Users are also free to assert their own tests, and we must provide a sound model for them to inform users of the meaning of test results. The thing this description goes to has single record or multi record as one of its three non-optional components, we can't leave that out of the description.

chicoreus avatar Mar 21 '22 21:03 chicoreus

@chicoreus I respectfully disagree that this is a requirement, and for reasons expressed here, believe it makes the tests less broadly usable, for example when we move into worlds (e.g., RDF) where "record" is impossible to define a priori. I would rather let users assert single record tests and multi record tests from tests that are more fundamental than be painted into a corner where all this hard work is not as applicable as it could be.

tucotuco avatar Mar 21 '22 22:03 tucotuco

@debpaul

If the tests are run in a pre-amendment phase, amendment phase, and post-amendment phase (which is one way of composing them), all the measures and validations are first run on the data as presented, then the amendments are run, then all the measures and validations are run again on the data with the proposed amendments applied. In this case, VALIDATION_TAXONRANK_NOTSTANDARD and AMENDMENT_TAXONRANK_STANDARDIZED are paired. If the data as presented for a single record contain a value that is non-standard but correctable to the controlled vocabulary, the first run of VALIDATION_TAXONRANK_NOTSTANDARD will return a Response.result of NOT_COMPLIANT, AMENDMENT_TAXONRANK_STANDARDIZED would propose the correction to a value in the controlled vocabulary, and the post-amendment run of VALIDATION_TAXONRANK_NOTSTANDARD would return a Response.result of COMPLIANT.

If the tests are being run by a data curator doing quality control, that data curator could change their data, or how they are mapping their data to Darwin Core. That does need to go into explanatory text, as the tests will pick up mapping problems and assert the results as pertaining to the single records they are defined for. We didn't define a test that validates all the unique values of dwc:taxonRank in a multi record and, on seeing a small set of values that are not in the expected vocabulary but could all be mapped onto it, asserts either wholesale changes to the data or a change of the mappings of that data onto Darwin Core. Such a test is supported by the framework, but is not a form of test we saw as fitting into the core needs defined by TG3.
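
The three-phase composition described above could be sketched roughly as follows. This is only an illustration in Python under stated assumptions: `validate_rank` and `amend_rank` are simplified stand-ins for VALIDATION_TAXONRANK_NOTSTANDARD and AMENDMENT_TAXONRANK_STANDARDIZED, `SAMPLE_VOCABULARY` is a small illustrative set rather than the real vocabulary, and `HYPOTHETICAL_CORRECTIONS` is an invented lookup, since the actual behaviour of the amendment is not specified in this issue.

```python
from typing import Dict, Optional

Record = Dict[str, str]

# Illustrative stand-ins, not the real controlled vocabulary or amendment rules.
SAMPLE_VOCABULARY = {"kingdom", "species", "subspecies"}
HYPOTHETICAL_CORRECTIONS = {"subsp.": "subspecies", "ssp.": "subspecies"}


def validate_rank(record: Record) -> str:
    """Simplified validation: returns only a result-like string."""
    value = record.get("dwc:taxonRank", "")
    if value.strip() == "":
        return "INTERNAL_PREREQUISITES_NOT_MET"
    return "COMPLIANT" if value in SAMPLE_VOCABULARY else "NOT_COMPLIANT"


def amend_rank(record: Record) -> Optional[Record]:
    """Simplified amendment: proposes a change, or None if it has nothing to propose."""
    value = record.get("dwc:taxonRank", "")
    proposed = HYPOTHETICAL_CORRECTIONS.get(value.strip().lower())
    return None if proposed is None else {"dwc:taxonRank": proposed}


def run_phases(record: Record) -> dict:
    # Pre-amendment phase: validate the data as presented.
    pre = validate_rank(record)
    # Amendment phase: collect the proposal without altering the original record.
    proposal = amend_rank(record)
    amended = {**record, **proposal} if proposal else dict(record)
    # Post-amendment phase: validate the data with the proposal applied.
    post = validate_rank(amended)
    return {"pre": pre, "proposal": proposal, "post": post}
```

In this sketch the original record is never modified; the proposal is returned so that a data curator can decide whether to accept it.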

For any test, EXTERNAL_PREREQUISITES_NOT_MET means "try again later: internet connectivity or a remote service was down, and if you run this test again when the external service is available, you will get a different result".
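
One way a caller might act on that "try again later" meaning is a simple retry loop. The sketch below is an assumption about how a client could be written, not part of the test definitions: `run_with_retry`, its parameters, and the idea of passing the test as a zero-argument callable that returns a Response.status string are all hypothetical.

```python
import time
from typing import Callable

EXTERNAL = "EXTERNAL_PREREQUISITES_NOT_MET"


def run_with_retry(
    run_test: Callable[[], str], attempts: int = 3, delay_seconds: float = 0.0
) -> str:
    """Re-run a test while it reports EXTERNAL_PREREQUISITES_NOT_MET.

    Any other status (including INTERNAL_PREREQUISITES_NOT_MET) is returned
    immediately, because re-running on unchanged data would not change it.
    """
    status = EXTERNAL
    for attempt in range(attempts):
        status = run_test()
        if status != EXTERNAL:
            return status
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before the next attempt
    return status
```

A caller would typically choose a non-zero `delay_seconds` and report the test as not run if the status is still EXTERNAL_PREREQUISITES_NOT_MET after the last attempt.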

For any test, INTERNAL_PREREQUISITES_NOT_MET means that running the test again on the same data without changing it will result in the same inability to run the test. But for some Validations (not VALIDATION_TAXONRANK_NOTSTANDARD, as filling in the component parts/atomic terms that are assembled into dwc:scientificName wasn't seen as a CORE need), there are amendments that may fill in values. When the validation is run in a pre-amendment phase, the amendments are run, and the validation is run again in a post-amendment phase, a data curator will be presented with a proposed amendment that they could accept as a change to their data.

In a quality control setting, a consumer of data is likely to want to filter a multi record to a set for which all the validation Response.status values are RUN_HAS_RESULT and all the validation Response.result values are COMPLIANT. For CORE uses, this is all the validations in the CORE set; for other uses it might be a different set. Such a user might wish to include amendments, or not, and might just run a single validation phase, or an amendment phase followed by a post-amendment phase followed by filtering to data with quality for their needs.
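
The filtering step described above might look like the following Python sketch. The data layout is an assumption made for illustration: each record identifier maps to a dictionary of test label to a (Response.status, Response.result) pair, and `passes_all` and `filter_fit_for_use` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

# (Response.status, Response.result) for one validation on one record.
Outcome = Tuple[str, Optional[str]]


def passes_all(outcomes: Dict[str, Outcome]) -> bool:
    """True only if every validation ran and was COMPLIANT.

    A record with no validation outcomes is treated as not passing,
    which is a design choice of this sketch.
    """
    return bool(outcomes) and all(
        status == "RUN_HAS_RESULT" and result == "COMPLIANT"
        for status, result in outcomes.values()
    )


def filter_fit_for_use(reports: Dict[str, Dict[str, Outcome]]) -> List[str]:
    """Return the identifiers of records whose chosen validations all pass."""
    return [record_id for record_id, outcomes in reports.items()
            if passes_all(outcomes)]
```

Which validations appear in each record's outcomes (the CORE set or some other set) is left to the caller, in line with the tests being composable in different ways.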

Much depends on the setting and how the tests are composed. The tests have been deliberately defined as independently as possible so that they can be composed in different ways for different settings.

Yes, presentation to consumers of data quality reports is very important.

chicoreus avatar Mar 21 '22 23:03 chicoreus

Should this now be "VALIDATION_TAXONRANK_STANDARD" or "VALIDATION_TAXONRANK_ISSTANDARD"?

I think the first is better.

Tasilee avatar Mar 22 '22 00:03 Tasilee

Agree to former

ArthurChapman avatar Mar 22 '22 00:03 ArthurChapman

Likewise. STANDARD.

Somewhere in the chain of emails I'd sent out a set of suggested changes.

chicoreus avatar Mar 22 '22 00:03 chicoreus

Not seeing it, but

  • NOTSTANDARD to STANDARD
  • NOTFOUND to FOUND
  • EMPTY to NOTEMPTY
  • OUTOFRANGE to INRANGE
  • INCONSISTENT to CONSISTENT

should cover most of them.

chicoreus avatar Mar 22 '22 00:03 chicoreus

more...

  • AMBIGUOUS to UNAMBIGUOUS
  • GREATERTHAN to LESSTHAN
  • ZERO to NOTZERO
  • TERRESTRIALMARINE same
  • INCOMPLETE to COMPLETE

Tasilee avatar Mar 22 '22 01:03 Tasilee

Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters."

Tasilee avatar Sep 12 '22 02:09 Tasilee

Updated "Source Authority" and "References" in accord with @chicoreus comment on #163. @Tasilee to check

ArthurChapman avatar Feb 26 '23 23:02 ArthurChapman

Thanks @ArthurChapman. Checked.

Tasilee avatar Feb 27 '23 00:02 Tasilee

Post Zoom 11/7/2023, I have aligned the Source Authority with the suggested syntax:

bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]

to

bdq:sourceAuthority default = "Darwin Core" {https://dwc.tdwg.org} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]} {GBIF Vocabulary: Taxonomic Rank [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]}

Tasilee avatar Jul 11 '23 01:07 Tasilee