TG2-VALIDATION_TAXONRANK_STANDARD
TestField | Value |
---|---|
GUID | 7bdb13a4-8a51-4ee5-be7f-20693fdb183e |
Label | VALIDATION_TAXONRANK_STANDARD |
Description | Does the value of dwc:taxonRank occur in the bdq:sourceAuthority? |
TestType | Validation |
Darwin Core Class | dwc:Taxon |
Information Elements ActedUpon | dwc:taxonRank |
Information Elements Consulted | |
Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonRank is bdq:Empty; COMPLIANT if the value of dwc:taxonRank is in the bdq:sourceAuthority; otherwise NOT_COMPLIANT. |
Data Quality Dimension | Conformance |
Term-Actions | TAXONRANK_STANDARD |
Parameter(s) | bdq:sourceAuthority |
Source Authority | bdq:sourceAuthority default = "GBIF TaxonRank Vocabulary" {[https://api.gbif.org/v1/vocabularies/TaxonRank]} {"dwc:taxonRank vocabulary API" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]} |
Specification Last Updated | 2023-09-18 |
Examples | [dwc:taxonRank="kingdom": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:taxonRank has an equivalent in the bdq:sourceAuthority"] [dwc:taxonRank="sp.": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:taxonRank does not have an equivalent in the bdq:sourceAuthority"] |
Source | TDWG2018 |
References | |
Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library |
Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L2165 |
Notes | This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. |
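The Expected Response above can be read as a short decision sequence. The following is a minimal sketch of that sequence, not the sci_name_qc implementation: the function name, the `Response` class, and the `EXAMPLE_RANKS` set are hypothetical, and the vocabulary is a hand-written excerpt rather than the result of a call to the bdq:sourceAuthority. It also assumes that bdq:Empty covers a missing or whitespace-only value and that matching is exact and case-sensitive.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical excerpt of a rank vocabulary. A real run would load the
# concepts from the configured bdq:sourceAuthority instead.
EXAMPLE_RANKS = frozenset(
    {"kingdom", "phylum", "class", "order", "family", "genus", "species", "subspecies"}
)


@dataclass
class Response:
    status: str
    result: Optional[str]
    comment: str


def validation_taxonrank_standard(taxon_rank, source_authority):
    """Sketch of the Expected Response; source_authority is None when unavailable."""
    if source_authority is None:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None,
                        "bdq:sourceAuthority is not available")
    # Assumption: bdq:Empty means absent or containing only whitespace.
    if taxon_rank is None or taxon_rank.strip() == "":
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dwc:taxonRank is bdq:Empty")
    # Exact match with no trimming, so leading or trailing whitespace
    # makes the value NOT_COMPLIANT, as the Notes require.
    if taxon_rank in source_authority:
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:taxonRank has an equivalent in the bdq:sourceAuthority")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:taxonRank does not have an equivalent in the bdq:sourceAuthority")
```

With this sketch, `validation_taxonrank_standard("kingdom", EXAMPLE_RANKS)` yields COMPLIANT, while `"sp."` and `" kingdom"` (leading space) both yield NOT_COMPLIANT, matching the Examples and Notes rows.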
Added guid.
Discussion in call: Parameters: bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml). When a default is not yet available at a stable location, this could be either:
Parameters = bdq:sourceAuthority
or
Parameters = bdq:sourceAuthority (default=[GBIF rank vocabulary])
In the first case discussion of there being a vocabulary to use, but not available at a stable IRI at this point would go into notes. In the second case this is implicit in the square brackets.
And, when the vocabulary is at a stable location (e.g. at a DOI), use:
Parameter = bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml)
Discussion, suggestion from @tucotuco: use Parameter = bdq:sourceAuthority, move defaults into notes (non-normative, for us to keep them in one place to work on them), and add, as an example, a table with a suite of default parameters:
VALIDATION_TAXON_RANK_NOTSTANDARD, bdq:sourceAuthority default = http://rs.gbif.org/vocabulary/gbif/rank.xml.
Test definitions remain simple, normative, encapsulated.
Different implementors can easily use a set of default parameters as a separate document.
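One way to picture this separation is a defaults table that lives outside the test definitions and that an implementor can override. The sketch below is illustrative only: `DEFAULT_PARAMETERS` and `resolve_parameters` are hypothetical names, and the single entry reuses the test label and URL from the suggestion above.

```python
# Hypothetical "Test Parameters" table, maintained separately from the
# normative test definitions.
DEFAULT_PARAMETERS = {
    "VALIDATION_TAXON_RANK_NOTSTANDARD": {
        "bdq:sourceAuthority": "http://rs.gbif.org/vocabulary/gbif/rank.xml",
    },
}


def resolve_parameters(test_label, overrides=None):
    """Return the defaults for a test, with any implementor overrides applied."""
    params = dict(DEFAULT_PARAMETERS.get(test_label, {}))  # copy, never mutate defaults
    params.update(overrides or {})
    return params
```

An implementor who wants a different vocabulary would pass it in `overrides`, leaving both the test definition and the shared defaults untouched.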
@tucotuco suggests: TG2 Parameter Default Value Recommendation document, distinct from the tests.
Suggested name of the separate document is "Test and Assertion Parameters"
Or "Test Parameters"
I think we need affiliation as in "TDWG DQ Test Parameters". I don't think it applies to the assertions as such?
In doing the test data set for this one - I am not sure it would not work through synonymy. Looking at it now, ssp., subsp. etc. would be NOT_COMPLIANT, as they do not seem to be among the accepted terms in https://rs.gbif.org/vocabulary/gbif/rank.xml - there are several accepted terms in different languages (subspecies, subespecie, etc.).
The abbreviations would be rightfully NOT_COMPLIANT.
That may be something that GBIF fixes later to include synonymy - which would then be fixed under an AMENDMENT
That type of issue aligns with my conclusion about assumed, increasingly 'smarter' bdq:sourceAuthorities in handling variants - and this is what I remember @tucotuco suggesting regarding vocabs on values on one of our issues.
This is my second definition - looking at a validation
| Description | Validation of the value in Taxon Rank for conformity with a value obtained from a Parameterized Source Authority. If no parameter is set, the source authority defaults to the latest Taxonomic Rank GBIF Vocabulary. |
I'd suggest shorter:
| Definition | Validation that a provided value of Taxon Rank in a single record conforms with a specified controlled vocabulary |
I would also try to stick with “bdq:sourceAuthority” everywhere. In this case a) it is Parameterized and b) is a vocabulary
Following comments in #163 and #112, I suggest:
| Description | A test that checks if the value of dwc:taxonRank unambiguously conforms to the corresponding value provided from a specified bdq:sourceAuthority. |
I agree with @tucotuco that for these tests we do not need to add single record - that can be covered in the introductory test as all the validations and amendments cover only single records. If there are any that don't then that is the time to mention "multiple records"
Curious if you plan to provide any concrete examples? (in Example Mechanisms?). I'm thinking like:
- As a curator, I sent this taxonomic name with dwc:taxonRank == [given rank empty]
- I get error of INTERNAL_PREREQUISITES_NOT_MET since dwc:taxonRank is EMPTY;
- Now what do I do?
OR
- In the dwc:taxonRank I used a "non-standard" value like {sp., spp., subsp., fam, etc}.
- The test returns
- EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available;
- otherwise NOT_COMPLIANT.
- Now what do I do?
OR
- I provide a valid value for dwc:taxonRank and get
- COMPLIANT if the value of dwc:taxonRank is in the bdq:sourceAuthority;
- in which case as a data provider I get a gold star?
@ArthurChapman @tucotuco I will reiterate, single record is an essential part of these descriptions. We cannot omit it. We cannot include it in introductory text. Remember that for every validation that is single record, we must also generate a (trivially generated, which is why we haven't talked about them for some years, but they are required) measure that is multi record that allows users doing quality assurance to assert what constitutes quality in a multi record. Users are also free to assert their own tests, and we must provide a sound model for them to inform users of the meaning of test results. The thing this description goes to has single record or multi record as one of its three non-optional components, we can't leave that out of the description.
@chicoreus I respectfully disagree that this is a requirement, and for reasons expressed here, believe it makes the tests less broadly usable, for example when we move into worlds (e.g., RDF) where "record" is impossible to define a priori. I would rather let user assert single record tests and multi record tests from tests that are more fundamental than to be painted into a corner where all this hard work is not as applicable as it could be.
@debpaul
One way of composing the tests is to run them in a pre-amendment phase, an amendment phase, and a post-amendment phase: all the measures and validations are run on the data as presented, then the amendments are run, then all the measures and validations are run again on the data with the proposed amendments applied. In this case, VALIDATION_TAXONRANK_NOTSTANDARD and AMENDMENT_TAXONRANK_STANDARDIZED are paired. If the data as presented for a single record contains a non-standard value that is correctable to the controlled vocabulary, then the first run of VALIDATION_TAXONRANK_NOTSTANDARD will return a Response.result of NOT_COMPLIANT, AMENDMENT_TAXONRANK_STANDARDIZED would propose the correction to a value in the controlled vocabulary, and the post-amendment phase run of VALIDATION_TAXONRANK_NOTSTANDARD would return a Response.result of COMPLIANT. If the tests are being run by a data curator doing quality control, that data curator could change their data, or how they are mapping their data to Darwin Core. That does need to go into explanatory text, as the tests will pick up mapping problems and assert the results as pertaining to the single records they are defined for. We didn't define a test that validates all the unique values of dwc:taxonRank in a multi record and, on seeing a small set of values that are not in the expected vocabulary but could all be mapped onto it, asserts either wholesale changes to the data or a change of the mappings of that data onto Darwin Core. That is supported by the framework, but is not a form of test we saw as fitting into the core needs defined by TG3.
For any test, EXTERNAL_PREREQUISITES_NOT_MET means "try again later: internet connectivity or a remote service was down, and if you run this test again when the external service is available, you will get a different result".
For any test, INTERNAL_PREREQUISITES_NOT_MET means that running the test again on the same data without changing it will result in the same inability to run the test. But for some Validations (not VALIDATION_TAXONRANK_NOTSTANDARD, as filling in the component parts/atomic terms that are assembled into dwc:scientificName wasn't seen as a CORE need), there are amendments that may fill in values, so that after running the test in a pre-amendment phase, running the amendments, and then running the validation again in a post-amendment phase, a data curator will be presented with a proposed amendment that they could accept as a change to their data.
In a quality control setting, a consumer of data is likely to want to filter a multi record to a set for which all the validation Response.status values are RUN_HAS_RESULT and all the validation Response.result values are COMPLIANT. For CORE uses, this is all the validations in the CORE set; for other uses it might be a different set. Such a user might wish to include amendments, or not, and might just run a single validation phase, or an amendment phase followed by a post-amendment phase followed by filtering to data with quality for their needs.
Much depends on the setting and how the tests are composed. The tests have been deliberately defined as independently as possible so that they can be composed in different ways for different settings.
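As a rough illustration of the three-phase composition and the consumer-side filtering described in this comment, here is a minimal sketch. Everything in it is hypothetical: `RANKS` is a hand-written vocabulary excerpt, `CORRECTIONS` stands in for whatever lookup an amendment such as AMENDMENT_TAXONRANK_STANDARDIZED would use, and the function names are invented for the example.

```python
RANKS = {"genus", "species", "subspecies"}                   # hypothetical vocabulary excerpt
CORRECTIONS = {"sp.": "species", "subsp.": "subspecies"}     # hypothetical amendment lookup


def validate(record):
    """Single-record validation returning (Response.status, Response.result)."""
    rank = record.get("dwc:taxonRank")
    if not rank:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    return ("RUN_HAS_RESULT", "COMPLIANT" if rank in RANKS else "NOT_COMPLIANT")


def amend(record):
    """Return a proposed copy of the record; the data as presented is not changed."""
    proposed = dict(record)
    rank = record.get("dwc:taxonRank")
    if rank in CORRECTIONS:
        proposed["dwc:taxonRank"] = CORRECTIONS[rank]
    return proposed


def run_phases(records):
    """Pre-amendment validation, amendment, then post-amendment validation."""
    report = []
    for record in records:
        pre = validate(record)
        proposed = amend(record)
        post = validate(proposed)
        report.append({"record": record, "proposed": proposed, "pre": pre, "post": post})
    return report


def fit_for_use(report):
    """Keep only records whose post-amendment validation ran and is COMPLIANT."""
    return [row["proposed"] for row in report
            if row["post"] == ("RUN_HAS_RESULT", "COMPLIANT")]
```

In this sketch a record with dwc:taxonRank="sp." is NOT_COMPLIANT before the amendment phase and COMPLIANT after it, whereas an empty value stays at INTERNAL_PREREQUISITES_NOT_MET in both phases because no amendment fills it in.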
Yes, presentation to consumers of data quality reports is very important.
Should this now be "VALIDATION_TAXONRANK_STANDARD" or "VALIDATION_TAXONRANK_ISSTANDARD"?
I think the first is better.
Agree to former
Likewise. STANDARD.
Somewhere in the chain of emails I'd sent out a set of suggested changes.
Not seeing it, but
- NOTSTANDARD to STANDARD
- NOTFOUND to FOUND
- EMPTY to NOTEMPTY
- OUTOFRANGE to INRANGE
- INCONSISTENT to CONSISTENT
should cover most of them.
more...
- AMBIGUOUS to UNAMBIGUOUS
- GREATERTHAN to LESSTHAN
- ZERO to NOTZERO
- TERRESTRIALMARINE same
- INCOMPLETE to COMPLETE
Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters."
Updated "Source Authority" and "References" in accord with @chicoreus comment on #163. @Tasilee to check
Thanks @ArthurChapman. Checked.
Post Zoom 11/7/2023, I have aligned the Source Authority with the suggested syntax:
bdq:sourceAuthority default = "GBIF Vocabulary: Taxonomic Rank" [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]
to
bdq:sourceAuthority default = "Darwin Core" {https://dwc.tdwg.org} {dwc:taxonRank [https://dwc.tdwg.org/list/#dwc_taxonRank]} {GBIF Vocabulary: Taxonomic Rank [https://api.gbif.org/v1/vocabularies/TaxonRank/concepts]}