bdq
TG2-AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID
TestField | Value |
---|---|
GUID | f01fb3f9-2f7e-418b-9f51-adf50f202aea |
Label | AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID |
Description | Proposes an amendment to the value of dwc:scientificName using the dwc:scientificNameID value from the bdq:sourceAuthority. |
TestType | Amendment |
Darwin Core Class | dwc:Taxon |
Information Elements ActedUpon | dwc:scientificName |
Information Elements Consulted | dwc:scientificNameID |
Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificNameID is bdq:Empty, or dwc:scientificName is bdq:NotEmpty; FILLED_IN the value of dwc:scientificName if the value of dwc:scientificNameID could be unambiguously interpreted as a value in the bdq:sourceAuthority; otherwise NOT_AMENDED |
Data Quality Dimension | Completeness |
Term-Actions | SCIENTIFICNAME_FROM_SCIENTIFICNAMEID |
Parameter(s) | bdq:sourceAuthority |
Source Authority | bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]} |
Specification Last Updated | 2024-08-18 |
Examples | [dwc:scientificNameID="gbif:8102122", dwc:scientificName="": Response.status=FILLED_IN, Response.result=dwc:scientificName="Harpullia pendula F.Muell.", Response.comment="dwc:scientificNameID contains an interpretable value"] |
 | [dwc:scientificNameID="gbif:8a", dwc:scientificName="": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:scientificNameID does not contain an interpretable value"] |
Source | iDigBio |
References | |
Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library |
Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L1156 |
Notes | The value of dwc:scientificNameID is unambiguous if dwc:scientificNameID references a single taxon record in the bdq:sourceAuthority. When referencing a GBIF taxon by GBIF's identifier for that taxon, use the pseudo-namespace "gbif:" and the form "gbif:{integer}" as the value for dwc:scientificNameID. Implementors should be aware of the current GBIF API endpoint that can replace the pseudo-namespace gbif: when looking up the dwc:scientificNameID (taxonID in the GBIF document), e.g. s/gbif:/https:\/\/api.gbif.org\/v1\/species\// will transform the value taxonID=gbif:8102122 to the resolvable endpoint https://api.gbif.org/v1/species/8102122. The pseudo-namespace "gbif:" is recommended by GBIF to reference GBIF taxon records. Where resolvable persistent identifiers exist for dwc:scientificNameID values, they should be used in full, but implementors will need to support at least the "gbif:" pseudo-namespace. |
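The substitution described in the Notes can be sketched in Python as follows. This is only an illustration, not the reference implementation: the function name `gbif_id_to_url` and the constant name are hypothetical, and the sketch assumes that only values of the exact form "gbif:{integer}" should be rewritten, with anything else returned as `None`.

```python
import re

# Base URL taken from the Notes above; the constant name is an assumption.
GBIF_SPECIES_ENDPOINT = "https://api.gbif.org/v1/species/"


def gbif_id_to_url(scientific_name_id):
    """Return the resolvable GBIF endpoint for a 'gbif:{integer}' value, else None.

    Hypothetical helper: it only rewrites the pseudo-namespace and does not
    contact the GBIF API.
    """
    if scientific_name_id is None:
        return None
    # Accept only the exact form gbif:{integer}; "gbif:8a" is rejected.
    match = re.fullmatch(r"gbif:([0-9]+)", scientific_name_id.strip())
    if match is None:
        return None
    return GBIF_SPECIES_ENDPOINT + match.group(1)


print(gbif_id_to_url("gbif:8102122"))  # https://api.gbif.org/v1/species/8102122
print(gbif_id_to_url("gbif:8a"))       # None
```

Returning `None` for uninterpretable values leaves the choice of Response.status to the caller.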
Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: It would seem that a scientificName consistency test is needed: scientificName is consistent with what's provided in genus, specificEpithet, etc. Added a test at the bottom. Also, I believe the converse tests should be included: genus, specificEpithet, infraspecificEp, sciNameAut completed from sciName. "GENUS_FROM_SCI_NAME" and the like
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: This can't be implemented until dwc:genericEpithet is approved. dwc:genus is NOT the atomic parse of genus from scientific name, it is genus into which the occurrence is classified, for types the two of these can differ.
Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: I don't understand @PJM - a Genus CAN be parsed from a binomial by definition - at least in the Botanical Code. The Zoological Code doesn't include the concept of a 'Specific Epithet' whereas the Botanical Code does (I am not up to date on Zoological Code but there was some discussion on adopting the concept from the Botanical Code) but as I understand both codes - "GENUS" can be standalone and does not need a separate GENUS Epithet concept.
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: but we can't implement this until dwc:genericEpithet is approved.
The phrasing "scientificName was added" needs clearer specification: "added" creates ambiguity about intention, and it is unclear whether implementors should only fill in an empty scientificName or whether existing values should be changed.
I've commented on the issues noted in @chicoreus email of September 1. Does that email raise a new (GitHub) issue, as it would be good to document more consistently?
From @chicoreus : #71 ... AMENDED if dwc:scientificName was EMPTY and a value was added from a lookup of the dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED
Suggestion: We usually add the prerequisites in the INTERNAL_PREREQUISITES_NOT_MET rather than in the AMENDED part, so I suggest moving the "dwc:scientificName was NOT_EMPTY" condition there. Thus:
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or the dwc:scientificName was NOT_EMPTY; AMENDED if value was added from a lookup of the dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED
Thanks @ArthurChapman - I agree that where possible, we include such tests in the INTERNALs. That reads well to me. Editing.
I have changed Expected response to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY; AMENDED dwc:scientificName from a successful lookup of dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED"
As noted elsewhere, we need to decide where to use "the value of dwc:...." as against "dwc:...".
the value of dwc:taxonID is ambiguous vs dwc:scientificName was not EMPTY
Also noted another reversion to NOT_EMPTY!
Note @Tasilee that in the TG2 Vocabulary (#152) we have the term NOTEMPTY (A field that is present and has content.) Do we need to change the term in #152?
@Tasilee "the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY;" probably is a good example, value of dwc:x is ambiguous, talking explicitly about the value, and dwc:x is empty indicating that the term is empty, one option within that scope being that the value is an empty string.
@ArthurChapman , If we need both EMPTY and NOT_EMPTY, then we should probably define NOT_EMPTY as simply the logical inverse of EMPTY, if we don't need it, then we could reference "not EMPTY" in the specifications.
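Defining NOT_EMPTY as the logical inverse of EMPTY could look like the minimal sketch below. It assumes that EMPTY covers an absent term (`None`), an empty string, and a whitespace-only string; the function names are hypothetical and the vocabulary entry in #152 remains the authority on the definition.

```python
def is_empty(value):
    """True if the term is absent (None) or holds only whitespace (assumed scope of EMPTY)."""
    return value is None or value.strip() == ""


def is_not_empty(value):
    """NOT_EMPTY defined purely as the logical inverse of EMPTY."""
    return not is_empty(value)
```

With this arrangement only EMPTY needs a definition of its own, and "not EMPTY" in a specification cannot drift away from it.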
BTW, we have three tests with labels
TG2-NOTIFICATION_ANNOTATION_NOTEMPTY TG2-NOTIFICATION_DATAGENERALIZATIONS_NOTEMPTY TG2-NOTIFICATION_ESTABLISHMENTMEANS_NOTEMPTY
Currently all references in Expected responses are now "not EMPTY" so I would concur with @chicoreus
I think it is in #152 because of the three test names. We can leave it there as that definition applies to those three. But in the tests use not EMPTY.
I just checked the example and it needed to be amended to "https://api.gbif.org/v1/species/8102122" (Note "/v1"). I suspect a few more of these may be in github. I will see what I can find. This issue is the only one as far as I can tell.
The Expected Response here contains "INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY" but the paired VALIDATION https://github.com/tdwg/bdq/issues/120 will have already checked this, so is this component of the Expected Response redundant? There are similar situations with many of the AMENDMENTs. In other words, would we even run the AMENDMENT or just report automatically?
@Tasilee All of the tests have to be defined as if run in isolation. A linear workflow with a validation before a specific amendment is only one possible alternative. Thus the prerequisites for an amendment would be expected to overlap with validations.
We did discuss a workflow but then it was sort of agreed that each test needed to be run in isolation as stated by @chicoreus. I don't think we discussed it fully (need another face to face) but from memory, it was thought different institutions may run the tests differently, or only run some tests and not others and thus they needed to be standalone.
Thanks @ArthurChapman. I also vaguely remember such a discussion about each being somewhat independent (but AMENDMENTs are - the way we designed them - dependent on their equivalent VALIDATIONs). When it comes to generating the test data, the chooks come home to roost.
@Tasilee the amendments relate to, but are not dependent on validations. One expected workflow is to run all validations in parallel, then run all amendments in parallel, then run all validations again with all amendments accepted to measure how much a data set might have its fitness increased by accepting annotations. Another plausible workflow is to run all amendments followed by all validations, accepting amended data that has passed all the tests. A core requirement is that each test be able to stand on its own.
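The first workflow described above (validations, then amendments, then validations again on the amended data) might be sketched as follows. Everything here is a toy under stated assumptions: the two test functions, the `TOY_AUTHORITY` dictionary standing in for the bdq:sourceAuthority, and the use of a simple count of COMPLIANT results as the measure of fitness are all hypothetical. "In parallel" is modelled by having every test read the original record independently.

```python
# Toy stand-in for the bdq:sourceAuthority (assumption); the entry is the
# example from the specification table.
TOY_AUTHORITY = {"gbif:8102122": "Harpullia pendula F.Muell."}


def validation_scientificname_notempty(record):
    """Toy validation: COMPLIANT if dwc:scientificName has content."""
    name = (record.get("dwc:scientificName") or "").strip()
    return "COMPLIANT" if name else "NOT_COMPLIANT"


def amendment_scientificname_from_id(record):
    """Toy amendment: returns proposed term changes, or {} if none."""
    if (record.get("dwc:scientificName") or "").strip():
        return {}
    name = TOY_AUTHORITY.get(record.get("dwc:scientificNameID") or "")
    return {"dwc:scientificName": name} if name else {}


def run_validations(record, validations):
    # Each validation stands on its own and sees the same record.
    return {v.__name__: v(record) for v in validations}


def accept_amendments(record, amendments):
    # Every amendment is evaluated against the ORIGINAL record, then all
    # proposals are accepted into a copy.
    proposals = [a(record) for a in amendments]
    amended = dict(record)
    for proposal in proposals:
        amended.update(proposal)
    return amended


def compliant_count(results):
    return sum(1 for status in results.values() if status == "COMPLIANT")


record = {"dwc:scientificNameID": "gbif:8102122", "dwc:scientificName": ""}
validations = [validation_scientificname_notempty]
amendments = [amendment_scientificname_from_id]

before = run_validations(record, validations)
after = run_validations(accept_amendments(record, amendments), validations)
gain = compliant_count(after) - compliant_count(before)
print(before, after, gain)
```

Because no test depends on another test's output, the same functions can be reordered into the second workflow (amendments first, then validations) without change.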
Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16.
Amended Example to align with @chicoreus email comments 17th June 2022.
Email discussion on the Expected Response as per similar issue with #56. In this case, it is the repeat of the "ambiguity" of dwc:taxonID that worries me.
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED
My point is that ambiguity in dwc:taxonID will result in INTERNAL_PREREQUISITES_NOT_MET, so the second check "dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority" will never be activated.
So in this case, I'd suggest we remove the first occurrence to have
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED
True?
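To make the point concrete, here is a minimal sketch of the Expected Response with the first "ambiguous" clause removed. The function name, the `lookup` callable and its behaviour (returning a list of matching names and raising `ConnectionError` when the authority is unavailable) are assumptions for illustration only. The sketch also checks the internal prerequisites before contacting the authority, which is a design choice and not something the Expected Response wording prescribes.

```python
def amendment_scientificname_from_taxonid(taxon_id, scientific_name, lookup):
    """Return (Response.status, Response.result) for one record.

    lookup(taxon_id) is a hypothetical callable that returns the list of
    scientific names matching taxon_id in the source authority and raises
    ConnectionError if the authority is not available.
    """
    # INTERNAL_PREREQUISITES_NOT_MET: taxonID EMPTY or scientificName not EMPTY.
    if not (taxon_id or "").strip() or (scientific_name or "").strip():
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    try:
        matches = lookup(taxon_id.strip())
    except ConnectionError:
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    # Ambiguity is decided once, here: exactly one match means unambiguous.
    if len(matches) == 1:
        return ("FILLED_IN", {"dwc:scientificName": matches[0]})
    # Zero or several matches: nothing can be proposed.
    return ("NOT_AMENDED", None)


def toy_lookup(taxon_id):
    # Toy authority holding only the example from the specification table.
    return {"gbif:8102122": ["Harpullia pendula F.Muell."]}.get(taxon_id, [])


print(amendment_scientificname_from_taxonid("gbif:8102122", "", toy_lookup))
print(amendment_scientificname_from_taxonid("gbif:8a", "", toy_lookup))
```

If ambiguity were also tested in the prerequisites branch, the `len(matches) == 1` condition would always hold by the time it was reached, which is the redundancy described above.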
I agree with this - but in line with #56 should we add invalid into the INTERNAL_PREREQUISITES_NOT_MET or does this just complicate the issue?
e.g. EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or invalid or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED
I think I would be happy either way on this - but for consistency?
I'm happy enough with adding the "invalid" as I presume from @chicoreus, we can detect the 'invalidity' and that is totally different from the 'ambiguity' aspect? @tucotuco and @chicoreus ? How say you?
It is looking like we need to document a clear and concise rule for this type of issue.
On Sun, 26 Feb 2023 15:16:22 -0800 Arthur Chapman @.***> wrote:
> I agree with this - but in line with #56 should we add invalid into the INTERNAL_PREREQUISITES_NOT_MET or does this just complicate the issue?
> e.g. EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or invalid or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED
> I think I would be happy either way on this - but for consistency?

For this one, I think not, as there isn't any easy test for invalidity of a taxonId value (unlike dates, geodetic datum values, etc.).
On Sun, 26 Feb 2023 16:20:02 -0800 Lee Belbin @.***> wrote:
> I'm happy enough with adding the "invalid" as I presume from @chicoreus, we can detect the 'invalidity' and that is totally different from the 'ambiguity' aspect? @tucotuco and @chicoreus ? How say you?

Avoiding "invalid", by asserting known to the source authority, how about:
EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY, or is not found in the bdq:sourceAuthority, or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED
> It is looking like we need to document a clear and concise rule for this type of issue.

Something on the line of: in general when presented with data from which no assertion can be made due to empty or invalid values, the specification for amendments assert that prerequisites are not met, rather than asserting not amended.
Thanks @chicoreus. Your ER is more explicit, which is good.
We have one VALIDATION relating to the content of dwc:taxonID: #121, which seeks to detect a valid 'value'. Anything that doesn't pass muster results in "NOT_COMPLIANT".
So, the scenario for a NOT_COMPLIANT value from #121 would result here in an INTERNAL_PREREQUISITES_NOT_MET by your ER, not a NOT_AMENDED. Is this appropriate? It is rather subtle for me.
To summarise the issue, we have Expected Responses that take the form of
.....; INTERNAL_PREREQUISITES_NOT_MET if input is EMPTY or INVALID; COMPLIANT/AMENDED if input is 'OK'; otherwise ...
What had been bugging me on some tests were variants of
.....; INTERNAL_PREREQUISITES_NOT_MET if input is EMPTY or INVALID; COMPLIANT/AMENDED if input is VALID and 'OK'; otherwise ...
Talking with @ArthurChapman, we can see the utility of short-circuiting the 'test' for input invalidity. This gets back to the definition that Paul has suggested above that will need to go in a Preamble. Could I tweak that to cover VALIDATIONs as well to something like
INTERNAL_PREREQUISITES_NOT_MET: When a test is presented with data from which no assertion can be made due to empty or invalid values, the specification for validations or amendments assert that the prerequisites are not met, rather than asserting not compliant or not amended
?
Can we seek agreement and direction on the previous comment please? We may need to edit the vocabulary entry for INTERNAL_PREREQUISITES_NOT_MET.