TG2-AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID

Open iDigBioBot opened this issue 7 years ago • 56 comments

TestField Value
GUID f01fb3f9-2f7e-418b-9f51-adf50f202aea
Label AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID
Description Proposes an amendment to the value of dwc:scientificName using the dwc:scientificNameID value from the bdq:sourceAuthority.
TestType Amendment
Darwin Core Class dwc:Taxon
Information Elements ActedUpon dwc:scientificName
Information Elements Consulted dwc:scientificNameID
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificNameID is bdq:Empty, or dwc:scientificName is bdq:NotEmpty; FILLED_IN the value of dwc:scientificName if the value of dwc:scientificNameID could be unambiguously interpreted as a value in the bdq:sourceAuthority; otherwise NOT_AMENDED
Data Quality Dimension Completeness
Term-Actions SCIENTIFICNAME_FROM_SCIENTIFICNAMEID
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]}
Specification Last Updated 2024-08-18
Examples [dwc:scientificNameID="gbif:8102122", dwc:scientificName="": Response.status=FILLED_IN, Response.result=dwc:scientificName="Harpullia pendula F.Muell.", Response.comment="dwc:scientificNameID contains an interpretable value"]
[dwc:scientificNameID="gbif:8a", dwc:scientificName="": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:scientificNameID does not contain an interpretable value"]
Source iDigBio
References
  • GBIF Secretariat (2023) GBIF Backbone Taxonomy. Checklist dataset. https://doi.org/10.15468/39omei
Example Implementations (Mechanisms) Kurator/FilteredPush sci_name_qc Library
Link to Specification Source Code https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L1156
Notes The value of dwc:scientificNameID is unambiguous if dwc:scientificNameID references a single taxon record in the bdq:sourceAuthority. When referencing a GBIF taxon by GBIF's identifier for that taxon, use the pseudo-namespace "gbif:" and the form "gbif:{integer}" as the value for dwc:scientificNameID. Implementors should be aware of the current GBIF API endpoint that can replace the pseudo-namespace gbif: when looking up the dwc:scientificNameID (taxonID in the GBIF document), e.g. s/gbif:/https:\/\/api.gbif.org\/v1\/species\// will transform the value taxonID=gbif:8102122 to the resolvable endpoint https://api.gbif.org/v1/species/8102122. The pseudo-namespace "gbif:" is recommended by GBIF to reference GBIF taxon records. Where resolvable persistent identifiers exist for dwc:scientificNameID values, they should be used in full, but implementors will need to support at least the "gbif:" pseudo-namespace.
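As an illustration only, the Expected Response and the "gbif:" substitution described in the Notes could be sketched as follows. This is not the sci_name_qc implementation: the function names, the dictionary-shaped response, and the injected `lookup` callable (which stands in for a call to the bdq:sourceAuthority) are all assumptions made for the sketch. It also assumes that bdq:Empty can be approximated as "missing or whitespace-only", handles only the "gbif:{integer}" form, and checks the internal prerequisites before contacting the authority.

```python
import re
from typing import Callable, Dict, Optional

# Endpoint taken from the Notes above; it may change over time.
GBIF_SPECIES_ENDPOINT = "https://api.gbif.org/v1/species/"


def gbif_id_to_url(scientific_name_id: str) -> Optional[str]:
    """Map 'gbif:{integer}' to a resolvable GBIF URL, or None if the form does not match."""
    match = re.fullmatch(r"gbif:([0-9]+)", scientific_name_id.strip())
    if match is None:
        return None
    return GBIF_SPECIES_ENDPOINT + match.group(1)


def is_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty is approximated as missing or whitespace-only.
    return value is None or value.strip() == ""


def amend_scientific_name(
    scientific_name_id: Optional[str],
    scientific_name: Optional[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, object]:
    """Hypothetical sketch of AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID.

    `lookup(url)` is an assumed helper: it returns the scientific name for a
    single matching taxon record, returns None if there is no unambiguous
    match, and raises ConnectionError if the source authority is unavailable.
    """
    if is_empty(scientific_name_id) or not is_empty(scientific_name):
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": {}}

    url = gbif_id_to_url(scientific_name_id)
    if url is None:
        # Not interpretable as gbif:{integer}; other identifier forms are
        # outside the scope of this sketch.
        return {"status": "NOT_AMENDED", "result": {}}

    try:
        name = lookup(url)
    except ConnectionError:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": {}}

    if is_empty(name):
        return {"status": "NOT_AMENDED", "result": {}}
    return {"status": "FILLED_IN", "result": {"dwc:scientificName": name}}
```

With a stand-in lookup such as `{"https://api.gbif.org/v1/species/8102122": "Harpullia pendula F.Muell."}.get`, the two Examples above give FILLED_IN for "gbif:8102122" and NOT_AMENDED for "gbif:8a".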

iDigBioBot avatar Jan 05 '18 15:01 iDigBioBot

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: It would seem that a scientificName consistency test is needed: scientificName is consistent with what's provided in genus, specificEpithet, etc. Added a test at the bottom. Also, I believe the converse tests should be included: genus, specificEpithet, infraspecificEp, sciNameAut completed from sciName. "GENUS_FROM_SCI_NAME" and the like

iDigBioBot avatar Jan 05 '18 15:01 iDigBioBot

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: This can't be implemented until dwc:genericEpithet is approved. dwc:genus is NOT the atomic parse of genus from scientific name, it is genus into which the occurrence is classified, for types the two of these can differ.

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: I don't understand @PJM - a Genus CAN be parsed from a binomial by definition - at least in the Botanical Code. The Zoological Code doesn't include the concept of a 'Specific Epithet' whereas the Botanical Code does (I am not up to date on Zoological Code but there was some discussion on adopting the concept from the Botanical Code) but as I understand both codes - "GENUS" can be standalone and does not need a separate GENUS Epithet concept.

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: but we can't implement this until dwc:genericEpithet is approved.

iDigBioBot avatar Jan 12 '18 16:01 iDigBioBot

The phrasing "scientificName was added" needs clearer specification: "added" creates ambiguity about intention. It is unclear whether implementors should only fill in an empty scientificName or whether existing values should also be changed.

chicoreus avatar Aug 31 '19 23:08 chicoreus

I've commented on the issues noted in @chicoreus email of September 1. Does that email raise a new (GitHub) issue? It would be good to document these more consistently.

Tasilee avatar Sep 01 '19 23:09 Tasilee

From @chicoreus : #71 ... AMENDED if dwc:scientificName was EMPTY and a value was added from a lookup of the dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED

ArthurChapman avatar Apr 07 '20 21:04 ArthurChapman

Suggestion: We usually add the prerequisites in the INTERNAL_PREREQUISITES_NOT_MET rather than in the AMENDED part, so I suggest moving "the dwc:scientificName was NOT_EMPTY" there. Thus:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or the dwc:scientificName was NOT_EMPTY; AMENDED if value was added from a lookup of the dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED

ArthurChapman avatar Apr 08 '20 03:04 ArthurChapman

Thanks @ArthurChapman - I agree that where possible, we include such tests in the INTERNALs. That reads well to me. Editing.

Tasilee avatar Apr 08 '20 21:04 Tasilee

I have changed Expected response to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY; AMENDED dwc:scientificName from a successful lookup of dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED"

As noted elsewhere, we need to decide where to use "the value of dwc:...." as against "dwc:...".

the value of dwc:taxonID is ambiguous vs dwc:scientificName was not EMPTY

Also noted another reversion to NOT_EMPTY!

Tasilee avatar Apr 08 '20 21:04 Tasilee

Note @Tasilee that in the TG2 Vocabulary (#152) we have the term NOTEMPTY (A field that is present and has content.) Do we need to change the term in #152?

ArthurChapman avatar Apr 08 '20 22:04 ArthurChapman

@Tasilee "the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY;" probably is a good example, value of dwc:x is ambiguous, talking explicitly about the value, and dwc:x is empty indicating that the term is empty, one option within that scope being that the value is an empty string.

@ArthurChapman, if we need both EMPTY and NOT_EMPTY, then we should probably define NOT_EMPTY as simply the logical inverse of EMPTY; if we don't need it, then we could reference "not EMPTY" in the specifications.

chicoreus avatar Apr 08 '20 22:04 chicoreus

BTW, we have three tests with labels

  • TG2-NOTIFICATION_ANNOTATION_NOTEMPTY
  • TG2-NOTIFICATION_DATAGENERALIZATIONS_NOTEMPTY
  • TG2-NOTIFICATION_ESTABLISHMENTMEANS_NOTEMPTY

All references in Expected Responses are now "not EMPTY", so I would concur with @chicoreus.

Tasilee avatar Apr 08 '20 22:04 Tasilee

I think it is in #152 because of the three test names. We can leave it there as that definition applies to those three. But in the tests use not EMPTY.

ArthurChapman avatar Apr 08 '20 22:04 ArthurChapman

I just checked the example and it needed to be amended to "https://api.gbif.org/v1/species/8102122" (Note "/v1"). I suspect a few more of these may be in github. I will see what I can find. This issue is it as far as I can tell.

Tasilee avatar Oct 18 '21 01:10 Tasilee

The Expected Response here contains "INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY" but the paired VALIDATION https://github.com/tdwg/bdq/issues/120 will have already checked this, so is this component of the Expected Response redundant? There are similar situations with many of the AMENDMENTs. In other words, would we even run the AMENDMENT or just report automatically?

Tasilee avatar Oct 25 '21 00:10 Tasilee

@Tasilee All of the tests have to be defined as if run in isolation. A linear workflow with a validation before a specific amendment is only one possible alternative. Thus the prerequisites for an amendment would be expected to overlap with validations.

chicoreus avatar Oct 25 '21 00:10 chicoreus

We did discuss a workflow but then it was sort of agreed that each test needed to be run in isolation as stated by @chicoreus. I don't think we discussed it fully (need another face to face) but from memory, it was thought different institutions may run the tests differently, or only run some tests and not others and thus they needed to be standalone.

ArthurChapman avatar Oct 25 '21 00:10 ArthurChapman

Thanks @ArthurChapman. I also vaguely remember such a discussion about each being somewhat independent (but AMENDMENTs are, the way we designed them, dependent on their equivalent VALIDATIONs). When it comes to generating the test data, the chooks come home to roost.

Tasilee avatar Oct 25 '21 01:10 Tasilee

@Tasilee the amendments relate to, but are not dependent on validations. One expected workflow is to run all validations in parallel, then run all amendments in parallel, then run all validations again with all amendments accepted to measure how much a data set might have its fitness increased by accepting annotations. Another plausible workflow is to run all amendments followed by all validations, accepting amended data that has passed all the tests. A core requirement is that each test be able to stand on its own.
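A rough sketch of the first workflow (validate, amend, validate again) might look like the following. Everything here is hypothetical: the `Validation` and `Amendment` signatures, the two stand-in tests, and the use of a simple count of COMPLIANT results as the measure of improvement are assumptions for illustration, not part of the specification. Each amendment is given the original record rather than the output of another test, which is one way of keeping the tests standalone.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Validation = Callable[[Record], str]            # returns "COMPLIANT" or "NOT_COMPLIANT"
Amendment = Callable[[Record], Dict[str, str]]  # returns proposed changes, empty if none


def count_compliant(record: Record, validations: List[Validation]) -> int:
    """Number of validations that report COMPLIANT for this record."""
    return sum(1 for validate in validations if validate(record) == "COMPLIANT")


def validate_amend_revalidate(
    record: Record,
    validations: List[Validation],
    amendments: List[Amendment],
) -> Tuple[int, int, Record]:
    """Run validations, accept all proposed amendments, then run validations again."""
    before = count_compliant(record, validations)
    amended = dict(record)  # leave the input record untouched
    for amend in amendments:
        # Every amendment sees the original record, never another test's output.
        amended.update(amend(record))
    after = count_compliant(amended, validations)
    return before, after, amended


# Stand-in tests, for illustration only.
def scientific_name_not_empty(record: Record) -> str:
    return "COMPLIANT" if record.get("dwc:scientificName", "").strip() else "NOT_COMPLIANT"


def fill_scientific_name(record: Record) -> Dict[str, str]:
    if record.get("dwc:scientificName", "").strip():
        return {}
    known = {"gbif:8102122": "Harpullia pendula F.Muell."}  # hard-coded stand-in lookup
    name = known.get(record.get("dwc:scientificNameID", ""))
    return {"dwc:scientificName": name} if name else {}
```

If two amendments propose changes to the same term, this sketch simply lets the later one win; a real workflow would need an explicit rule for that case.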

chicoreus avatar Oct 25 '21 01:10 chicoreus

Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16.

Tasilee avatar Apr 18 '22 22:04 Tasilee

Amended Example to align with @chicoreus email comments 17th June 2022.

Tasilee avatar Jun 19 '22 23:06 Tasilee

Email discussion on the Expected Response as per similar issue with #56. In this case, it is the repeat of the "ambiguity" of dwc:taxonID that worries me.

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED

My point is that ambiguity in dwc:taxonID will result in INTERNAL_PREREQUISITES_NOT_MET, so the second check "dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority" will never be activated.

So in this case, I'd suggest we remove the first occurrence to have

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED

True?
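To make the difference between the two wordings concrete, here is a small sketch of both orderings. It is an illustration under assumptions: `find_matches` is a hypothetical lookup returning the list of names matched in the bdq:sourceAuthority, "ambiguous" is taken to mean more than one match, EMPTY is approximated as missing or whitespace-only, and the EXTERNAL_PREREQUISITES_NOT_MET branch is left out for brevity.

```python
from typing import Callable, List, Optional


def is_empty(value: Optional[str]) -> bool:
    # Assumption: EMPTY is approximated as missing or whitespace-only.
    return value is None or value.strip() == ""


def response_with_ambiguity_prerequisite(
    taxon_id: Optional[str],
    scientific_name: Optional[str],
    find_matches: Callable[[str], List[str]],
) -> str:
    """First wording: ambiguity is listed under INTERNAL_PREREQUISITES_NOT_MET."""
    if is_empty(taxon_id) or not is_empty(scientific_name):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    matches = find_matches(taxon_id)
    if len(matches) > 1:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # Ambiguous values never reach this point, so "unambiguously" adds nothing here.
    if len(matches) == 1:
        return "FILLED_IN"
    return "NOT_AMENDED"


def response_without_ambiguity_prerequisite(
    taxon_id: Optional[str],
    scientific_name: Optional[str],
    find_matches: Callable[[str], List[str]],
) -> str:
    """Suggested wording: ambiguity is only tested in the FILLED_IN clause."""
    if is_empty(taxon_id) or not is_empty(scientific_name):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    matches = find_matches(taxon_id)
    if len(matches) == 1:
        return "FILLED_IN"
    return "NOT_AMENDED"  # covers both "no match" and "ambiguous"
```

Under these assumptions the two variants differ only for an ambiguous dwc:taxonID: the first reports INTERNAL_PREREQUISITES_NOT_MET, the second NOT_AMENDED.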

Tasilee avatar Feb 26 '23 22:02 Tasilee

I agree with this - but in line with #56 should we add invalid into the INTERNAL_PREREQUISITES_NOT_MET or does this just complicate the issue?

e.g. EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or invalid or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED

I think I would be happy either way on this - but for consistency?

ArthurChapman avatar Feb 26 '23 23:02 ArthurChapman

I'm happy enough with adding the "invalid" as I presume from @chicoreus, we can detect the 'invalidity' and that is totally different from the 'ambiguity' aspect? @tucotuco and @chicoreus ? How say you?

It is looking like we need to document a clear and concise rule for this type of issue.

Tasilee avatar Feb 27 '23 00:02 Tasilee

On Sun, 26 Feb 2023 15:16:22 -0800 Arthur Chapman @.***> wrote:

> I agree with this - but in line with #56 should we add invalid into the INTERNAL_PREREQUISITES_NOT_MET or does this just complicate the issue?
>
> e.g. EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or invalid or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED
>
> I think I would be happy either way on this - but for consistency?

For this one, I think not, as there isn't any easy test for invalidity of a taxonId value (unlike dates, geodetic datum values, etc.).

chicoreus avatar Feb 27 '23 00:02 chicoreus

On Sun, 26 Feb 2023 16:20:02 -0800 Lee Belbin @.***> wrote:

> I'm happy enough with adding the "invalid" as I presume from @chicoreus, we can detect the 'invalidity' and that is totally different from the 'ambiguity' aspect? @tucotuco and @chicoreus ? How say you?

Avoiding "invalid", by asserting known to the source authority, how about:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY, or is not found in the bdq:sourceAuthority, or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:taxonID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED

> It is looking like we need to document a clear and concise rule for this type of issue.

Something along the lines of: in general, when presented with data from which no assertion can be made due to empty or invalid values, the specification for amendments asserts that prerequisites are not met, rather than asserting not amended.

chicoreus avatar Feb 27 '23 01:02 chicoreus

Thanks @chicoreus. Your ER is more explicit, which is good.

We have one VALIDATION relating to the content of dwc:taxonID: #121, which seeks to detect a valid 'value'. Anything that doesn't pass muster results in "NOT_COMPLIANT".

So, the scenario for a NOT_COMPLIANT value from #121 would result here in an INTERNAL_PREREQUISITES_NOT_MET by your ER, not a NOT_AMENDED. Is this appropriate? It is rather subtle for me.

Tasilee avatar Feb 27 '23 01:02 Tasilee

To summarise the issue, we have Expected Responses that take the form of

.....; INTERNAL_PREREQUISITES_NOT_MET if input is EMPTY or INVALID; COMPLIANT/AMENDED if input is 'OK'; otherwise ...

What had been bugging me on some tests were variants of

.....; INTERNAL_PREREQUISITES_NOT_MET if input is EMPTY or INVALID; COMPLIANT/AMENDED if input is VALID and 'OK'; otherwise ...

Talking with @ArthurChapman, we can see the utility of short-circuiting the 'test' for input invalidity. This gets back to the definition that Paul has suggested above that will need to go in a Preamble. Could I tweak that to cover VALIDATIONs as well to something like

INTERNAL_PREREQUISITES_NOT_MET: When a test is presented with data from which no assertion can be made due to empty or invalid values, the specification for validations or amendments asserts that the prerequisites are not met, rather than asserting not compliant or not amended

?

Tasilee avatar Mar 02 '23 22:03 Tasilee

Can we seek agreement and direction on the previous comment please? We may need to edit the vocabulary entry for INTERNAL_PREREQUISITES_NOT_MET.

Tasilee avatar Mar 11 '23 01:03 Tasilee