TG2-VALIDATION_POLYNOMIAL_CONSISTENT
TestField | Value |
---|---|
GUID | 17f03f1f-f74d-40c0-8071-2927cfc9487b |
Label | VALIDATION_POLYNOMIAL_CONSISTENT |
Description | Is the polynomial represented in dwc:scientificName consistent with the equivalent values in dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet? |
TestType | Validation |
Darwin Core Class | dwc:Taxon |
Information Elements ActedUpon | dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet |
Information Elements Consulted | |
Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is bdq:Empty, or all of dwc:genericName, dwc:specificEpithet and dwc:infraspecificEpithet are bdq:Empty; COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with bdq:NotEmpty values of dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet; otherwise NOT_COMPLIANT. |
Data Quality Dimension | Consistency |
Term-Actions | POLYNOMIAL_CONSISTENT |
Parameter(s) | |
Source Authority | |
Specification Last Updated | 2023-09-18 |
Examples | [dwc:scientificName="Hakea decurrens ssp. physocarpa", dwc:genericName="", dwc:specificEpithet="decurrens", dwc:infraspecificEpithet="physocarpa": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="Values of all non-empty atomic terms are found in the polynomial"] [dwc:scientificName="Hakea decurrens", dwc:genericName="Hakea", dwc:specificEpithet="decurrens", dwc:infraspecificEpithet="physocarpa": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:scientificName is inconsistent with atomic parts (dwc:genus, dwc:specificEpithet and dwc:infraspecificEpithet)"] |
Source | Paula Zermoglio |
References | |
Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library, FP-Akka |
Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L1554 |
Notes | If dwc:genericName is populated, this test expects that the value of dwc:genericName is the first word of the value of dwc:scientificName. If dwc:specificEpithet is populated, this test expects that the value of dwc:specificEpithet is the name of the first or species epithet of the scientificName. If dwc:infraspecificEpithet is populated, this test expects that the value of dwc:infraspecificEpithet is the name of the lowest or terminal infraspecific epithet of the scientificName, excluding any rank designation. |
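
The Expected Response and Notes can be read as a small decision procedure. The following is a minimal sketch of that reading, not the sci_name_qc implementation. It assumes the scientificName carries no authorship and no interpolated (parenthesised) names, uses exact case-sensitive matching, and treats a short, illustrative list of rank designations (`RANK_MARKERS`) as the only ones to ignore. The function and helper names are hypothetical.

```python
# Illustrative, incomplete list of rank designations to ignore (assumption).
RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "forma", "subf."}


def is_empty(value):
    """Rough stand-in for bdq:Empty: None or whitespace-only."""
    return value is None or value.strip() == ""


def polynomial_consistent(scientific_name, generic_name,
                          specific_epithet, infraspecific_epithet):
    """Return (Response.status, Response.result) for one record."""
    parts = (generic_name, specific_epithet, infraspecific_epithet)
    if is_empty(scientific_name) or all(is_empty(p) for p in parts):
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)

    # Split the polynomial into words, dropping rank designations.
    words = [w for w in scientific_name.split()
             if w.lower() not in RANK_MARKERS]

    # Only non-empty atomic terms are compared.
    checks = []
    if not is_empty(generic_name):
        # Generic name: first word of the scientificName.
        checks.append(len(words) >= 1 and words[0] == generic_name.strip())
    if not is_empty(specific_epithet):
        # Species epithet: the word following the generic name.
        checks.append(len(words) >= 2 and words[1] == specific_epithet.strip())
    if not is_empty(infraspecific_epithet):
        # Terminal infraspecific epithet: last word, at position three or later.
        checks.append(len(words) >= 3
                      and words[-1] == infraspecific_epithet.strip())

    result = "COMPLIANT" if all(checks) else "NOT_COMPLIANT"
    return ("RUN_HAS_RESULT", result)


# The two records from the Examples row:
print(polynomial_consistent("Hakea decurrens ssp. physocarpa",
                            "", "decurrens", "physocarpa"))
print(polynomial_consistent("Hakea decurrens",
                            "Hakea", "decurrens", "physocarpa"))
```

Under these assumptions the first record is COMPLIANT (the empty dwc:genericName is simply not compared) and the second is NOT_COMPLIANT (the scientificName has no infraspecific epithet to match "physocarpa").
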
See Positive description - do we need to add "scientificNameAuthorship" to fields?
Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: Variable name would need changing as this relates to the Positive side of the test rather than the negative. Also the Description appears for the (test - PASS) column (currently hidden)
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: @AC: Variable name is fine. The other validation variable names need to change. We must specify all of them as positive, not negative.
Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: Whatever we do we need to be consistent
We haven't addressed the point from @ArthurChapman that the authorship needs to be included, as scientificNameAuthorship may (incorrectly) differ from the authorship parsed out of scientificName.
This test shares a name with #45 and #46, but this test looks for consistency in the parts of the name in their various Darwin Core fields, while the other two tests currently only compare scientificName with a source authority.
This is another one that was originally called (pre-Gainesville) "TG2-VALIDATION_SCIENTIFICNAME_INCONSISTENT". I can't see my discussion on including Authorship @chicoreus - I believe Authorship may complicate things (given the many different spellings and inconsistencies) - I am thinking that is maybe why we changed the naming of these three to POLYNOMIAL from SCIENTIFICNAME - i.e. to basically exclude authorship in the Scientific Name.
I believe that is correct, we wanted to distinguish explicitly in the name of the test that the authorship was not included.
Trying to look at a test dataset for this test
At present we say "INTERNAL_PREREQUISITES_NOT_MET if all of the component terms are EMPTY",
but surely it should be INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is EMPTY and/or if dwc:genus is EMPTY.
dwc:infraspecificEpithet or dwc:specificEpithet on their own are not sufficient to compare the scientific name against genus, species and infraspecies.
If you have a scientificName and a genus (but no specificEpithet or infraspecificEpithet), then you can still compare.
As a follow-up to my last comment - you may like to look at the DRAFT test data file I have created based on my interpretation:
https://github.com/tdwg/bdq/blob/master/tg2/core/testdata/testdata_POLYNOMIAL_INCOSISTENT_%23101.csv
#82 SCIENTIFICNAME_EMPTY overlaps with this one. If one were using a workflow and #82 was run first and failed, then #101 would not need to be run. We seem to have a little redundancy here, but I am not sure how to fix it. I see no problem in having both.
@ArthurChapman see the description of the logic in the notes. dwc:genus can be empty and dwc:specificEpithet can still be checked against dwc:scientificName for consistency.
See the note in #82: these tests are along different axes in the framework, and test order is not specified, so some overlap is expected, especially in complex sets of interrelated terms like these.
@chicoreus OK - but that still means INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is EMPTY or if all of dwc:genus, dwc:specificEpithet and dwc:infraspecificEpithet are EMPTY.
OK - I didn't read the notes - I will change my test data file to concur with the notes (once this discussion is finished). Interesting though: we say that the test is COMPLIANT if you have a scientific name of Aus Bus Cus, the genus is empty and the species is empty, but you have Cus in the infraspecific epithet. Would logic not say that it is NOT_COMPLIANT, because the genus and species can't be consistent with the scientificName when they have no values?
In the light of recent discussions, I have added the specific dwc terms to the Expected Response.
Examining test data, the following would return NOT_COMPLIANT when I think it should be INTERNAL_PREREQUISITES_NOT_MET
dwc:scientificName="", dwc:genus="Hakea", dwc:specificEpithet="decurrens", dwc:infraspecificEpithet="physocarpa"
??
Agreed.
OK, so could we have a taxon guru adapt the Expected response? These epithet things scare me. Names scare me.
Agreed - probably needs rewording
INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName, and all of dwc:genus, dwc:specificEpithet and dwc:infraspecificEpithet are EMPTY; COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genus, dwc:specificEpithet, dwc:infraspecificEpithet; otherwise NOT_COMPLIANT
Hmm, maybe
INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is EMPTY, or all of dwc:genus, dwc:specificEpithet and dwc:infraspecificEpithet are EMPTY; COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genus, dwc:specificEpithet, dwc:infraspecificEpithet; otherwise NOT_COMPLIANT
+1 to what @Tasilee said.
1 + @tucotuco is a majority :) CHANGED
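
The difference between the two candidate wordings comes down to "and" versus "or" in the prerequisite check. The sketch below is illustrative only: the helper names are hypothetical and `is_empty` is a rough stand-in for EMPTY. It applies both wordings to the record raised earlier in this thread (empty dwc:scientificName, all three atomic terms populated).

```python
def is_empty(value):
    """Rough stand-in for EMPTY: None or whitespace-only."""
    return value is None or value.strip() == ""


def prereq_not_met_and(scientific_name, parts):
    """First wording: scientificName AND all atomic parts are EMPTY."""
    return is_empty(scientific_name) and all(is_empty(p) for p in parts)


def prereq_not_met_or(scientific_name, parts):
    """Adopted wording: scientificName is EMPTY, OR all atomic parts are EMPTY."""
    return is_empty(scientific_name) or all(is_empty(p) for p in parts)


scientific_name = ""
parts = ["Hakea", "decurrens", "physocarpa"]  # genus, specificEpithet, infraspecificEpithet

# "and" wording: prerequisites count as met, so the record falls through
# to the comparison and ends up NOT_COMPLIANT.
print(prereq_not_met_and(scientific_name, parts))  # False

# "or" wording: the record is reported as INTERNAL_PREREQUISITES_NOT_MET.
print(prereq_not_met_or(scientific_name, parts))   # True
```
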
Changed dwc:genus to dwc:genericName throughout this test in line with recent changes to Darwin Core.
As noted by @tucotuco, the acceptance of https://dwc.tdwg.org/terms/#dwc:genericName resolves the potential ambiguity between dwc:genus, defined as the generic placement in the taxonomy, and dwc:genericName, a parse of the first word of the scientific name.
In the Expected Response ..."COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet" do we need the words "with the atomic parts"?
Would not:
"COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet"
be sufficient?
Are all happy with the specifications on this one now?
Getting this 'on the record' for all to consider: Email with @chicoreus yesterday. I suggested for the Expected Response-
INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is EMPTY, or all of dwc:genericName, dwc:specificEpithet and dwc:infraspecificEpithet are EMPTY; COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with NOT_EMPTY values of dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet; otherwise NOT_COMPLIANT.
@chicoreus response: "That is more explicit than the current separate (and not formalized yet) general guidance on handling "consistent". But if we are explicit in this way here, we may need to be in other tests invoking "consistent"."
Thoughts?
I like it, but not sure of other implications
After discussion on the Zoom today, we agreed that using the current Test Data format for examples would seem expedient. We also previously agreed that one "COMPLIANT" and one "NOT_COMPLIANT" example, or equivalents, was appropriate.
I think the examples of INTERNAL/EXTERNAL_PREREQUISITES_NOT_MET would be overkill here?
What I have added in Examples is for a check on formatting.
I don't see how the test can accommodate interpolated names that are part of a polynomial dwc:scientificName. Examples of polynomials with interpolated names: Aus (Bus) cus, where Bus is a subgenus; Aus (cus) dus, where cus is a superspecies.
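
To make the concern concrete, here is a small sketch of how a purely positional, whitespace-based parse (an assumption for illustration, not the specified behaviour of the test) treats an interpolated name, and what changes if parenthesised tokens are skipped. The function name and the `skip_interpolated` flag are hypothetical.

```python
def name_words(scientific_name, skip_interpolated):
    """Split a polynomial into words, optionally dropping tokens that are
    wholly enclosed in parentheses (e.g. an interpolated subgenus)."""
    words = scientific_name.split()
    if skip_interpolated:
        words = [w for w in words
                 if not (w.startswith("(") and w.endswith(")"))]
    return words


# Positional reading: the second word is taken as the species epithet.
print(name_words("Aus (Bus) cus", skip_interpolated=False))  # ['Aus', '(Bus)', 'cus']

# Skipping the interpolated name puts 'cus' back in second position.
print(name_words("Aus (Bus) cus", skip_interpolated=True))   # ['Aus', 'cus']
```

With the positional reading, a record with dwc:specificEpithet="cus" would be compared against "(Bus)" and fail. Skipping parenthesised tokens avoids that, but whether the test should do so is exactly the open question here.
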