bdq
TG2-VALIDATION_TAXONID_COMPLETE
Field | Value |
---|---|
GUID | a82c7e3a-3a50-4438-906c-6d0fefa9e984 |
Label | VALIDATION_TAXONID_COMPLETE |
Description | Does the value of dwc:taxonID contain a complete identifier? |
Output Type | Validation |
Darwin Core Class | Taxon |
Information Elements | dwc:taxonID |
Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY; COMPLIANT if (1) taxonID is a validly formed LSID, or (2) taxonID is a validly formed URN with at least NID and NSS present, or (3) taxonID is in the form scope:value, or (4) taxonID is a validly formed URI with host and path where path consists of more than just "/"; otherwise NOT_COMPLIANT |
Data Quality Dimension | Completeness |
Term-Actions | TAXONID_COMPLETE |
Warning Type | Incomplete |
Parameter(s) | |
Source Authority | |
Examples | [dwc:taxonID="urn:lsid:zoobank.org:act:17ADF24F-027F-44F6-9543-D3D0260CE79E": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:taxonID contains a URI and a namespace indicator"] [dwc:taxonID="Hakea decurrens ssp. physocarpa": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:taxonID does not contain a URI"] |
Source | TG2-Gainesville |
References | |
Example Implementations (Mechanisms) | |
Link to Specification Source Code | |
Notes | The original test "VALIDATION_TAXONID_AMBIGUOUS" was seen by the TG2 team as too complex to implement. If we use any single bdq:sourceAuthority such as GBIF, a valid and complete dwc:taxonID based on an alternative source authority is unlikely to provide a valid match. A text or number string as a namespace indicator without a URI will be ambiguous. As an example, GBIF's backbone taxonomy dataset can be found at https://doi.org/10.15468/39omei. |
Agreed at TDWG 2018 DQIG meeting that any mention of uniqueness is redundant with the resolvability requirement, hence references to uniqueness were dropped.
We currently have in the notes: "Note that the cause of failure may be due to a service failure. Implementations of this test should account for this type of failure and not necessarily report a failure."
Should this then be covered by adding an EXTERNAL_PREREQUISITES_NOT_MET?
This would apply to any external lookup. One presumes any system failure would generate a specific response like "FAILED_LOOKUP"?
@ArthurChapman yes, EXTERNAL_PREREQUISITES_NOT_MET would cover reporting some sort of transient system failure where asking the same question later might get an answer. @Tasilee FAILED_LOOKUP has ambiguity to it: it carries the potential implication that a lookup was run (and failed), and that something was looked up. EXTERNAL_PREREQUISITES_NOT_MET covers the more general case where some external resource (lookup, calculation, or otherwise) was not available, so try again later.
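A minimal sketch of the distinction drawn above, for a test that relies on an external lookup. The names `ServiceUnavailable`, `run_with_external_lookup`, `toy_lookup`, and `always_down` are hypothetical and are not part of any bdq implementation; the point is only that an unavailable resource is reported separately from a NOT_COMPLIANT result.

```python
from typing import Callable, Optional, Tuple


class ServiceUnavailable(Exception):
    """Hypothetical exception: the external resource could not be reached."""


def run_with_external_lookup(
    taxon_id: Optional[str], lookup: Callable[[str], bool]
) -> Tuple[str, Optional[str]]:
    """Return (Response.status, Response.result) for a lookup-based test."""
    if taxon_id is None or taxon_id.strip() == "":
        # Problem with the data: nothing to look up.
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    try:
        found = lookup(taxon_id.strip())
    except ServiceUnavailable:
        # Problem with the resource: asking again later might get an answer.
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    return ("RUN_HAS_RESULT", "COMPLIANT" if found else "NOT_COMPLIANT")


def toy_lookup(taxon_id: str) -> bool:
    """Stand-in for a real service: knows exactly one identifier."""
    return taxon_id == "2435099"


def always_down(taxon_id: str) -> bool:
    """Stand-in for a service that is currently unreachable."""
    raise ServiceUnavailable("service did not respond")


print(run_with_external_lookup("2435099", toy_lookup))
print(run_with_external_lookup("2435099", always_down))
```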
@Tasilee and I have a problem with this one. How do we resolve the taxonID? The examples given in Darwin Core include a GUID and just a number ("32567"), which is similar to our example of a failure. How is it possible for us to validate unless it references an authority, which according to Darwin Core is not the case? I don't see how this can work. @tucotuco, @chicoreus is this possible to do? Is it a valuable test?
The best practices for identifiers says they should be globally unique for the instance of the Class they represent, persistent, and resolvable. That is an applicability statement apart from Darwin Core. In Darwin Core, or in a Darwin Core Archive, there are no such restrictions. This shouldn't be too disturbing, as Darwin Core does not implement restrictions in and of itself, it merely provides definitions and other guiding information. So, the problem, if it were one, would not be unique to the dwc:taxonID term. What does seem to be a problem is that, if the taxonID does not contain the information to resolve it (the authority), that is an internal prerequisite that isn't met - there is a problem with the data rather than a problem with a service. That is not captured in the Expected Response.
Are we saying that the Expected response should be "EXTERNAL_PREREQUISITES_NOT_MET if resolving service was unavailable; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or is not resolvable; COMPLIANT if the value of the field dwc:taxonID is resolvable; otherwise NOT_COMPLIANT" given @chicoreus comment on EXTERNAL and @tucotuco on INTERNAL?
I wouldn't think so - as if it is non-resolvable it is NOT_COMPLIANT. What John is saying is that it requires somewhere in the record a reference to what the resolving authority is. I think we are saying "EXTERNAL_PREREQUISITES_NOT_MET if resolving service was unavailable; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or the resolving authority is not determined; COMPLIANT if the value of the field dwc:taxonID is resolvable; otherwise NOT_COMPLIANT" or some similar word to "not determined" (not identifiable, not known, not referenced within the record)
I agree with @ArthurChapman that the response should be NOT_COMPLIANT if the taxonId is not resolvable, but I would not expect the authority information to be anywhere else in the record than in taxonId. It would be resolvable if it was possible to directly (full URI) or indirectly (unambiguous namespace from which full URI could be constructed) resolve the taxonId.
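To illustrate the "directly or indirectly resolvable" idea in the comment above, here is a rough sketch. The `KNOWN_NAMESPACES` table and the `to_resolvable_uri` function are hypothetical; the single `gbif` entry only mirrors the https://www.gbif.org/species/ URL form that appears in the Darwin Core examples, and the `prefix:value` input syntax is an assumption made for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: an illustrative table mapping a namespace prefix to a URI stem.
KNOWN_NAMESPACES = {"gbif": "https://www.gbif.org/species/"}


def to_resolvable_uri(taxon_id: str) -> Optional[str]:
    """Return a URI for taxon_id if one can be found or constructed, else None."""
    value = taxon_id.strip()
    parts = urlsplit(value)
    if parts.scheme in ("http", "https") and parts.netloc:
        return value  # directly resolvable: already a full URI
    prefix, sep, local_id = value.partition(":")
    if sep and local_id and prefix.lower() in KNOWN_NAMESPACES:
        # indirectly resolvable: unambiguous namespace plus local identifier
        return KNOWN_NAMESPACES[prefix.lower()] + local_id
    return None  # no authority information in the value itself


print(to_resolvable_uri("https://www.gbif.org/species/212"))
print(to_resolvable_uri("gbif:212"))
print(to_resolvable_uri("32567"))
```

Under these assumptions a bare number such as 32567 yields no URI, which is the ambiguity the thread keeps returning to.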
OK, so is the Expected Response now ok?
I think it is OK - but may be better (given what @tucotuco said above) if we said "INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or the resolving authority is not referenced within the record" What do you think @tucotuco ?
I would be specific: INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or does not include the resolving authority.
Thanks @ArthurChapman and @tucotuco - done.
@tucotuco how about a taxonID in the form urn:uuid:e34fda24-f53e-4627-b591-b6c6ca349293? That should be an unambiguous unique taxonID, with a known urn scheme, just not resolvable. Or e34fda24-f53e-4627-b591-b6c6ca349293? I'd tend to think that this test is for uniqueness, not necessarily resolvability. Would the requirement be any urn:uuid, urn:catalog, lsid:, http:, https: identifier?
@chicoreus That may be a GUID. It is in the form of a GUID. But no one can resolve it to know for sure. If it resolves in addition, you can be sure it is a GUID. But these are just my perspective. Darwin Core doesn't require anything in particular, so it comes down to what we want the test to do everywhere.
Not sure of the wording here.
"... INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or does not include the resolving authority ..."
The example has just "dwc:taxonID=54367", i.e. it is just a number and does NOT include a resolving authority, so as written it would be (at least to me) INTERNAL_PREREQUISITES_NOT_MET
Also none of the examples in the test dataset include "the resolving authority"
With all these we need to either 1) delete the words "or does not include the resolving authority" or 2) modify all our examples
@ArthurChapman I'd agree. I'd concur with deleting the phrase "or does not include the resolving authority" from the specification. But, there is likely more work required.
urn:lsid:marinespecies.org:taxname:406150 is a likely, unique, valid, non ambiguous value for taxonID.
Given the specification of "GBIF backbone taxonomy service", there isn't actually a way of querying that service for a taxonID, e.g. https://api.gbif.org/v1/species/search?taxonID=urn:lsid:marinespecies.org:taxname:406150&datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c ignores the invalid term taxonID= and just returns everything in the backbone taxonomy.
Taxon records in the backbone taxonomy do include taxonID, so an implementation that works from a GBIF download would work. But I'd be hard pressed to implement this as defined unless we assert that the only non-ambiguous taxonID values are identifiers of records in GBIF's backbone taxonomy. In that case, https://api.gbif.org/v1/species/2435099 and 2435099 would both be compliant, but the quite unambiguous urn:lsid:marinespecies.org:taxname:406150 would be ambiguous, as we can't find it through the GBIF service.
Noting that https://api.gbif.org/v1/species/54367 currently does not return any results, suggesting that either GBIF deleted the record, or 54367 is ambiguous as we don't know which dataset it belongs to....
urn:lsid:marinespecies.org:taxname:406150 is unambiguous. It is NOT resolvable (thus fails on that part of the specification). It is NOT findable through the GBIF backbone taxonomy service (thus fails on that part of the specification). It includes an authority, but not a resolving authority (thus fails on that part of the specification).
In retrospect, I'm wondering why the Expected Response specifically uses "GBIF backbone taxonomy service" and yet we refer to "resolving authority" and don't have a Parameter? Aren't we going to have national authorities scenarios?
I'd certainly agree in deleting the phrase "or does not include the resolving authority". (DONE)
Note that the Darwin Core examples for taxonID are "8fa58e08-08de-4ac1-b69c-1235340b7001, 32567, https://www.gbif.org/species/212"
I think we definitely need some more discussion on this test
One more: I seem to remember @tucotuco saying we shouldn't use "bdq:sourceAuthority service" so I have been removing these from the Expected Responses. What about this one though?
Though this might be a case where the implementor is more likely to use a service, there is no requirement to do so, so wouldn't it be the same?
Happy with that.
Discussion of the TG2 team 7th March 2022 suggested that this test was too complex to implement with due utility. Consequently, it was suggested that we rename it as an 'INCOMPLETE' type test of dwc:taxonID with compliance only if both a URI and suffix (? a better term?) were present.
@tucotuco's "namespace indicator" to replace "suffix" seems good to me.
Are we all happy with this test as it stands now?
On trying to implement this, I am finding the specification wanting.
Currently: "Description: Does the value of dwc:taxonID contain both a URI and namespace indicator?" Currently: "Expected Response: INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY; COMPLIANT if dwc:taxonID contains both a URI and a namespace indicator; otherwise NOT_COMPLIANT"
Propose the following specification: COMPLIANT if (1) taxonID is a validly formed LSID, or (2) taxonID is a validly formed URN with at least NID and NSS present, or (3) taxonID is a validly formed URI with host and path where path consists of more than just "/", and if host is www.gbif.org and the path begins with "/species/", the path contains additional trailing characters; otherwise NOT_COMPLIANT
Here the semantics of LSID are valuable: to be validly formed, an LSID must specify the authority, namespace, and objectID, which is really what we want to know in this test (can we tell what the taxonID reference is and what it is referring to?). For http/https URIs, the path can contain the equivalent of the LSID namespace and the LSID objectID, as in https://www.gbif.org/species/2529789, whereas https://www.gbif.org/species/ is a validly formed URI that needs special-case handling to tell that it doesn't actually contain a reference to a particular taxon. The specification could include additional common special cases (e.g. URIs with a path containing aphia.php and query containing id=), or not.
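A rough sketch of the two ideas in this comment: splitting an LSID into authority, namespace, and objectID, and spotting the GBIF URI that stops at /species/. The regular expression is a simplified reading of the LSID form urn:lsid:authority:namespace:objectID with an optional revision, and the names `Lsid`, `parse_lsid`, and `gbif_species_path_lacks_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


# Assumption: simplified LSID pattern; each component is non-empty and colon-free.
_LSID = re.compile(
    r"^urn:lsid:([^:\s]+):([^:\s]+):([^:\s]+)(?::([^:\s]+))?$", re.IGNORECASE
)


def parse_lsid(value: str) -> Optional[Lsid]:
    """Return the LSID components, or None if value is not a well-formed LSID."""
    match = _LSID.match(value.strip())
    return Lsid(*match.groups()) if match else None


def gbif_species_path_lacks_id(value: str) -> bool:
    """True for the special case: host www.gbif.org and path exactly '/species/'."""
    parts = urlsplit(value.strip())
    return parts.netloc.lower() == "www.gbif.org" and parts.path == "/species/"


print(parse_lsid("urn:lsid:marinespecies.org:taxname:406150"))
print(gbif_species_path_lacks_id("https://www.gbif.org/species/"))
print(gbif_species_path_lacks_id("https://www.gbif.org/species/2529789"))
```

An LSID missing its objectID, such as urn:lsid:marinespecies.org:taxname, fails to parse here, which is the property the comment relies on.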
An informative comment from @timrobertson100 19th September 2022:
"When it comes to occurrence record processing, the GBIF occurrence systems currently pass this value on, only making use on the literal values (e.g. scientificName) so it’s not something we’d have a very strong an opinion on, in e.g. a spreadsheet. My gut feeling is a “scope:value” format (e.g. gbif:1234) is better than a URL, for the reason that URLs are generally less stable over time. As an example. “species” in that URL is already questionable and a future GBIF API would be better using e.g. “../taxon/..” and concept based identification of organisms".
How about (taking in Tim and Markus's comments on scope:value): COMPLIANT if (1) taxonID is a validly formed LSID, or (2) taxonID is a validly formed URN with at least NID and NSS present, or (3) taxonID is in the form scope:value, or (4) taxonID is a validly formed URI with host and path where path consists of more than just "/"; otherwise NOT_COMPLIANT
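One possible reading of this four-clause proposal, as a sketch rather than a reference implementation. The regular expressions are simplifications: the URN pattern only checks for a 2 to 32 character NID and a non-empty NSS, and the scope:value pattern is a guess, since the thread does not define which characters a scope may contain. The function name `validation_taxonid_complete` is hypothetical.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumptions: simplified patterns, not full grammars for LSID or URN.
_LSID = re.compile(r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?$", re.IGNORECASE)
_URN = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,30}[a-z0-9]:\S+$", re.IGNORECASE)
_SCOPE_VALUE = re.compile(r"^[A-Za-z][\w.-]*:\S+$")


def validation_taxonid_complete(taxon_id: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (Response.status, Response.result) for the proposed rule."""
    if taxon_id is None or taxon_id.strip() == "":
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    value = taxon_id.strip()
    if value.lower().startswith("urn:"):
        # Clauses (1) and (2): well-formed LSID, or URN with NID and NSS.
        ok = bool(_LSID.match(value) or _URN.match(value))
    elif "://" in value:
        # Clause (4): URI with a host and a path longer than "/".
        try:
            parts = urlsplit(value)
            ok = bool(parts.hostname) and parts.path not in ("", "/")
        except ValueError:
            ok = False
    else:
        # Clause (3): scope:value, e.g. gbif:1234.
        ok = bool(_SCOPE_VALUE.match(value))
    return ("RUN_HAS_RESULT", "COMPLIANT" if ok else "NOT_COMPLIANT")


for example in ("urn:lsid:zoobank.org:act:17ADF24F-027F-44F6-9543-D3D0260CE79E",
                "Hakea decurrens ssp. physocarpa", "gbif:1234", "32567"):
    print(example, validation_taxonid_complete(example))
```

Two things stand out when the clauses are written this way. A truncated LSID such as urn:lsid:zoobank.org still passes clause (2) because it has an NID and an NSS, and https://www.gbif.org/species/ passes clause (4) now that the GBIF special case from the earlier proposal has been dropped. Whether either outcome is intended is not settled in the discussion.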
@chicoreus - see comment and question under #71
@ArthurChapman and I have re-read the Expected Response and we realize that we will need to better handle the terms LSID, URN, NID and NSS and possibly URI.
Do we expand it in the test? Do we simply add a reference? Do we add the terms to the Vocabulary?