
TG2-VALIDATION_OCCURRENCEID_STANDARD

Open iDigBioBot opened this issue 7 years ago • 3 comments

Field Value
GUID 3cfe9ab4-79f8-4afd-8da5-723183ef16a3
Label VALIDATION_OCCURRENCEID_STANDARD
Description Does the value of dwc:occurrenceID occur in bdqSourceAuthority?
Output Type Validation
Darwin Core Class Occurrence
Information Elements dwc:occurrenceID
Expected Response INTERNAL_PREREQUISITES_NOT_MET if dwc:occurrenceID is EMPTY; COMPLIANT if the value of dwc:occurrenceID follows a format commonly associated with globally unique identifiers (GUIDs); otherwise NOT_COMPLIANT
Data Quality Dimension Conformance
Term-Actions OCCURRENCEID_STANDARD
Warning Type Invalid
Parameter(s)
Source Authority
Examples [dwc:occurrenceID="https://www.inaturalist.org/observations/43047701": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:occurrenceID conforms to GUID structure"]
[dwc:occurrenceID="42": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:occurrenceID does not conform to GUID structure"]
Source VertNet
References
  • Darwin Core RDF Guide (2015). #1.3.2.1 Persistent Identifiers (normative) (https://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#1.3.2.1_Persistent_Identifiers)
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes
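
As a rough illustration of the Expected Response above, the sketch below maps a dwc:occurrenceID value to a response. The `Response` class, the `validate_occurrenceid` function, and the placeholder predicate `has_http_scheme` are hypothetical names and are not part of the specification; treating whitespace-only values as EMPTY is also an assumption. What counts as "a format commonly associated with GUIDs" is left as a pluggable predicate because the specification does not pin it down.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse


@dataclass
class Response:
    status: str
    result: Optional[str]
    comment: str


def has_http_scheme(value: str) -> bool:
    """Placeholder predicate (an assumption): accept http(s) URIs that have a host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_occurrenceid(
    value: Optional[str],
    looks_like_guid: Callable[[str], bool] = has_http_scheme,
) -> Response:
    # Assumption: None and whitespace-only strings both count as EMPTY.
    if value is None or value.strip() == "":
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dwc:occurrenceID is EMPTY")
    if looks_like_guid(value.strip()):
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:occurrenceID conforms to GUID structure")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:occurrenceID does not conform to GUID structure")
```

With the placeholder predicate, the two Examples rows above come out as COMPLIANT for the iNaturalist URL and NOT_COMPLIANT for "42".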

iDigBioBot avatar Jan 05 '18 15:01 iDigBioBot

Comment by Paul Morris (@chicoreus) migrated from spreadsheet: This one feels like it should assert that the occurrenceID conforms to a well-known IRI scheme, and should reference the Darwin Core RDF guide. http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#1.3.2.1_Persistent_Identifiers I'd support: OCCURRENCEID_CONFORMS_TO_WELLKNOWN_IRI_SCHEME. See http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#1.3.2.1_Persistent_Identifiers

iDigBioBot avatar Jan 05 '18 15:01 iDigBioBot

@ArthurChapman two typos in 'uninique'

JoannaMcCaffrey avatar Jan 08 '18 16:01 JoannaMcCaffrey

Thanks Joanna - fixed

ArthurChapman avatar Jan 09 '18 22:01 ArthurChapman

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted"

Tasilee avatar Sep 16 '23 03:09 Tasilee

Description doesn't conform to the rest of the test. There is no source authority.

Specification needs further work as well.

chicoreus avatar Feb 24 '24 19:02 chicoreus

The RDF Guide asserts that identifiers "SHOULD be globally unique, referentially consistent, and persistent" and "if those identifiers are to be used to identify subject resources in RDF, they MUST also be in the form of an IRI."

This test can't readily assess the first three criteria (globally unique, referentially consistent (always returns the same object), and persistent), though it can assess the latter.

Better is probably to cite recommendation R1 from the guid applicability statement https://github.com/tdwg/guid-as/blob/master/guid/tdwg_guid_applicability_statement.pdf, recommending a PURL, LSID, UUID, DOI, other Handle, or http URI. These have patterns that could be detected by this test.
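
To illustrate the kind of pattern detection meant here, a minimal sketch follows. The regular expressions are approximations chosen for illustration, not taken from the GUID applicability statement, and `guid_scheme` is a hypothetical name. A PURL is syntactically just an http URI, so this sketch reports it as "HTTP URI" rather than trying to recognise resolver hosts.

```python
import re
from typing import Optional

# Approximate patterns (assumptions for illustration, not normative).
_LSID = re.compile(r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(:[^:\s]+)?$", re.IGNORECASE)
_UUID = re.compile(
    r"^(urn:uuid:)?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
_DOI = re.compile(r"^(doi:|https?://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.IGNORECASE)
_HANDLE = re.compile(r"^(hdl:|https?://hdl\.handle\.net/)[^\s/]+/\S+$", re.IGNORECASE)
_HTTP = re.compile(r"^https?://[^\s/]+(/\S*)?$", re.IGNORECASE)


def guid_scheme(value: str) -> Optional[str]:
    """Return the name of the first matching identifier scheme, or None."""
    v = value.strip()
    # More specific schemes are checked before the generic http URI pattern.
    for name, pattern in (
        ("LSID", _LSID),
        ("UUID", _UUID),
        ("DOI", _DOI),
        ("Handle", _HANDLE),
        ("HTTP URI", _HTTP),
    ):
        if pattern.match(v):
            return name
    return None
```

A value such as "42" matches none of the patterns and yields `None`, which a test along these lines would report as NOT_COMPLIANT.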

chicoreus avatar Feb 24 '24 21:02 chicoreus

However Darwin Core doesn't require it to be globally unique, so probably needs further looking at

From DwC: An identifier for the dwc:Occurrence (as opposed to a particular digital record of the dwc:Occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the dwc:occurrenceID globally unique.

ArthurChapman avatar Feb 24 '24 21:02 ArthurChapman

@ArthurChapman the "persistent global unique identifier" and the "combination of identifiers in the record" are both untenable as a basis for writing a test specification or for implementing a test.

This one we would be much better off adopting recommendation R1 from the guid applicability statement. That is tenable to write a specification for and to write code to evaluate.

This is very similar to the problem we faced in #121 and #212

chicoreus avatar Feb 25 '24 23:02 chicoreus

Thanks @chicoreus. I agree the GUID Applicability Statement is a fair base (Source Authority?), but what about the complexity of implementing #115?

Tasilee avatar Mar 01 '24 00:03 Tasilee

Thank you Lee for flagging this in your email. I have mixed feelings on this test. Please see the discussions on this thread: https://github.com/tdwg/dwc/issues/491

ymgan avatar Mar 21 '24 05:03 ymgan

The recommendation for the term, "Recommended best practice is to use a persistent, globally unique identifier", cannot be tested. I would put this into the "DO NOT IMPLEMENT" category.

tucotuco avatar Mar 21 '24 21:03 tucotuco

I agree with @tucotuco's suggestion, i.e. DO NOT IMPLEMENT

ArthurChapman avatar Mar 21 '24 23:03 ArthurChapman

I mostly concur. The broad scope of the recommended best practice in the term itself is not practical to implement.

However, this is essentially the same test as VALIDATION_SCIENTIFICNAMEID_COMPLETE #212 (and supplementary #121) but performed on a different Darwin Core term with a different definition. Since we are supporting that test, in a form that is implementable, I suggest we frame a more precisely parallel test, VALIDATION_OCCURRENCEID_COMPLETE, that tests for conformance to recommendation R1 from the guid applicability statement https://github.com/tdwg/guid-as/blob/master/guid/tdwg_guid_applicability_statement.pdf, recommending a PURL, LSID, UUID, DOI, other Handle, or http URI. These have detectable patterns that could be validated by this test. Like #212, this would be somewhat aspirational (and might go in supplemental rather than core).

If we don't, we need to be very explicit about why we put a do not implement on #23 and have #212 in core (and #121 in supplementary).

The difference might be that scientific name id is expected to point to someone else's identifier, and occurrence id is very likely to be minted locally by the database of record for the occurrence. But we need to be very explicit about why one and not the other, or we need a parallel test VALIDATION_OCCURRENCEID_COMPLETE based on the RDF guide.

@Tasilee I'm not understanding the reference to #115.

chicoreus avatar Apr 02 '24 00:04 chicoreus

This one would be VALIDATION_OCCURRENCEID_COMPLETE; proposed, but not specified, is a VALIDATION_OCCURRENCEID_STANDARD that would operate on the dwciri term.

chicoreus avatar Jul 26 '24 23:07 chicoreus

Perhaps I misunderstand, but there is no dwciri: equivalent of dwc: Id terms. Instead, identification of a subject in RDF is recommended to be accomplished with the rdf:about attribute of the rdf:Description element.

tucotuco avatar Jul 27 '24 01:07 tucotuco

You are right. No need for a VALIDATION_OCCURRENCEID_STANDARD test.

chicoreus avatar Jul 27 '24 01:07 chicoreus