
Revisiting comments on dwc:recordedBy, dwc:identifiedBy, dwc:georeferencedBy

Open dshorthouse opened this issue 2 years ago • 2 comments

The comments on the above terms recommend the use of pipes ( | ) to separate the values in a list. I wish to raise an observation about this recommendation and urge that these comments be removed or replaced with something far less syntactically stringent. The root of my concern is whether or not the purported items in a value for these terms can in fact be cleanly represented as a list. My argument here is that no, they cannot. Forcing them into a list of units introduces unintended bias.

None of these terms make any mention of verbatim (due in part to the philosophical rabbit holes that such a word has us tumble down), but neither do they recommend that their values convey the identity of people or organizations. Nonetheless, recommending that pipes be used to separate items in a string as though they were a list does in fact nudge us down that path. Rarely, if ever, would one see a pipe separating members of a team on a collecting label. Rarely, if ever, would such values be expressed in the seemingly formal Western structure of the example provided in the comments, Oliver P. Pearson | Anita K. Pearson. The recommendation in the comments for these terms gives the impression that values are best computed, that is, that some form of local disambiguation activity is advisable. None of the definitions or recommendations suggest that shared content here be factual representations. As a result, collection managers who use a relational data management system may be less inclined to record what is written on a label because it's perceived as having little downstream value (the use of pipes suggests that someone who does not know how to deal with other separators has a purpose for the contained parts), favouring instead the use of computed name(s) held elsewhere in their system. This is a mistake.

Values for dwc:recordedBy and dwc:identifiedBy should be free of any implicit statement about identity that artificial separator characters like pipes introduce. When presented with an example like Dr. & Mrs. John Smith on a collector label, do we eschew the recommendation, or do we construct it as Dr. Smith | Mrs. John Smith or Dr. John Smith | Mrs. John Smith or Dr. John Smith | Mrs. ? Smith or John Smith | ? Smith or simply John Smith (it's all too common that the "Mrs." is entirely dropped)? Do we construct an awkward group if there is such an object type in one's collection management system? All of these are perfectly possible, but their implementations and expressions depend on one's familiarity with Western names. Likewise, it would appear exceedingly bizarre to some if ampersands were arbitrarily replaced by pipes for the purposes of publishing data, as in Dr. | Mrs. John Smith. Note that all of these are semantically different from the original form and are likely to result in differing disambiguation routines should these occur outside the walls of the collection management system. Similarly, there are many examples of collector names written in native languages on labels whose separators might be 'e' or something else. Introducing pipes in their place might sway a collection manager to use a canonical, managerial, transliterated/translated form of these names. In short, pipes introduce unintended cultural bias when it is likely that their purpose was to remove such biases. If it is truly identity we wish to convey, we have the terms dwc:recordedByID and dwc:identifiedByID for this very purpose. So... I do not know what clarity of purpose pipes serve in the exchange of occurrence records that contain these two terms.
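
To make the Dr. & Mrs. John Smith example concrete, here is a minimal sketch of what a downstream consumer that naively splits on pipes would receive for each candidate construction. The helper `split_on_pipe` is hypothetical and not part of Darwin Core or any publishing tool; it only assumes that a consumer treats the pipe as a list separator and trims whitespace.

```python
# Hypothetical helper: split a recordedBy value the way a naive consumer might,
# treating "|" as a list separator and trimming surrounding whitespace.
def split_on_pipe(value: str) -> list[str]:
    return [part.strip() for part in value.split("|") if part.strip()]


# What is actually written on the label.
verbatim = "Dr. & Mrs. John Smith"

# The candidate pipe-separated constructions discussed above.
candidates = [
    "Dr. Smith | Mrs. John Smith",
    "Dr. John Smith | Mrs. John Smith",
    "Dr. John Smith | Mrs. ? Smith",
    "John Smith | ? Smith",
    "John Smith",
    "Dr. | Mrs. John Smith",
]

# Each construction hands the consumer a different list of "names".
parsed = {candidate: split_on_pipe(candidate) for candidate in candidates}
for candidate, parts in parsed.items():
    print(f"{candidate!r} -> {parts}")

# The verbatim label contains no pipe, so it survives as a single string.
print(f"{verbatim!r} -> {split_on_pipe(verbatim)}")
```

Every candidate yields a different list, and none of them reproduces the label, so any disambiguation run outside the collection management system starts from whichever interpretation the publisher happened to choose.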

dshorthouse, Jun 05 '23 18:06