ontology-metadata

Unify dc:creator oio:created_by and dc:contributor, IAO:term editor

Open matentzn opened this issue 3 years ago • 40 comments

Is there any way we can, OBO wide, agree to

  • move to dc:creator with orcids as values OR
  • move to oio:created_by with orcids as values

and

  • agree that dc:contributor should always refer to a valid orcid?

@cthoyt
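
To make "valid orcid" concrete, here is a minimal sketch of a syntactic check. It assumes values are written as full `https://orcid.org/` IRIs and it uses the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers. The function names are hypothetical, and the check says nothing about whether the ORCID record actually exists.

```python
import re

# Assumption: attribution values are full ORCID IRIs, not bare identifiers.
ORCID_IRI = re.compile(r"^https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid_iri(value: str) -> bool:
    """Return True if value is a well-formed ORCID IRI with a correct check digit."""
    match = ORCID_IRI.match(value)
    if match is None:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

A check like this could run in a QC step and report violations as warnings, which fits a SHOULD better than a MUST.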

matentzn avatar Mar 08 '21 22:03 matentzn

I'm not familiar with the difference in semantics of dc:creator and dc:created_by. Does one refer to a resource and the other a literal? Because it would be great to refer to ORCID identifiers as resources.

Either way, 100% support using structured information as attribution. It's very disconcerting reading through such high quality resources and finding somebody's initials that take 2 hours to look up by reading old papers. This has happened to me in GO, MONDO, and others

cthoyt avatar Mar 08 '21 22:03 cthoyt

That's because it was a typo. Sorry about that. Fixed now: oio:created_by!

matentzn avatar Mar 08 '21 22:03 matentzn

Okay, then rephrased: I'm not familiar with oio - but since DC is so ubiquitous, I'd vote for using that (unless the semantics of oio:created_by are more suggestive for relations between resources instead of just text)

cthoyt avatar Mar 08 '21 22:03 cthoyt

oio stands for oboInOwl and is basically the OBO format internal vocabulary namespace. You have been using oio:hasDbXref a lot!

matentzn avatar Mar 08 '21 22:03 matentzn

agree that dc:contributor should always refer to a valid orcid?

I think SHOULD, not MUST, is OK here, but be prepared for many violations. We have many ontologies that are decades old with contributions that predate ORCID. In some cases we have retrospectively tracked down historic contributors and rewired their contributor dbxref to an ORCID, but this is not always possible. Many historic contributors still lack ORCIDs. I worry that by saying SHOULD we either generate a lot of busy work for resource-poor ontologies that would be better spent elsewhere, or we weaken the meaning of SHOULD to the point where it's meaningless.

cmungall avatar Mar 09 '21 19:03 cmungall

I would say SHOULD is good, and we just agree on using ORCIDs moving forward. I don't think it's busy work. If we could use this consolidated way of attributing to generate a dashboard that makes individuals' contributions to ontologies other than their own more visible, this would be a great incentive!

matentzn avatar Mar 09 '21 19:03 matentzn

When adding terms in Protege, if you use the new entities metadata settings to automatically add creator and date information to new terms, the default setting is for creator (see image). If we are not going to settle on 'creator', it would be good to ask Protege to change the default setting to whatever we settle on. [image: creator_metadata]

sbello avatar Mar 09 '21 20:03 sbello

@sbello thanks! Yes! And it would be even better if the Protege config were a separate config file that could be reused across OBO. We are contemplating something like that at the moment!

matentzn avatar Mar 09 '21 20:03 matentzn

In reference to #76 maybe we should first gather the use cases for attributing terms.

I want to emphasise one more time how strongly I feel about OBO being a driving force in world-wide ontology standardisation efforts beyond the biomedical domain, and to do that, we need to cut back on some of our silo annotation properties in the OIO and IAO vocabularies in favour of more widely used ones, like dublin core, skos, void, and friends. Please open a new issue: "We should not re-use external vocabularies if it means even the slightest compromise" and provide your arguments to convince me otherwise. So yes, standardisation means that we may lose some subtle distinctions.

Here is how I would suggest we use the creation vocabulary. Please tell me what you think.

  • dc:creator: the person or group that is responsible for the ID (IRI) of the term coming into being. This is synonymous with oio:created_by. The primary use case of this annotation is attribution (not provenance).
  • dc:contributor: any person that contributed anything to a term (adding a synonym, label, etc). The primary use case of this annotation is attribution (not provenance).
  • dc:source: if a person (or group) invented something, i.e. a definition or something along these lines, they can be referenced as a source. The primary use case of this annotation is provenance, not attribution. We can consider using this for robot templates or dosdp template-based generations as well.
  • IAO:0000117 (term editor): @zhengj2007 points out that "the person who add the term in the OWL file may not be the creator of the term". While true, I would argue this is a distinction so subtle that it would help with neither provenance nor attribution. I would suggest using dc:source or dc:creator, whichever is more appropriate from the definitions above.
  • oio:created_by means the same as dc:creator above, and should be retired.
  • The range of any of the above should be one of the following, sorted from most to least desirable:
    1. ORCiD
    2. ROR
    3. Wikidata Identifier
    4. ..... huge threshold of desirability....
    5. http://purl.obolibrary.org/obo/mondo#CJM (this is just a hack to contextualise the current "CJM" used by ontologies like GO).
    6. "Chris Mungall"
    7. "CJM"
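
The ranking above can be sketched as a small classifier, for instance to report how far an ontology's attribution values are from the preferred form. This is only an illustration: the IRI prefixes for ROR and Wikidata are assumptions about how those identifiers would be written, the name-versus-initials test is a crude heuristic, and `attribution_rank` is a hypothetical name. The returned numbers mirror the list positions, so 4 (the threshold) is never returned.

```python
# Assumed IRI prefixes for the three preferred identifier systems.
PREFERRED_PREFIXES = [
    ("https://orcid.org/", 1),
    ("https://ror.org/", 2),
    ("http://www.wikidata.org/entity/", 3),
]


def attribution_rank(value: str) -> int:
    """Map an attribution value to its position in the desirability list."""
    value = value.strip()
    for prefix, rank in PREFERRED_PREFIXES:
        if value.startswith(prefix):
            return rank
    if value.startswith(("http://", "https://")):
        return 5  # some other IRI, e.g. the mondo#CJM hack
    if " " in value:
        return 6  # heuristic: a literal with a space looks like a full name
    return 7  # heuristic: anything else is treated as initials


def needs_curation(value: str) -> bool:
    """True for values below the 'threshold of desirability'."""
    return attribution_rank(value) > 4
```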

I am not saying to change all legacy annotations now to this: I am saying, let's find a standard we can use moving forward, or agree that standardising this is not worth the cost.

matentzn avatar Nov 03 '21 10:11 matentzn

Wouldn't the semantics behind IAO:0000117 be sufficiently provided if each term has its own issue (using IAO_0000233 - term tracker item) that is properly assigned to be handled by the "term editor(s)"?

StroemPhi avatar Nov 03 '21 10:11 StroemPhi

I totally agree. I would love to make this a standard habit, tagging all new terms with their respective GitHub issues. It would create a layer of indirection for obtaining the "responsible editor", but I think this is much better than using non-standard properties for something like that.

matentzn avatar Nov 03 '21 11:11 matentzn

@matentzn can these annotations be added when using ROBOT templates? I like the creator/contributor/source trio, ideally in combination with an ORCID, but it would be helpful if I could include this information in ROBOT templates for bulk addition. Would it be as simple as adding columns for these attributes?

sbello avatar Nov 03 '21 13:11 sbello

Absolutely no problem! :)
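
For illustration, here is a rough sketch of what such a template could look like, generated from Python so the rows are easy to inspect. It assumes ROBOT's template strings as I understand them (`A` for a literal annotation, `AI` for an IRI-valued annotation, `SPLIT=|` for multiple values in one cell) and assumes the `dc` prefix is known to ROBOT or declared on the command line. The term ID, label, and source value are made up.

```python
import csv
import io

# Row 1: human-readable column names. Row 2: ROBOT template strings.
HEADERS = ["ID", "Label", "Creator", "Contributors", "Source"]
TEMPLATE = [
    "ID",
    "A rdfs:label",
    "AI dc:creator",
    "AI dc:contributor SPLIT=|",
    "A dc:source",
]

# Hypothetical example row; EX:0000001 is not a real term.
EXAMPLE_ROWS = [
    [
        "EX:0000001",
        "example term",
        "https://orcid.org/0000-0001-9625-1899",
        "https://orcid.org/0000-0001-9625-1899|https://orcid.org/0000-0002-1825-0097",
        "hypothetical source",
    ]
]


def build_template(rows):
    """Return the template as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADERS)
    writer.writerow(TEMPLATE)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_template(EXAMPLE_ROWS), end="")
```

Using `AI` rather than `A` for creator and contributor is what makes the ORCID a resource instead of a string, which is the point of the proposal above.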

matentzn avatar Nov 03 '21 13:11 matentzn

@matentzn I'd like to correct my comment. I never used 'dc:creator' when I added a new term. So, what I mean is "the person who add the term in the OWL file may not be the IAO: 'term editor' of the term".

zhengj2007 avatar Nov 03 '21 13:11 zhengj2007

I get it now @zhengj2007 thanks! But perhaps that is secondary. In this case of ambiguity, you could simply use dc:contributor which is certainly true, right?

matentzn avatar Nov 03 '21 14:11 matentzn

I like Nico's breakdown, and would add to it that essentially dc:creator is_a dc:contributor. And the way we have been using 'term editor' is essentially what dc:contributor is. Furthermore, it can be hard / unfair to try to distinguish who is the creator, in so far as sometimes a term gets added to an ontology with placeholder (or empty) definitions etc. by person A, and person B puts in a lot more effort providing those. So I would favor just sticking to dc:contributor by default.

bpeters42 avatar Nov 03 '21 14:11 bpeters42

Emails crossed. I was essentially trying to say the same things as Nico.

bpeters42 avatar Nov 03 '21 14:11 bpeters42

Yes, I agree with that as well. dc:contributor should be the default and, realistically, given that ontologies are always a massively collaborative effort, I would even agree to a motion that gets rid of dc:creator altogether. Thank you @bpeters42 for your input :)

matentzn avatar Nov 03 '21 15:11 matentzn

It looks like I can change the user metadata in protege to use whatever relation we decide in the creator property field. So, if the group wants to go with contributor instead of creator I'm fine with that.

sbello avatar Nov 03 '21 16:11 sbello

FWIW, I've been setting the "New entities metadata" to "Use user name".

But, in my "User details" setting, I include my name and ORCID.

I like having both a name and ORCID, since I don't have people's ORCIDs memorized.

wdduncan avatar Nov 03 '21 21:11 wdduncan

I agree with Nico's recommendations.

What this doesn't address is how this interacts with definition-level axiom annotations (done using OWL reification). It's very common in many ontologies to provide as provenance for a definition some mix of primary, secondary, tertiary sources, individuals, and groups of people.

How should this interact with term-level source and contributor annotations?

  1. Favor term-level annotations over axiom-level
  2. Favor axiom-level and only include term-level if non-redundant
  3. Have redundancy in the release version, non-redundancy in edit version, and a standard sparql update to propagate selectively from axiom-level to term level as part of the release process
  4. No recommendation. Every ontology does this as it pleases

I favor 3 and disfavor 1: it is important for many ontologies to have the provenance at the axiom level.
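
To illustrate what option 3 would do at release time, here is a sketch of the propagation step in plain Python over (subject, predicate, object) tuples rather than the SPARQL update itself. It assumes axiom annotations use the standard OWL reification vocabulary (owl:Axiom, owl:annotatedSource), assumes `dc:` means the Dublin Core elements 1.1 namespace, and treats the set of propagated properties as a per-ontology choice. All names are hypothetical.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC = "http://purl.org/dc/elements/1.1/"  # assumed namespace for dc:

# "Selectively": only these axiom-level properties are copied to the term.
PROPAGATED = {DC + "contributor", DC + "source"}


def propagate_to_term_level(triples):
    """Copy selected axiom-level annotations onto the annotated term."""
    triples = set(triples)
    axioms = {s for (s, p, o) in triples if p == RDF_TYPE and o == OWL + "Axiom"}
    # Map each reified axiom node to the term it annotates.
    annotated_source = {
        s: o
        for (s, p, o) in triples
        if p == OWL + "annotatedSource" and s in axioms
    }
    added = {
        (annotated_source[s], p, o)
        for (s, p, o) in triples
        if s in annotated_source and p in PROPAGATED
    }
    return triples | added
```

The edit version keeps only the axiom-level annotation; running a step like this during the release adds the redundant term-level copy.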

cmungall avatar Nov 03 '21 23:11 cmungall

I think I agree, it isn't clear what IAO:0000117 (term editor) adds to the others, nor which of the others it truly represents (but I infer 'creator' from the description), and therefore it is less helpful to the average non-OBO user. (if that's a user you're trying to reach, that's a good thing I think.)

Some nuances in case they are useful.

Is making at least one dc:contributor required, but making dc:creator optional consistent with both your idea of compromise and the previous comments?

Note there is no reason people and institutions can't both be contributors/creators/etc on one term. Right?

Presumably dc:source can also be a place (a location on the web), not just a person or group.

I think you've dropped a few person identification systems that have some scientific following and are LOD-friendly (FOAF, VIVO). Whereas I'm not sure why you'd include 4 through 7, given this is a future-looking recommendation.

graybeal avatar Nov 04 '21 06:11 graybeal

I agree wholeheartedly with your assessment, @graybeal. The reason I added these three is purely that I want to avoid pushback from GO, which has used 4-7 for 30 years and will be resistant to retro-curating all the various CJMs and others to ORCIDs. Maybe I will volunteer to do it for them one weekend, if we can agree that ORCID is the preferable identification. If someone has no ORCID, I would follow the radical @cthoyt method of simply creating an entity on Wikidata and using that, and I prefer that to using FOAF or VIVO, because we know how to edit it easily. But yes, FOAF and VIVO would still be better than 4-7.

matentzn avatar Nov 05 '21 12:11 matentzn

Related to https://github.com/information-artifact-ontology/ontology-metadata/issues/2

matentzn avatar Dec 20 '21 17:12 matentzn

@cthoyt Does it help your script if I reverse how I do my dc:creator annotations so that the orcid comes first? E.g.:

https://orcid.org/0000-0001-9625-1899 (Bill Duncan)

wdduncan avatar Dec 20 '21 17:12 wdduncan

@cthoyt's script is not the only problem: we want to simply aggregate contributions across all ontologies using SPARQL. The labelling approach you chose is not well defined, and everyone will do it differently. If we want human-readable editor names as well, we should provide a map in the ontology header.

matentzn avatar Dec 20 '21 17:12 matentzn

What is not well specified about the example? Are you wanting something like a regex? How about this:

{orcidid} *({first-name last-name},+)

I.e.: an orcid followed by an optional set of one or more comma-delimited names contained within parentheses.
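
As a rough sketch of what a consumer would have to do with values in this shape, the pattern can be read as a Python regular expression. This is one possible interpretation of the informal pattern above, not an agreed format: it assumes a full `https://orcid.org/` IRI, optional whitespace, and at most one parenthesised, comma-delimited list of names. `parse_creator` is a hypothetical helper.

```python
import re

# One reading of "{orcidid} *({first-name last-name},+)".
CREATOR_PATTERN = re.compile(
    r"^(https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX])"  # the ORCID IRI
    r"\s*(?:\(([^()]*)\))?$"  # optional "(name, name, ...)"
)


def parse_creator(value):
    """Split a creator value into (orcid_iri, [names]), or None if it does not match."""
    match = CREATOR_PATTERN.match(value.strip())
    if match is None:
        return None
    names = [n.strip() for n in (match.group(2) or "").split(",") if n.strip()]
    return match.group(1), names
```

With this reading, a script can keep the first element and ignore the names, but every consumer still has to know the convention, because the annotation value stays a string literal rather than an IRI.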

I don't like the idea of putting a map in the header. It makes people go looking for the name associated with the orcid.

wdduncan avatar Dec 20 '21 17:12 wdduncan

I agree with Nico; there's no useful, machine-readable attribution via dc:creator that isn't structured by directly and only using the IRI for the ORCID record.

I can't see how adding a mini-language within the OWL spec would be helpful. I'd strongly disagree with anything that isn't just using the ORCID IRI for attribution purposes.

With regard to ease of access to human-readable names for contributors, I think that's a different conversation that has to happen somewhere else at a later time, after first getting a consensus that people would generally actually use this human-readable metadata

cthoyt avatar Dec 20 '21 17:12 cthoyt

First-name and last name is super error-prone no matter what (what about middle initials, special chars for Spanish etc). I am thinking more of how this would feed into knowledge graphs like wikidata, where your name would be a label on your ID, and how all the information OBO generates get connected to the wider world - the orcid here really is not a literal, but a node in a graph. We can always generate Human readable labels from the ORCIDs automatically if people would ask for it, and connect that using a different property.

matentzn avatar Dec 20 '21 18:12 matentzn

What is the benefit vs cost of using only orcids? In what I proposed, you (or a script) can ignore whatever comes after the orcid. People (or at least me) read names, not orcids.

I'm trying to find a compromise. But, you don't seem open to such a compromise.

wdduncan avatar Dec 20 '21 18:12 wdduncan