OBOFoundry.github.io

Create guidelines for OBO maintainers who want to be included in Wikidata

Open cmungall opened this issue 9 years ago • 49 comments

Most OBOs are CC-BY, Wikidata requires CC-0. Some ontologies have apparently granted Wikidata permission to redistribute part or all of their ontology.

We want to make sure this is streamlined with a common process for everyone. Not clear to me how this should be done, ideas welcome, add below.

cmungall avatar Jul 12 '16 23:07 cmungall

CC-BY, but the attribution chosen is by PURL; as long as Wikidata uses the PURL (perhaps replacing the URL formatter that currently links directly to AmiGO), I think it should be OK.

mcourtot avatar Jul 13 '16 08:07 mcourtot

I agree that this needs to be written down somewhere. It would make everything much clearer, and we could avoid getting into lengthy discussions later down the line, or even having to remove data, if something were to happen.

elviram avatar Jul 13 '16 13:07 elviram

re: @mcourtot's comment, it's true that Wikidata probably generally satisfies the attribution requirement for CC-BY. But Wikidata itself is CC0, so if you grant Wikidata permission to use your data, then you also grant downstream users those same CC0 terms. So someone who downloads your ontology via Wikidata would not be required to attribute in any way.

andrewsu avatar Jul 13 '16 15:07 andrewsu

Additionally, this points to an assumption in the OBO license that assumes a PURL for every unit of attributable work. What if I want to produce an ontology that is purely axioms on an existing ontology? This could be logical axioms (e.g. providing equivalence axioms for DO, or logical definitions for CHEBI roles) or annotation axioms (e.g. a translation to another language)? If I want to ensure attribution I would have to add PURLs for every axiom, which can be a high overhead.

cmungall avatar Jul 13 '16 16:07 cmungall

Hi, this is Nuria, a new post-doc in Andrew's lab. IMO, CC-BY is a common license choice for data providers because it gives visibility to their resources and lets them demonstrate community use to funding agencies. Initiatives to develop quality and resource-use metrics are ongoing in ELIXIR and NIH to support decision-making in funding agencies. A win-win idea would be to suggest Wikidata to ELIXIR/NIH as one component of the metrics used to compute resource use by the community. In this way, Wikidata could be used by funding agencies as a data endpoint to evaluate and identify relevant resources for the community, and by data providers as a platform to make their resources widely visible and available to both the community and funding agencies. This would encourage data providers to grant Wikidata permission to share data under a CC0 license. I am not sure whether Wikidata could show rankings such as the number of downloads per year of an ontology, or the number of citations per ontology...

NuriaQueralt avatar Jul 14 '16 06:07 NuriaQueralt

Hi Nuria, I have been in quite a few Wikidata/Wikipedia meetings, and the one thing that is mentioned over and over is that they do not keep track of users and data. It is all in the spirit of open and free data. What is doable is to look at how many times an item or property is used. And there is always the possibility to point at how many times Wikipedia and Wikidata are accessed. I am all for your suggestion about ELIXIR/NIH.

What we could do with the licensing issues is draft a general agreement for OBO ontologies and their free use in Wikidata. Anybody who would want their ontology in Wikidata would have to sign/agree to it.

Cheers, Elvira

elviram avatar Jul 14 '16 13:07 elviram

I like the idea of standardizing this process. That being said, we have made significant progress by adding one resource at a time and getting permission one at a time. So, whilst negotiations for an OBO-wide pattern continue, if we want data (e.g. Henning's suggestion of Reactome) in Wikidata, let's go ahead and ask the owners directly.

goodb avatar Aug 01 '16 03:08 goodb

You know what would make this all go away? Making OBO Foundry require a CC0 license.

Let's try to answer the attribution problem with good software for tracking usage, not with lawyers writing text that is unenforceable for the bad guys and massively distracting (as demonstrated here) for the good guys.

To Elvira's point above: actually, Wikipedia/Wikidata does keep extensive logs on usage. I started a thread about gaining access to them for the purpose of building an attribution engine. The response was pretty positive, but I didn't have the bandwidth to follow it up.

goodb avatar Aug 01 '16 03:08 goodb

+1 for CC0. At least, I think it should be recommended more strongly (right now OBO recommends CC-BY).

cc @hlapp

balhoff avatar Aug 01 '16 18:08 balhoff

Is there a specific reason Wikidata can't accommodate CC-BY? Other than "they don't want to".

mcourtot avatar Aug 01 '16 21:08 mcourtot

This was a (good) decision made a long time ago and at a higher level than we are operating on here that is not likely to change. Adding technology to accommodate multiple licensing patterns in the same knowledge graph is not trivial and would be a distraction from the main objective.

goodb avatar Aug 01 '16 21:08 goodb

You know what would make this all go away? Making OBO Foundry require a CC0 license.

I fought quite hard to even allow CC0 in the respective OBO Foundry principle. The recommendation is still CC-BY (for, IMHO, poorly motivated reasons).

One argument I made during the discussions leading up to that was that, in particular because of the Realism principle espoused by OBO Foundry, most of the content of an OBO Foundry ontology is unlikely to even qualify as creative expression. Others, most prominently @alanruttenberg, argued against that, citing previous case law (of which there isn't much, but there is precedent of some ontology in some field having been ruled eligible for copyright protection).

IMO, the stronger argument (which I have also made) is that CC-BY as a legal instrument is the wrong tool to bring to bear for declaring an attribution requirement and demanding compliance with it. Attribution can be given in many different ways that all satisfy the legal requirement of a CC-BY license, but very few of which will satisfy the mechanism of attribution we as scientists really want. So it's really a social norm that we request compliance with, not a legal one, and so a CC-BY license, by itself, adds very little if anything to stating what we expect in return for reuse.

Bottom line, I remain entirely in favor of requiring, or at the very least strongly recommending, that OBO Foundry ontologies be released under a CC0 waiver.

hlapp avatar Aug 01 '16 23:08 hlapp

IMO, the stronger argument (which I have also made) is that CC-BY as a legal instrument is the wrong tool to bring to bear for declaring an attribution requirement and demanding compliance with it. Attribution can be given in many different ways that all satisfy the legal requirement of a CC-BY license, but very few of which will satisfy the mechanism of attribution we as scientists really want. So it's really a social norm that we request compliance with, not a legal one, and so a CC-BY license, by itself, adds very little if anything to stating what we expect in return for reuse.

Eloquently put, Hilmar. I could not agree more...

andrewsu avatar Aug 01 '16 23:08 andrewsu

:+1: for CC0, :-1: for CC BY

I think the OBO Foundry should strongly recommend CC0 and nudge ontologies to switch from CC BY to CC0 when possible. I'll start with the legalese that reusers are subject to under a CC BY 4.0 license:

Section 3 – License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the following conditions.

a. _Attribution._

  1. If You Share the Licensed Material (including in modified form), You must:

    A. retain the following if it is supplied by the Licensor with the Licensed Material:

    1. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
    2. a copyright notice;
    3. a notice that refers to this Public License;
    4. a notice that refers to the disclaimer of warranties;
    5. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;

    B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and

    C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.

  2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.

  3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.

  4. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License.

The effect of Section 3 is uncertainty and drudgery. Did the Licensor identify the creators? If so, you must retain it. Did the Licensor supply a copyright notice? If so, you must retain it. Don't fail to mention if you modified the resource. Even if you license your derivative work under a compatible license such as CC BY-NC, you must still mention the original license. After reading these conditions, I think it's likely that my use of CC BY ontologies in Hetionet (an integrative network of biology) may not comply with the entirety of these CC BY conditions, even though I went to great pains trying to comply with the incredible burden laws and licenses place on publicly-funded data.

The best applications of knowledge will be integrative. Integrating CC BY content can be tricky because you must deal with multiple potentially-contradictory license conditions as well as attribution stacking. The number of weird, tricky situations that arise when you do even a little integration is astounding. Some CC BY resources will have Sui Generis database rights. Others will not. Most lawyers don't have the expertise to provide guidance on these issues, and lawyers generally avoid giving advice unless contracted to do so. Academics and others who just want to do science don't have sufficient access to legal experts. Even when you have access to a lawyer, the process injects a long delay, at great expense to whoever is paying the tab. The overall effect is that whenever there are legally ambiguous situations, you waste users' time and dissuade reuse. CC0 was designed to avoid uncertainty. The license is lengthy, but since the whole point is to make the content in the public domain, you don't have to worry about any conditions of reuse.

Legally-enforced attribution is overrated. Best practice requires establishing data provenance. Any high quality resource will attribute when that attribution is productive. Sometimes it's not productive to attribute. Sometimes it's destructive. For example, I created PharmacotherapyDB, a CC0 catalog of drug–disease treatments. The drugs are coded using DrugBank and the diseases are coded using the Disease Ontology. I don't want my users to be burdened by licensing and I want my data to be maximally reused, so I used CC0. But am I violating the Disease Ontology's CC BY License? I've created a derivative work that includes 97 DO terms, and these terms potentially represent an original work of authorship. Answering this question requires wading through legal precedent, which is an extreme burden. Much of this precedent is yet to exist: the space is filled with open questions. Sometimes it's nice to just use an identifier and not have to attribute anything. Identifiers usually have their provenance embedded anyways. Based on these considerations, DrugBank, a dually licensed (aka commercial) resource, released the core of their resource as CC0.

The aforementioned practice of granting WikiData permission to release data under CC0 but then officially releasing the same data under CC BY is not ideal. This will create confusion as it's unclear whether WikiData actually had sufficient permission to apply CC0. Users of WikiData content could be liable for violating upstream data licensing and many users won't want to take that risk. The authoritative source of the data should apply the most permissive license that the data is released under anywhere to avoid these situations. You also don't want two classes of users: those who access from the authoritative site and get the restrictive license and those who use WikiData. Finally, there's the possibility of a resource diverging, similar to the recent Ethereum hard fork. This could happen if WikiData is granted permission to reproduce an ontology at one point, but subsequent contributions are made under the CC BY license.

Finally, licenses and laws change over time. Currently, CC BY 4.0 is compatible with a broad range of licenses. However, incompatibilities may arise in the future. Let's create knowledge and content that withstands the test of time. From the perspective of a creator, I want to maximize the reuse of my creations. Most of us are in the incredibly lucky position that the public funds us to create knowledge. Don't waste the opportunity to do something revolutionary over petty attribution concerns. Don't rely on the threat of suing your greatest advocates (those who use your data) for recognition.

dhimmel avatar Aug 02 '16 03:08 dhimmel

Hear, hear!

goodb avatar Aug 02 '16 07:08 goodb

:+1: to @dhimmel - especially "Academics and others who just want to do science don't have sufficient access to legal experts. Even when you have access to a lawyer, the process injects a long delay, at great expense to whoever is paying the tab. The overall effect is that whenever there are legally ambiguous situations, you waste users' time and dissuade reuse."

cgreene avatar Aug 02 '16 11:08 cgreene

In case it or the references are useful, here is an open letter to NIGMS in support of broad adoption of CC0.

goodb avatar Aug 02 '16 19:08 goodb

Thanks @goodb for making that publicly available and linking it here. Very helpful to be able to refer to discussions like this thread, that letter, and this OpenData StackExchange thread. Collectively this has convinced us to switch to CC0 for the www.civicdb.org project.

malachig avatar Aug 03 '16 02:08 malachig

So what is the proposed solution to the attribution issue?

@dhimmel says "Most of us are in the incredibly lucky position that the public funds us to create knowledge". But the reality is that a lot of the content in the OBO Library is not funded, and that which is funded does not have secure funding. Future funding relies on the content creators justifying to funders that their ontology is widely adopted in different databases and platforms (commercial and academic). Is CC-BY a perfect tool for ensuring that companies don't take an ontology, sell it as part of their product suite and provide it to their customers with no attribution? Far from it. But many perceive this as the only tool they have. In fact, the inclination is usually to go for a more restrictive license: look at the databases these ontologies are used with for examples, which typically have discriminatory restrictive licenses. Not everyone uses the same function to evaluate the tradeoff between perceived control and obstructed reuse. Some may prefer a sliver of protection at the cost of some obstruction to integration in some data warehouses.

How do we move forward?

  1. CC0 advocates need to provide more persuasive arguments than "you're making it harder for me".
  2. CC-BY advocates need to provide clearer arguments for why the license should not and does not restrict good actors. The OBO documentation on how OBO prevents attribution stacking is a good start, but it's not clear how that works.
  3. What are some short or long term compromises? For example, is there a template for providing a CC-0 axiom-subset of a CC-BY ontology?
  4. Arguments in either direction need to be based in the reality of the current funding situation. Many of us would love to put things in the public domain for the common good, but we need a concrete plan to ensure funding in the face of corporate products taking content and using it without attribution.

cmungall avatar Aug 08 '16 23:08 cmungall

@cmungall : I don't really see how CC-BY helps one justify that the ontology is widely adopted. In practice, I expect that scientists who want to disseminate their research are going to cite the ontology regardless of its CC0/CC-BY status.

CC-BY is essentially using the threat of the legal system (which, let's be honest, is very unlikely to be enforced) to require this in some manner. Hypothetically if some commercial entity took a CC-BY resource and attempted to sell it as their own, would one imagine a university or individual using the legal system to require them to acknowledge the source? That seems like a lot of cost with relatively low reward.

I wonder if the best way to make a strong case for funding is to emphasize the impact that a resource has had. If CC-BY provides a sliver of protection but increases barriers to use in some contexts, then it may hurt one's ability to fund a resource because the overall impact of the resource may be diminished.

cgreene avatar Aug 09 '16 13:08 cgreene

CC0 advocates need to provide more persuasive arguments than "you're making it harder for me".

@cmungall, this issue illustrates the argument for CC0 — if an ontology wants to be part of projects like WikiData, it needs to be CC0 compatible.

For example, is there a template for providing a CC-0 axiom-subset of a CC-BY ontology?

I'm having trouble understanding what "axiom" means. But I think at a minimum, nodes (terms) should be released as CC0. This would include term identifiers, names, synonyms, and descriptions. This would remove any barriers to creating public domain relationships that use OBO Foundry nodes as endpoints.

Arguments in either direction need to be based in the reality of the current funding situation. Many of us would love to put things in the public domain for the common good, but we need a concrete plan to ensure funding in the face of corporate products taking content and using it without attribution.

CC0 will bestow a competitive advantage with respect to funding. Funders want to see their commissioned research making the greatest contribution. If given a choice between funding a CC0 and a CC BY resource, I expect the funders would prefer CC0 because of the greater reuse potential. CC BY also creates the potential that the work must be repeated (say for inclusion in WikiData), which is a horrific concept to a funder.

Maximizing reuse will create the strongest argument for continued funding. Say a company does use an ontology without attribution. Grant proposals can still mention this reuse and that the ontology is creating value in industry, which will demonstrate the broad relevance and user base for the resource. At a time when the science community is beginning to appreciate the importance of open data, OBO Foundry ontologies can bolster their appeal to funders by leading the way.

dhimmel avatar Aug 09 '16 14:08 dhimmel

"is there a template for providing a CC-0 axiom-subset of a CC-BY ontology". To clarify this: many OBO ontologies now make extensive use of OWL description logic to build computable definitions of their classes. This makes it possible to, for example, infer a subclassOf or instanceOf relationship automatically based on the properties of the entity or class in question. When terms from an ontology are used in applications that do not use OWL, these class membership axioms may not be integrated. Hence, we can imagine that a subset of the ontology minus these more sophisticated logical constructs might be shared differently than the entire thing. Since these logical definitions contain a significant fraction of the intellectual property of the ontologies that use them, perhaps it would be more satisfactory to their authors to share the other portions of the ontologies (term names, identifiers, basic concept graphs) more completely openly. This seems to be what @dhimmel is suggesting, and it is in fact what we have already started to do with the Gene Ontology import into Wikidata.
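As a rough illustration of that split, here is a minimal Python sketch. It assumes a hypothetical in-memory list of axioms, each tagged with a kind; the `Axiom` class, the `split_for_cc0` function, the set of "basic" kinds and the `EX:` identifiers are all made up for illustration and are not an OBO or Wikidata convention. A real pipeline would operate on OWL files with an ontology toolkit.

```python
from dataclasses import dataclass

# Axiom kinds treated as "basic" (names, synonyms, asserted is-a edges).
# Which kinds belong in the open subset is an assumption for illustration.
BASIC_KINDS = {"label", "synonym", "subclass_of"}


@dataclass(frozen=True)
class Axiom:
    subject: str  # term identifier, e.g. "EX:0000001" (made-up ID)
    kind: str     # e.g. "label", "subclass_of", "equivalent_to"
    value: str


def split_for_cc0(axioms):
    """Split axioms into (basic, logical).

    basic:   candidates for sharing under CC0
    logical: everything else (e.g. equivalence axioms), kept with the
             full ontology under its original license
    """
    basic = [a for a in axioms if a.kind in BASIC_KINDS]
    logical = [a for a in axioms if a.kind not in BASIC_KINDS]
    return basic, logical


axioms = [
    Axiom("EX:0000001", "label", "example process"),
    Axiom("EX:0000001", "subclass_of", "EX:0000000"),
    Axiom("EX:0000001", "equivalent_to",
          "EX:0000000 and (part_of some EX:0000002)"),
]

basic, logical = split_for_cc0(axioms)
print(len(basic), "basic axioms;", len(logical), "logical axioms")
```

Keeping the choice of kinds in one set makes the licensing boundary explicit and easy for ontology authors to review.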

goodb avatar Aug 12 '16 00:08 goodb

I would like to give a view from my personal side, as one of the developers of the Human Phenotype Ontology (HPO). (I do not speak for all HPO developers.) Also, I am no expert on licenses. I just came across this thread during a discussion about derivatives of HPO.

HPO's intention is to be a tool for the community and a tool created by the community. We try to keep the quality of HPO high by accepting change requests, but still letting an HPO developer decide whether a request is valid and, if so, implement those changes.

HPO is now used in several contexts, in research, but also by several genetic diagnostics companies around the world that provide phenotype-driven diagnostics. For a given set of symptoms of a patient, HPO is also used to find similar patients or physicians that might be the best experts.

I vote for a more restrictive license for HPO:

  • that ensures acknowledgement, such that we know who uses HPO (funding et al.), but also that the user of HPO-based tools sees that HPO is used (and which version of HPO)
  • that no changes are made to HPO that are not checked by experts
  • that no derivatives are published, but if derivatives are created that those are clearly marked as not being the original HPO (I fear legal consequences)

To motivate this I want to give an excerpt of examples, that I encountered during the last years:

  • Person A of company X: started to add new ontology classes in his version of HPO using his own ID-space. We do not know which super-classes were defined for these classes.
  • Person B: in that person's database, the ontology was transformed from a DAG into a tree (probably for simplicity).
  • Companies X, Y, Z: only show that HPO is used on some sub-page (in a tiny footnote).

These changes (A+B) are IMHO pretty significant, as they may affect the results of (semantic) similarity calculations performed over HPO. My fear is that such changes might lead to missed results or even to a slow-down in the diagnostic process. In the worst-case scenario patients have to wait longer for their diagnosis and (sorry for my pessimism) patients might die during that time.

I fear that this might fall back on HPO in terms of public opinion on the quality of HPO, or even in terms of being sued and having to prove that it was the company's fault and not HPO's.
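As a rough illustration of why flattening a DAG into a tree can change similarity scores, here is a small Python sketch. The graph, the term names and the choice of a Jaccard score over ancestor sets are all assumptions made for the example; they are not HPO content or the measure used by any particular tool.

```python
def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in a child -> parents map."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def jaccard(a, b, parents):
    """Jaccard overlap of the two ancestor sets (one simple similarity measure)."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Hypothetical DAG: term "C" has two parents, "A" and "B".
dag = {"A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["B"]}

# "Tree" version: keep only the first parent of every term.
tree = {child: ps[:1] for child, ps in dag.items()}

dag_score = jaccard("C", "D", dag)    # C and D share ancestor B (and root)
tree_score = jaccard("C", "D", tree)  # the C -> B link is gone, only root is shared

print(dag_score, tree_score)  # 0.4 0.2
```

In this toy case the score for the same pair of terms halves once the second parent is dropped, which is the kind of silent change in ranking that the examples above are worried about.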

I have no idea which ready-made license is most appropriate for this, I just wanted to give a little insight on my thoughts/background.

cc @pnrobinson @mellybelly

drseb avatar Aug 12 '16 12:08 drseb

Hi everybody. I agree with Sebastian that because the HPO is being used in an ever broader range of medical contexts, extra care and responsibility is needed on our part. I think that we should basically discourage others from changing the HPO for their own needs because (i) if the change is good, we want all potential patients to benefit from it; and (ii) if the change is bad, we do not want the patients who are being served by the company in question to suffer negative consequences and we also do not want to be held legally responsible for a mistake that somebody else has made.

How does the rest of the OBO community feel about this? Is any kind of ND license acceptable in this forum owing to the status of the HPO as a resource that is being used directly in clinical care?

-peter

Peter Robinson

Professor of Computational Biology

The Jackson Laboratory for Genomic Medicine


pnrobinson avatar Aug 12 '16 12:08 pnrobinson

I think that by arguing CC0 vs CC-BY we are losing track of what we are trying to achieve. Here we have a set of resources with diverse licenses - CC0, CC-BY and a few others - whose maintainers would like to know how (if?) it is possible for their data to exist within Wikidata. Note that in addition to OBO resources there are many others (e.g. UniProt) which are not CC0, so I don't think this issue is isolated to the OBO community.

I like the solutions Chris offers:

  • Can Wikidata suggest a way to accommodate non CC0 resources?
  • Can OBO resources produce a CC0 subset?

Looking at the UniProt page at https://www.wikidata.org/wiki/Q905695, it states: [screenshot, 2016-08-12]

Could we have something similar for each OBO resource?

Once we have some sort of resolution for this, we can work on the other issues that need to be addressed for inclusion in Wikidata:

  • proper attribution, https://github.com/OBOFoundry/OBOFoundry.github.io/issues/299
  • reuse of URIs, https://github.com/OBOFoundry/OBOFoundry.github.io/issues/298

mcourtot avatar Aug 12 '16 13:08 mcourtot

I agree the thread has diverged from the original one about how to get OBOs into WD. But this is an important development. @pnrobinson and @drseb make good arguments for a license that is more restrictive than the two recommended by OBO. With my OBO hat on I want to see HP adopt BY, but with my HPO hat on I see the arguments.

What would be the implications of HPO adopting ND? As it is generally not imported and used for axiomatization, the effect on the rest of OBO might be relatively low (of course implications for WD and @dhimmel's graph store are another matter).

However, if an ontology that is used for axiomatization were to adopt ND that could have very bad implications: making an import module may be in breach of the ND clause.

From a practical POV, are we looking at a two-level split within OBO: 'axiomatic' ontologies and 'application' ontologies, with weaker licensing imposed on the former?

cmungall avatar Aug 12 '16 15:08 cmungall

@mcourtot - the problem with my solution is that the subset to go into WD may often constitute 99% of the content, so it feels like a get-out clause.

Not sure what you're suggesting re: UniProt. Are you saying that UniProt is in WD despite its more restrictive CC BY-ND license, so why can't we do the same thing for OBOs?

cmungall avatar Aug 12 '16 15:08 cmungall

I presume I'm not alone in lacking the weeks of time required to personally understand the legal ramifications of licensing. I must turn to others in the community (ideally lawyers making forays into what look to me like untested waters) for recommendations.

Two aspects of ontology re-use have been raised that people are turning to legalese to solve: quality control and marketing survival (funder stuff). I think they should be understood as separate challenges.

Marketing: Ontology uptake is a popularity contest that has real survival ramifications. Along the lines of cgreen's "emphasize the impact that a resource has had", I'm ok with the simplicity of having a license require that its ontology be listed sufficiently by any website or data repository that uses it. Would a more precise solution that focuses on standards for reporting summaries of term usage in a system be more attractive? Something like "if a curated ontology has defined an online usage reporting service according to the semantic web W3C standard XYZ, use (views/records?) of its term URIs in your application should be counted and reported semi-annually to that service". This is like the royalty system for playing songs on the radio - just the stats part. That syncs with any software resource provider's desire to understand its content. Of course it would take some time to develop such a repository, but at least the onus is on the ontology provider(s) to do so. Carrot and stick: would this really need to be enforced by legalese though, or would it be attractive enough just by existing as a service?

Quality Control: Taking a historical step back, it is fascinating to compare "pre-ontology" dictionary projects like the OED with our digital-age ontology quest. In the OED, curators avidly sought first-use instances of words and phrases in documents, but did not associate any kind of proprietary ownership with them beyond, say, acknowledging the case where a phrase was trademarked. In our case we are witnessing the merging of term definition, formal logic, and distributed software reuse of entities. To "own" chunks of language at this level strikes me as hopefully a temporary historical concept, much like the patenting of DNA fragments. However, I get that communities want to control the definitions they curate, and thereby provide quality control, and the only way to do this is to ensure that a particular use of an entity can be uniquely traced back to its curating community and the legal conditions under which it may be used. In that scenario - unfolding now - software may be composed of thousands of entities from hundreds of orthogonal ontologies. Will exclusive use of a CC0-licensed family of ontologies be the only way out of this complexity? Can't a simple license protect term curators from issues arising from the misuse or repurposing of the terms they curate?

P.S. I like a simple model where the use, in a given database or ontology, of an entity URI like http://purl.obolibrary.org/obo/GO_0097114 is sufficient to provide a reference back to the term's primary ontology, where any further legal attribution requirements for the term, and any restrictions on re-use of its labels, definitions and other immediately associated axioms, are stated.
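A minimal Python sketch of that "URI as reference" idea: given a term URI, recover which ontology it belongs to, so that the licence and attribution terms can be looked up there. The `source_ontology` helper and its return format are hypothetical, and the derived `<prefix>.owl` address is an assumption about how the ontology itself would be located.

```python
import re

# Matches term URIs of the form http://purl.obolibrary.org/obo/<PREFIX>_<digits>
OBO_TERM_PURL = re.compile(
    r"^http://purl\.obolibrary\.org/obo/([A-Za-z][A-Za-z0-9_]*)_(\d+)$"
)


def source_ontology(uri):
    """Split a term PURL into its ontology prefix and local id, or return None."""
    match = OBO_TERM_PURL.match(uri)
    if match is None:
        return None
    prefix, local_id = match.groups()
    return {
        "prefix": prefix,
        "local_id": local_id,
        # Assumed location of the ontology where licence terms would be stated.
        "ontology_purl": f"http://purl.obolibrary.org/obo/{prefix.lower()}.owl",
    }


info = source_ontology("http://purl.obolibrary.org/obo/GO_0097114")
print(info["prefix"], info["local_id"], info["ontology_purl"])
```

This only shows that the reference can be recovered mechanically from the identifier; whether that counts as sufficient attribution is the legal question being discussed in this thread.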

I think the comparison to the OED project is truly apt here. You can, to this day, buy a copy of the OED - paying its maintainers and assuring that you have the most appropriate definitions according to them. The fact that its terms are CC0 does not prevent that from happening - in fact, it's the only reason it does happen. Imagine the challenge of writing, well, anything, if you had to negotiate for use of each semantic region of language with some different curatorial authority?!

Unless this community wants to follow in the footsteps of the Chemical Abstracts Service and start suing people for use of unique identifiers for entities in the world, I see no advantages whatsoever to be gained by sticking any form of license on a PURL, a set of aliases, and a textual definition.

One of the fundamental principles of the OBO foundry is the idea of building orthogonal ontologies. They basically cannot be used without mixing them together. Consider even the case of the HPO. How useful would it be if we did not have access to the names of genes? Their coordinates on genomes? etc. Should NCBI and Ensembl start licensing the entities in their collections?

Now, what about the ontology in its full logical glory? A few thoughts:

  1. Regarding the original question about wikidata and the subsequent HetNet use case provided by Daniel Himmelstein, this is basically a moot point. Neither application can represent or compute with the OWL axioms that encode the ontology logic. Both are simple networks in need of coherent, globally unique names for their nodes and edges.
  2. The concern raised above that a group that imported an ontology might change it internally, to suit their application, and potentially generate results that are not in agreement with the intent of the ontology owners and thus potentially wrong and thus potentially dangerous is fundamentally without merit.
    a. The possibility of external changes of public information entities (e.g. all open source code) in no way breaks the curatorial authority of the owners of the resource. Unless they allowed completely unvetted changes into the ontology that they distribute from external sources, the owners continue to own. If someone wants the consortium standard view of an ontology, they can get it from that authority.
    b. It is entirely possible that people changing an ontology for use in their software might actually make it better, not worse. Presumably people building software – especially those selling software – want it to perform well at its task. They have no incentive to make it worse.
    c. If an ontology is to be released for use by the public, using whatever license you want, it is fundamentally impossible to enforce a restriction that it is used in a pre-specified, unaltered way.
  3. Ontologies do not do anything useful until they are operationalized in software. If the owners of an ontology truly believe there is only one way their creation can and should be used, then perhaps they should consider selling a binary implementation of that software rather than going down the rather confusing path of appearing on GitHub as if they are an open resource that is seeking community input.

Apart from anything else, this discussion is mostly about money, right? Developers of the OBO ontologies would, quite reasonably, like to get paid to continue their work. If everyone here in the room was driving a Tesla to work every day, free to philosophize about the nature of biological reality without worrying about their next grant, I doubt we would see such concern about a CC-BY versus CC0 license for their work products. So let's answer the question: how do licenses impact our ability to keep ontology development efforts funded? Let's assume, for now, that the federal government is going to be the main source of revenue. What do they want to see for their investment? Perhaps it would be fruitful to invite some of your program officers into the discussion here, but my impression is that they want to see the maximum impact per dollar invested – and that comes about through maximal openness.

goodb avatar Aug 15 '16 18:08 goodb

The name "Open Biomedical Ontologies" suggests that all OBO Foundry content should meet the Open Definition. Therefore, any "no derivative" or "non-commercial" stipulations should be out of the question. In addition, the current OBO Foundry principles specify that either a CC BY or CC0 license must be applied. Therefore, it seems that several ontologies are currently non-compliant, such as HPO, whose license states:

That neither the content of the HPO file(s) nor the logical relationships embedded within the HPO file(s) be altered in any way.

In addition to @goodb's points on the importance of derivatives, there is another very important consideration. Derivatives are necessary to decouple the content in an ontology from its initial creators. Say for example that a situation arises where the HPO is no longer effectively curating their ontology. Such a situation could occur due to a funding shortfall or faculty passing on. No derivatives means other groups cannot create parallel or successive projects. You have essentially tied the future of the knowledge to the future of the initial creators.

Regarding the liability comments by @drseb and @pnrobinson, CC0 and CC BY both contain strong liability disclaimers. CC BY goes further to require the provision of "a notice that refers to the disclaimer of warranties" and an indication "if You modified the Licensed Material". Therefore, "no derivatives" achieves little-to-no extra protection from liability at great cost.

dhimmel avatar Aug 16 '16 01:08 dhimmel