Metadata licensing

Open philippconzett opened this issue 4 years ago • 6 comments

As a Dataverse admin, I'd like to be able to specify under what license the metadata from my repository (or even at sub-dataverse level) are made available for reuse.

There are several existing issues about data licensing, but as far as I can see, no one has requested the possibility to specify machine-readable license information about the metadata. TROLLing (https://trolling.uit.no/) is currently working on an application to become a CLARIN B Centre (cf. https://www.clarin.eu/content/clarin-centres). One of the requirements for service providers is that the data and metadata of the services are licensed:

The centre has to provide a URL to a webpage where its information about licenses is described, enabling users of the repository to get information on how data and metadata are licensed.

Currently, DataverseNO (which TROLLing is a part of) provides this information in the section “Copyright and Licensing” in the DataverseNO Access and Use Policy (cf. https://site.uit.no/dataverseno/about/policy-framework/access-and-use-policy/):

DataverseNO (by owner) waives any and all rights DataverseNO might have with respect to Descriptive Metadata in DataverseNO. To the extent that DataverseNO’s own contributions to selecting and arranging Descriptive Metadata may be protected by copyright, DataverseNO (by owner) dedicates such contributions to the public domain pursuant to a CC0 Public Domain Dedication.

In addition, we'd like information about the metadata license to be part of the machine-readable metadata provided by Dataverse.

philippconzett avatar May 03 '20 09:05 philippconzett

Hi @philippconzett. Thought I'd note that this Google Groups thread has a very similar discussion, including info about how the DDI Codebook standard supports including license information for the metadata document itself. Merce mentioned that work on DataTags would help resolve this issue, since supporting sensitive data will make it necessary to assign different types of licenses to the metadata in machine readable and more granular ways (e.g. for each dataset or even each file?).

But if the idea was to assign the same license to all of the metadata, say a public domain license, would we want each type of metadata export to say that it has a public domain license?

  • Changing the Dataverse JSON standard to include this info is probably easiest since we control it
  • I think DataCite/OpenAIRE would require an update in order to be able to distinguish between metadata license and data license
  • I'd guess that we wouldn't be able to successfully push for a change to simplified Dublin Core, and maybe it would be easier to export a simplified DC XML document for the metadata (in addition to the current one for the data). The same might be true for the extended (qualified or Terms) Dublin Core.
  • The guidelines Dataverse follows from Google for using Schema.org don't mention anything about distinguishing between metadata license and data license, but the practices around using that huge schema for data publishing are still evolving, and the need to express metadata licenses could be brought up or reiterated in the RDA working group and/or in the channels that Schema.org set up for feedback
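To make the first option in the list above concrete, here is a minimal Python sketch of attaching a metadata license to a dataset export, kept separate from the data license. The key name `metadataLicense`, its `name`/`uri` fields, and the sample export are all hypothetical; none of them is defined by the Dataverse JSON format.

```python
import json
from copy import deepcopy

# Hypothetical key name: the Dataverse JSON export does not define it.
METADATA_LICENSE_KEY = "metadataLicense"

# CC0 1.0 public domain dedication, used here as the metadata license.
CC0 = {
    "name": "CC0 1.0",
    "uri": "https://creativecommons.org/publicdomain/zero/1.0/",
}


def with_metadata_license(export, license_info=CC0):
    """Return a copy of an export dict with a metadata license attached.

    The input dict is left unchanged, and any data license it already
    carries is not touched.
    """
    result = deepcopy(export)
    result[METADATA_LICENSE_KEY] = dict(license_info)
    return result


if __name__ == "__main__":
    # Made-up sample; a real export has many more fields.
    sample = {"identifier": "example", "dataLicense": "CC BY 4.0"}
    print(json.dumps(with_metadata_license(sample), indent=2))
```

Keeping the metadata license under its own key means a consumer can tell it apart from the data license, which is the distinction the other export formats in the list currently cannot express.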

jggautier avatar May 03 '20 19:05 jggautier

Thanks, @jggautier, for pointing to the existing discussion in the Google Group. I obviously didn’t search thoroughly enough. I think the information about the metadata license should be included both at dataset level and at repository level, and in a way that complies with DataCite and OpenAIRE requirements. Then there is the question of where to define the metadata license of a repository. Maybe re3data (https://www.re3data.org/) could be a possible place? In re3data, the metadata license of a repository can be specified in the section Database licenses under the Terms tab. For DataverseNO, we have chosen CC0 (see re3data record). (The name “Database licenses” is probably not so straightforward to understand, but re3data confirmed to me that this means the metadata license.) As re3data records are accessible via API, I guess the repository license information could easily be accessed by other services.
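As a rough illustration of how another service might read that field, the Python sketch below extracts the database (metadata) license from a re3data-style XML record. The namespace and the element names `databaseLicense`, `databaseLicenseName`, and `databaseLicenseURL` are assumptions based on the re3data metadata schema and should be checked against a real record; the embedded sample is made up and no network request is made.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the re3data schema; verify against a real record.
R3D_NS = {"r3d": "http://www.re3data.org/schema/2-2"}

# Made-up record for illustration, not copied from re3data.
SAMPLE = """<r3d:re3data xmlns:r3d="http://www.re3data.org/schema/2-2">
  <r3d:repository>
    <r3d:repositoryName language="eng">Example Repository</r3d:repositoryName>
    <r3d:databaseLicense>
      <r3d:databaseLicenseName>CC0</r3d:databaseLicenseName>
      <r3d:databaseLicenseURL>https://creativecommons.org/publicdomain/zero/1.0/</r3d:databaseLicenseURL>
    </r3d:databaseLicense>
  </r3d:repository>
</r3d:re3data>
"""


def database_licenses(xml_text):
    """Return a list of (name, url) pairs, one per databaseLicense element."""
    root = ET.fromstring(xml_text)
    licenses = []
    # ".//" searches the whole record, wherever the element is nested.
    for lic in root.findall(".//r3d:databaseLicense", R3D_NS):
        name = lic.findtext("r3d:databaseLicenseName", default="", namespaces=R3D_NS)
        url = lic.findtext("r3d:databaseLicenseURL", default="", namespaces=R3D_NS)
        licenses.append((name.strip(), url.strip()))
    return licenses


if __name__ == "__main__":
    for name, url in database_licenses(SAMPLE):
        print(name, url)
```

A harvester could run the same function on the XML returned by the re3data API for a given repository, provided the real record uses these element names.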

philippconzett avatar May 31 '20 16:05 philippconzett

Thanks for confirming the purpose of re3data's "Database licenses" field. Looks like Harvard Dataverse took it to mean the Apache license of the Dataverse software instead of the license of the dataset metadata that the repository publishes.

I'm just learning about CLARIN Centres, so I guess my questions would be:

  • Would their reviewers find that having information about a repository's license in re3data is good enough? Do they want repositories (or collections) to publish, in some metadata standard, information about the licenses of the datasets they publish?
  • Do they want a repository's dataset-level metadata to include the license of the dataset itself? This was the focus in that Google Groups thread. Should it be in the scope of this GitHub issue?

jggautier avatar Jun 01 '20 14:06 jggautier

Actually, CLARIN does not currently request metadata licence information. In the application form they say:

Note: at the moment, licensing for metadata is discussed by the legal issues committee. Awaiting the outcomes of this process, it is optional to provide a license for metadata

So, I guess our answer, including a reference to our policy, is just fine for the moment. However, I think it would be good to include machine-readable information about the metadata licence in Dataverse, as this would imply full support for FAIR principle R1.1:

(Meta)data are released with a clear and accessible data usage license

philippconzett avatar Jun 02 '20 05:06 philippconzett

Agreed! That section of the FAIR principles is what got me thinking about metadata licenses in metadata documents, too. Glad there's a GitHub issue about it. Hopefully it'll help when the community tackles these things.

jggautier avatar Jun 03 '20 11:06 jggautier

I just noticed that license support for metadata is set out as a desired characteristic in the COAR Community Framework for Good Practices in Repositories (https://doi.org/10.5281/zenodo.4110829); cf.:

1.10 The metadata in the repository are available under a Creative Commons Public Domain License [...].

philippconzett avatar Jan 22 '21 05:01 philippconzett