Multiple license update turned the terms of some datasets with only CC0 licenses into "custom licenses"

Open jggautier opened this issue 2 years ago • 6 comments

What steps does it take to reproduce the issue?

  • When does this issue occur? After a Dataverse installation that's running a Dataverse software version before 5.10 is updated to version 5.10 or later.

  • Which page(s) does it occur on?

    On the dataset page's Terms tab, and in the metadata exports that include license and terms metadata.

  • What happens?

    The update from pre-v5.10 to v5.10+ adjusts the terms metadata of datasets in the installation. Datasets that have a CC0 waiver plus text in any other field of the old "Terms of Use" accordion are considered to have "custom licenses".

    But some datasets with CC0 waivers and no text in any field of the old "Terms of Use" accordion also appear, after the update, to have "custom licenses", such as https://doi.org/10.7910/DVN/00ROYZ and https://doi.org/10.15787/h6vv-ts37. The only text in the custom license is "CC0 waiver":

    [Screenshot, Oct 18, 2022: the Terms tab showing a custom license whose only text is "CC0 waiver"]
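The CC0 part of the conversion rule described above can be sketched as follows. This is a minimal illustration of the behavior reported in this issue, not the actual Dataverse migration code; the dictionary keys (`cc0`, `termsOfUse`, `otherTermsFields`) and the license name "CC0 1.0" are assumptions chosen for the example.

```python
def license_after_upgrade(terms: dict) -> str:
    """Illustrate the CC0 branch of the 5.10 terms conversion described in this issue.

    `terms` is a hypothetical export of one dataset version's old
    "Terms of Use" accordion:
      - "cc0": True if the CC0 waiver was selected
      - "termsOfUse": the Terms of Use text
      - "otherTermsFields": dict of the accordion's other free-text fields
    """
    texts = [terms.get("termsOfUse")]
    texts.extend((terms.get("otherTermsFields") or {}).values())
    has_text = any((t or "").strip() for t in texts)
    if terms.get("cc0") and not has_text:
        # CC0 waiver with every other terms field empty: mapped to the standard CC0 license.
        return "CC0 1.0"
    # Any non-blank text (even a restatement such as "CC0 Waiver") yields a custom license.
    # Non-CC0 datasets are also lumped in here, a simplification for this sketch.
    return "custom"


# A dataset whose Terms of Use field only says "CC0 Waiver" still ends up custom.
print(license_after_upgrade({"cc0": True, "termsOfUse": "CC0 Waiver"}))
```

Under this reading, the examples in this issue are datasets where the old software had written "CC0 Waiver" into the Terms of Use field, so the "any text means custom" branch fires even though the text adds nothing.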

To whom does it occur (all users, curators, superusers)?

Depositors, curators, and anyone responsible for or viewing datasets whose terms metadata changed after the Dataverse installation's software version was updated.

What did you expect to happen?

The dataset would have a CC0 waiver instead of a "custom license".

Which version of Dataverse are you using? v5.10 or later

Any related open or closed issues to this bug report?

jggautier, Oct 18 '22

This would probably be better in the Harvard GitHub repo, as it probably involves removing Terms of Use with that specific text from those datasets. (I'm assuming the software worked as designed: it converted datasets that had the CC0 waiver on and the text 'CC0 Waiver' repeated in the termsofuse field to the form above, which would be as designed, not a bug. The release notes had the queries to find and fix these corner cases where the automatic conversion wouldn't make sense.)

qqmyers, Oct 18 '22

Thanks. Yeah, I was going to open this issue in the Harvard repo, but I found datasets like this in several Dataverse installations. One of the two examples, https://doi.org/10.15787/h6vv-ts37, is not from Harvard's repo.

It looks like there are cases where one or more previous versions of the Dataverse software added the string "CC0 Waiver" to the Terms of Use field when CC0 was used.

Are you saying there are steps in the upgrade instructions for what installations can do for these datasets? And so it's likely that the installations with datasets like these didn't follow those steps?

jggautier, Oct 18 '22

~Yes. The instructions had queries to find datasets that would be converted and, if you didn't want that default, instructions for editing them beforehand to avoid it. This particular case wasn't called out specifically; the idea was that for any dataset where the text in Terms of Use (or in other fields that are hidden once a standard license is used) didn't justify changing to a custom license, you could find it and decide whether to remove the text, move it to other fields, etc. Then, and also after the update, one could change the draft or, as a superuser, the last published version via the UI/API (using 'update current version' to affect the last published version). To change earlier versions, the only option is a database update.
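Following the comment above, affected versions could be sorted roughly like this: first pick out versions whose custom terms only restate the CC0 waiver, then split them into versions that can still be edited through the UI/API (the draft, or the latest published version via 'update current version') and earlier versions that would need a database update. A sketch only; the record fields (`doi`, `version` as a `(major, minor)` tuple, `isDraft`, `termsOfUse`, `license`) and the set of "CC0-only" phrases are hypothetical.

```python
import re

# Hypothetical phrases treated as mere restatements of the CC0 waiver.
CC0_ONLY_PHRASES = {"cc0", "cc0 waiver", "cc0 1.0"}


def normalize(text: str) -> str:
    """Collapse whitespace, drop surrounding periods, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().strip(".").lower()


def is_cc0_only_custom(version: dict) -> bool:
    """No standard license, and the only custom text restates CC0."""
    return (version.get("license") is None
            and normalize(version.get("termsOfUse") or "") in CC0_ONLY_PHRASES)


def plan_fixes(versions: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Split affected versions into 'api' (draft or latest published) and 'db' (older)."""
    latest: dict[str, tuple[int, int]] = {}
    for v in versions:
        if not v.get("isDraft"):
            current = latest.get(v["doi"], v["version"])
            latest[v["doi"]] = max(current, v["version"])
    plan: dict[str, list[tuple[str, str]]] = {"api": [], "db": []}
    for v in versions:
        if not is_cc0_only_custom(v):
            continue
        if v.get("isDraft"):
            plan["api"].append((v["doi"], "DRAFT"))
        elif v["version"] == latest[v["doi"]]:
            plan["api"].append((v["doi"], "%d.%d" % v["version"]))
        else:
            # Earlier published versions: per the comment above, only a DB update reaches these.
            plan["db"].append((v["doi"], "%d.%d" % v["version"]))
    return plan
```

Version numbers are compared as integer tuples rather than strings or floats so that, for example, 1.10 sorts after 1.9.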

qqmyers, Oct 18 '22

To add to this: in our instance we set up templates for everyone to use, with the CC licenses added (to make it easier to select and use appropriate licenses upon deposit). Datasets that used these license templates before the standard license set was introduced in Dataverse are not mapping to the new standard licenses, even though they are conceptually the same CC licenses. We are currently investigating a way to replace the previous licenses with the new standard license where they are the same, so that these datasets do not appear with "custom licenses" in the terms metadata.
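One way to approach the mapping described above is to normalize the free-text terms left by the old templates and look them up in a table of the installation's standard license names. This is a sketch under assumptions: the template strings and license names ("CC0 1.0", "CC BY 4.0") below are placeholders to be replaced with the exact text the templates inserted and the license names configured in the installation.

```python
import re
from typing import Optional

# Hypothetical table: normalized template text -> standard license name.
TEMPLATE_TO_STANDARD = {
    "cc0 waiver": "CC0 1.0",
    "creative commons attribution 4.0 international (cc by 4.0)": "CC BY 4.0",
}


def normalize(text: str) -> str:
    """Collapse whitespace, drop trailing periods, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().rstrip(".").lower()


def standard_license_for(terms_text: str) -> Optional[str]:
    """Return the matching standard license name, or None to keep the custom license."""
    return TEMPLATE_TO_STANDARD.get(normalize(terms_text))


print(standard_license_for("Creative Commons Attribution 4.0 International (CC BY 4.0)."))
print(standard_license_for("CC BY 4.0, except figures 2 and 3"))
```

Matching is exact after normalization on purpose: terms that add conditions (such as exceptions for some files) return `None` and stay custom for a curator to review, rather than being silently collapsed into a standard license.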

Julian's thoughts on possible update workflows: once an instance has the "standard" licenses, maybe the process would be:

  • Find each dataset that needs to be updated to use a standard license. This could be done with a database query or with the Dataverse APIs, though using the APIs would take longer, I think.
  • Use the Dataverse APIs (or database edits) to create a new version of each dataset that uses one of the repository's "standard" licenses instead of what's in the Terms of Use field.
  • Use the Dataverse APIs to either publish a new version of each dataset or override the current published version (I hear that collection managers, curators, and depositors sometimes don't want to publish new versions).
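The last two steps above could be planned per dataset as a pair of API requests: edit the draft's metadata, then publish either as a new major version or with "update current version". A minimal sketch that only builds the requests and sends nothing. The endpoint paths are modeled on the Dataverse native API's "update metadata for a dataset" and "publish a dataset" calls, and the body shape (a top-level `license` object, no `termsOfUse`) is an assumption; both should be checked against the API Guide for the installation's version. Authentication (the `X-Dataverse-key` header) is left out.

```python
import copy
from typing import Optional
from urllib.parse import urlencode


def plan_license_fix(base_url: str, pid: str, version_json: dict,
                     license_name: str, license_uri: str,
                     update_current: bool) -> list[tuple[str, str, Optional[dict]]]:
    """Return (method, url, body) tuples for one dataset; nothing is sent.

    Assumes `version_json` is the dataset's current version metadata and that a
    top-level "license" object (with "termsOfUse" removed) expresses a standard license.
    """
    body = copy.deepcopy(version_json)          # leave the caller's copy untouched
    body.pop("termsOfUse", None)                # drop the custom terms text
    body["license"] = {"name": license_name, "uri": license_uri}
    pid_query = urlencode({"persistentId": pid})
    edit = ("PUT", f"{base_url}/api/datasets/:persistentId/versions/:draft?{pid_query}", body)
    # "updatecurrent" (superusers only) overwrites the latest published version;
    # "major" publishes a new version instead.
    publish_type = "updatecurrent" if update_current else "major"
    publish_query = urlencode({"persistentId": pid, "type": publish_type})
    publish = ("POST", f"{base_url}/api/datasets/:persistentId/actions/:publish?{publish_query}", None)
    return [edit, publish]


steps = plan_license_fix("https://demo.example.org", "doi:10.5072/FK2/EXAMPLE",
                         {"termsOfUse": "CC0 Waiver"}, "CC0 1.0",
                         "http://creativecommons.org/publicdomain/zero/1.0",
                         update_current=True)
for method, url, _ in steps:
    print(method, url)
```

Keeping request construction separate from sending makes it easy to review a dry run for many datasets before anything is changed.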

We would also like to investigate adding our own license templates' terms to the add-license JSON, or otherwise finding ways to do this without updating metadata and versions. Stay tuned.

amberleahey, Dec 12 '22

Needs sizing and prioritizing.

sbarbosadataverse, Apr 24 '24

2024/05/08

  • One option is to update all affected datasets with a script
  • Document the script in the guides
  • Apply it in dataverse.harvard.edu
  • Also investigate the possibility of creating an API to perform tasks like these
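The script option above could be structured as a dry-run-first batch driver: check each dataset, report what would change, and only call the update step when explicitly asked. A minimal sketch; `is_affected` and `apply_fix` are hypothetical callables standing in for whatever database query and API calls an installation actually uses.

```python
from typing import Callable, Iterable


def run_batch(dataset_ids: Iterable[str],
              is_affected: Callable[[str], bool],
              apply_fix: Callable[[str], None],
              dry_run: bool = True) -> dict[str, list[str]]:
    """Apply `apply_fix` to each affected dataset; with dry_run=True only report."""
    report: dict[str, list[str]] = {"skipped": [], "would_fix": [], "fixed": [], "failed": []}
    for pid in dataset_ids:
        if not is_affected(pid):
            report["skipped"].append(pid)
        elif dry_run:
            report["would_fix"].append(pid)
        else:
            try:
                apply_fix(pid)
                report["fixed"].append(pid)
            except Exception as exc:  # keep going; record failures for follow-up
                report["failed"].append(f"{pid}: {exc}")
    return report
```

Defaulting to `dry_run=True` means the report can be reviewed (and attached to the documentation) before any dataset on a production installation is touched, and per-dataset failures do not stop the rest of the batch.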

cmbz, May 08 '24