
Adds MIT license file

Open jp-tosca opened this issue 1 year ago • 2 comments

What this PR does / why we need it:

The request to include the MIT License on HDV was made in https://github.com/IQSS/dataverse.harvard.edu/issues/248. This PR adds a JSON file so the license can be added.

Which issue(s) this PR closes:

Closes #10425

Suggestions on how to test this: Add the license and check that its link works.

jp-tosca avatar Mar 25 '24 20:03 jp-tosca

I asked the community for feedback on this pull request: https://groups.google.com/g/dataverse-community/c/_UUKZT4RrmM/m/HCtul3VlAQAJ

pdurbin avatar Mar 26 '24 16:03 pdurbin

There's some discussion going on at https://dataverse.zulipchat.com/#narrow/stream/379673-dev/topic/first.20software.20license.20in.20guides.3A.20MIT/near/429826659

pdurbin avatar Mar 27 '24 11:03 pdurbin

@jp-tosca and I just pushed 5e0c73f to update the MIT license and add guidance on how to add additional licenses:

Screenshot 2024-04-11 at 12 44 09 PM

Here's a preview of the docs: https://dataverse-guide--10426.org.readthedocs.build/en/10426/installation/config.html#contributing-to-the-collection-of-standard-licenses-above

At this point we should solicit more input from the community, especially on the guidance above. The MIT license we're adding is slightly different (different URI, at least) from the one @DieuwertjeBloemen mentioned adding in https://groups.google.com/g/dataverse-community/c/_UUKZT4RrmM/m/IxQaA7ycAQAJ

Also, I'm interested in what @philippconzett thinks since he's been leading the charge on license standardization in these issues and PRs:

  • #8512
  • #9262

I'll try to dig up the right threads on the google group to have more people look (here and here). And Zulip: https://dataverse.zulipchat.com/#narrow/stream/379673-dev/topic/first.20software.20license.20in.20guides.3A.20MIT/near/429826659

For now, I guess I'll leave myself as a reviewer.

p.s. The license facet is here in 6.2! We already updated https://demo.dataverse.org and here's how it looks:

Screenshot 2024-04-11 at 12 55 58 PM

It would be wonderful to keep these values unique!

pdurbin avatar Apr 11 '24 16:04 pdurbin

Thanks all for driving standardized license information forward! I have a couple of questions:

  1. Does this PR cover #8512?
  2. What is the rationale for using the actual URL that the SPDX license link (in some/most cases) redirects to as the value in the uri field? I wonder whether using the SPDX license link itself would spare us from monitoring link rot?
  3. Does this PR include adapting the database to include table fields where all the SPDX values (name, description, uri, ...) are stored?
  4. Should the PR include scripts for how to clean up / align licence information in legacy datasets, so that the new approach is applied to the entire Dataverse installation?

philippconzett avatar Apr 15 '24 12:04 philippconzett

@pdurbin Looks great! I think my MIT uri is the faulty one, as the url in the SPDX list has the lower-case variant.

@philippconzett

  1. I think it covers a lot if not all of #8512 though it doesn't provide a set of standard licenses straight away, but rather guidelines on how to contribute new JSONs/Licenses to ensure they are standardized. This will probably make these JSONs grow over time. (we could already do an initial push of some JSONs we have at KU Leuven that are pretty much in line with this, I'll just check them once the guidelines are approved).
  2. We decided not to use the SPDX landing page because we looked for the URIs that harvesters expect (e.g. for Creative Commons, that's where you find the base URL that OpenAire and DataCite (page 40) expect). I think the only real way to prevent link rot would be to keep a local copy of each license text, but I don't think anyone wants to do or maintain that ;) We're also not guaranteed that the SPDX landing page URLs will stay available forever.

DieuwertjeBloemen avatar Apr 15 '24 15:04 DieuwertjeBloemen

Thanks, @DieuwertjeBloemen. I had to revisit my issue/PR once more and now realize that the main difference between #8512 and #10426 is that #8512 is based on the DataCite recommendations, whereas #10426 is still based on the setup Dataverse uses currently, with some modifications. I think you clearly see the difference when you compare what the JSON file for CC BY 4.0 looks like in the two approaches:

JSON according to #8512:

{
  "rightsName": "CC BY 4.0",
  "rightsURI": "https://creativecommons.org/licenses/by/4.0/",
  "rightsIdentifier": "CC-BY-4.0",
  "rightsIdentifierScheme": "SPDX",
  "schemeURI": "https://spdx.org/licenses/",
  "rightsShortDescription": "Creative Commons Attribution 4.0 International.",
  "rightsIconUrl": "https://licensebuttons.net/l/by/4.0/88x31.png",
  "rightsActive": true
}

JSON according to #10426:

{
  "name": "CC-BY-4.0",
  "uri": "http://creativecommons.org/licenses/by/4.0",
  "description": "CC BY 4.0",
  "iconUrl": "https://licensebuttons.net/l/by/4.0/88x31.png",
  "active": true,
  "sortOrder": 2
}

I guess you want to add the MIT license as soon as possible, for which #10426 seems to be a feasible way. At DataverseNO, we would still like to be able to deliver license metadata to DataCite in line with their recommendations, which will mean implementing #8512. That will take some more resources, I guess, because fields need to be added and renamed in the database, among other things.
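To make the difference between the two layouts concrete, here is a minimal Python sketch that maps the #10426-style fields onto the #8512-style field names. The function name `to_rights_fields` and the mapping itself are assumptions inferred only from the two CC BY 4.0 examples above; neither PR defines such a conversion. The short description has no counterpart in the #10426 layout, so it is passed in by hand, and the `uri` value is carried over unchanged (so it would still differ from the #8512 example in scheme and trailing slash).

```python
# Hypothetical mapping between the two JSON layouts shown above.
# Field correspondences are inferred from the CC BY 4.0 examples only.
SPDX_SCHEME_URI = "https://spdx.org/licenses/"


def to_rights_fields(license_json, short_description):
    """Return a #8512-style dict built from a #10426-style license dict."""
    return {
        "rightsName": license_json["description"],   # e.g. "CC BY 4.0"
        "rightsURI": license_json["uri"],             # copied as-is, not normalized
        "rightsIdentifier": license_json["name"],     # e.g. "CC-BY-4.0"
        "rightsIdentifierScheme": "SPDX",
        "schemeURI": SPDX_SCHEME_URI,
        "rightsShortDescription": short_description,  # not present in #10426
        "rightsIconUrl": license_json["iconUrl"],
        "rightsActive": license_json["active"],
        # "sortOrder" has no #8512 counterpart in the example, so it is dropped.
    }


current = {
    "name": "CC-BY-4.0",
    "uri": "http://creativecommons.org/licenses/by/4.0",
    "description": "CC BY 4.0",
    "iconUrl": "https://licensebuttons.net/l/by/4.0/88x31.png",
    "active": True,
    "sortOrder": 2,
}
converted = to_rights_fields(
    current, "Creative Commons Attribution 4.0 International."
)
```

The sketch only renames keys; the database changes mentioned above (adding and renaming fields) are a separate piece of work.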

philippconzett avatar Apr 15 '24 16:04 philippconzett

Hi @philippconzett and @DieuwertjeBloemen thanks for your comments!

Yes, this PR (#10426) is quite small, only adding the MIT license using existing database columns/tables and adding some new documentation/guidance on adding new licenses moving forward (still using the existing columns/tables).

As for letting the SPDX link resolve or not, I'm happy to reverse the stance we've taken and declare that we should use the SPDX link as-is without redirection. Mostly I just wanted to capture the (frustrating) fact that redirection is going on and to pick one way or the other (as-is or redirected). More on this below.

Yes, I think we should leave #8512 open to think about adding additional database columns and further improving how we store licenses in the database and standardize them.

As for SQL migration scripts to handle existing licenses that are not in compliance with the guidance we've written up (CC0 and friends), @jp-tosca and I talked about it but decided this work is out of scope for this issue. You may have noticed that we added this note to the guidance: "Note that prior to Dataverse 6.2, various licenses above have been added that do not adhere perfectly with this procedure. For example, the name for the CC0 license is CC0 1.0 (no dash) rather than CC0-1.0 (with a dash). We are keeping the existing names for backward compatibility. For more on standardizing license configuration, see https://github.com/IQSS/dataverse/issues/8512." Basically, we talked briefly about how it would be a fair amount of work to write these scripts, so we'd rather defer this until #8512.

As for providing additional licenses, yes, sure, we're open to more. After this PR (#10426) gets finalized and merged, @DieuwertjeBloemen you're welcome to add more. Thanks!

So! In the interest of keeping things moving, it sounds like we're all more or less in agreement on the scope of this PR (#10426) as well as its content, with the possible exception of this line...

- For the ``uri`` field, go to the SPDX landing page for the license and click on the link under "other web pages for this license". Let any redirection happen and then copy the URL (e.g. ``https://opensource.org/license/mit``) into the ``uri`` field.

If we change the language to use the exact URL as shown on the SPDX landing page (rather than letting redirection happen), we would change...

"uri": "https://opensource.org/license/mit",

to

"uri": "https://opensource.org/license/mit/",

Again, I don't feel strongly about this. Does anyone?

pdurbin avatar Apr 16 '24 17:04 pdurbin

CC-BY-4.0 is an interesting one: https://spdx.org/licenses/CC-BY-4.0.html points you to https://creativecommons.org/licenses/by/4.0/legalcode where the CC folks tell you the canonical URL is https://creativecommons.org/licenses/by/4.0/. There are no redirects.

In general, I'd think we'd want the canonical URL as defined by the license provider (versus wherever spdx points if that's different) but I agree the question of a trailing slash is painfully trivial (especially when CC and MIT appear to choose opposite conventions!). I wouldn't be surprised if people trying to parse these can handle that much difference, but who knows.
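As a rough illustration of how a consumer might tolerate that much difference, here is a small Python sketch that compares two license URIs while ignoring the scheme (http vs https), the case of the host name, and a trailing slash. The function names are made up for this example and nothing here is part of Dataverse or any harvester; it only shows that the trailing-slash question can be neutralized on the parsing side. Path case is deliberately kept significant, since the upper- vs lower-case MIT URL came up earlier in this thread.

```python
from urllib.parse import urlsplit


def normalize_license_uri(uri):
    """Reduce a license URI to host + path, without a trailing slash."""
    parts = urlsplit(uri.strip())
    host = parts.netloc.lower()     # host names are case-insensitive
    path = parts.path.rstrip("/")   # ignore a trailing slash
    return host + path              # scheme, query and fragment are dropped


def same_license_uri(a, b):
    """True if two URIs differ only in scheme, host case or trailing slash."""
    return normalize_license_uri(a) == normalize_license_uri(b)


mit_match = same_license_uri(
    "https://opensource.org/license/mit",
    "https://opensource.org/license/mit/",
)
cc_match = same_license_uri(
    "http://creativecommons.org/licenses/by/4.0",
    "https://creativecommons.org/licenses/by/4.0/",
)
```

Whether real harvesters compare URIs this leniently is exactly the open question, so storing the canonical form is still the safer choice.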

qqmyers avatar Apr 16 '24 18:04 qqmyers

Yesterday I was also wondering whether the documentation should use an example (perhaps one of the CC options) in addition to the MIT license, one that shows whether the "name" field in the JSON is written with or without the dash, because MIT does not make this explicit. But that's just a minor detail that I thought could perhaps be improved in the above-mentioned documentation.

DieuwertjeBloemen avatar Apr 17 '24 07:04 DieuwertjeBloemen

@DieuwertjeBloemen good idea.

@jp-tosca in the docs, can you please switch from MIT to another license as the example? Please feel free to add an additional license in the process, one that exercises the rules a little more thoroughly.

@DieuwertjeBloemen as we wrote in the guidance above, we are considering the existing CC0 licenses grandfathered in: "Note that prior to Dataverse 6.2, various licenses above have been added that do not adhere perfectly with this procedure. For example, the name for the CC0 license is CC0 1.0 (no dash) rather than CC0-1.0 (with a dash). We are keeping the existing names for backward compatibility." What do you think?
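For illustration, the naming rule plus the grandfathering could be checked with something like the Python sketch below. Everything in it is an assumption made for this example: the function name, the set of grandfathered names (only "CC0 1.0" is named in this thread), and the regular expression, which is a rough shape check for dash-style identifiers rather than a lookup against the real SPDX list.

```python
import re

# Assumption: only the name mentioned in this thread is listed here.
GRANDFATHERED_NAMES = {"CC0 1.0"}

# Rough shape of a dash-style identifier: letters, digits, dots and dashes,
# with no spaces. This does not verify that the identifier exists in SPDX.
SPDX_LIKE = re.compile(r"[A-Za-z0-9.-]+")


def check_license_name(name):
    """Classify a license "name" value against the naming guidance."""
    if name in GRANDFATHERED_NAMES:
        return "grandfathered"
    if SPDX_LIKE.fullmatch(name):
        return "ok"
    return "nonconforming"


results = {
    n: check_license_name(n)
    for n in ["MIT", "CC0-1.0", "CC0 1.0", "CC BY 4.0"]
}
```

A check like this would only flag names for a human to look at; renaming licenses on existing datasets is the migration work discussed as out of scope here.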

pdurbin avatar Apr 17 '24 13:04 pdurbin

@pdurbin I think it makes sense to 'grandfather' it for now. If someone at some point wants to make it compatible with the rest and figure out how to update it on existing datasets, then that can always be done at a later stage.

DieuwertjeBloemen avatar Apr 17 '24 15:04 DieuwertjeBloemen

@DieuwertjeBloemen yeah, updating the CC license to comply with the new thinking will require an SQL migration script (in Flyway). I'm glad you're ok with this being out of scope for this PR.

pdurbin avatar Apr 17 '24 15:04 pdurbin