
New license request: BigScience Open RAIL-M License

Open ofek opened this issue 3 years ago • 18 comments

  1. License Name: BigScience Open RAIL-M License
  2. Short identifier: BigScience-OpenRAIL-M
  3. License Author or steward: Hugging Face
  4. Comments: This license does not match any currently on the list. It is the first Responsible AI License designed for broad usage.
  5. Standard License Header:
  6. License Request Url: Unknown
  7. URL(s): https://drive.google.com/file/d/16NqKiAkzyZ55NClubCIFup8pT2jnyVIo/view
  8. OSI Status: Unknown
  9. Example Projects: https://huggingface.co/bigscience/bloom

ofek avatar Sep 05 '22 03:09 ofek

thanks for the submission, @ofek - we are in the midst of focusing on some documentation updates for the upcoming release, so I've marked this for the subsequent release.

In the meantime - this license looks a bit familiar, but it's dated quite recently - is this an iteration on a previous version? Am I correct that it has not seen much use yet (being so new)?

jlovejoy avatar Sep 05 '22 19:09 jlovejoy

Thank you!

  1. is this an iteration on a previous version?

    Yes, it iterates on the BigScience RAIL License v1.0. They say:

    In general terms, the license is almost the same, no critical modifications have been made. We adapted it to be applicable to other models, such as other NLP models or multimodal generative ones, and not just to the BLOOM set of models. We introduced the terms “information and/or content” along the definitions and use-based restrictions in Attachment A to emphasize the cases where the license may even be applied to multimodal generative models.

  2. Am I correct that it has not seen much use yet (being so new)?

    Yes, though I anticipate rapid adoption in the coming months. cc @CarlosMFerr

ofek avatar Sep 05 '22 20:09 ofek

@jlovejoy we (@Pizza-Ria and I) are reading through this license text. A quick scan shows 47% similarity to the Educational Community License, Version 2.0, which is on the OSI list.

KIRWOG avatar Sep 07 '22 05:09 KIRWOG

@swinslow @jlovejoy if you want to assign this one to @KIRWOG and myself that would be fine!

Pizza-Ria avatar Sep 08 '22 21:09 Pizza-Ria

Excellent, thanks @Pizza-Ria and @KIRWOG!

swinslow avatar Sep 09 '22 13:09 swinslow

For what it's worth as you're looking at this: I note that stable-diffusion, a project which has gotten a lot of recent attention, appears to use a similar but not-identical license.

From a 30,000 foot view the preambles are different, but the major structure of the licenses looks at least similar. I haven't compared the terms or done anything more to investigate it yet, but just flagging as another instance that you might want to look closely at as you're digging into this one.

swinslow avatar Sep 09 '22 13:09 swinslow

I note that this license clearly fails to meet the first plank of the "Other factors" (the non-Definitive factors) in the SPDX License Inclusion Principles: "The license substantially complies with one of the following open source definitions (even if not submitted for approval or these organization have not considered the license)". (Edited to clarify that this is not the first plank of the principles as a whole.)

richardfontana avatar Sep 16 '22 03:09 richardfontana

It would be a shame if SPDX couldn't/didn't support what people use for responsible AI/ML releases.

ofek avatar Sep 17 '22 13:09 ofek

Hi @richardfontana, this is not an open source license as per the OSD, but there is interest in adopting these types of licenses among some in the AI community. Could SPDX not consider including Open-RAIL-M Licenses in general and, in this particular instance, the BigScience OpenRAIL-M License?

Here's a bit more on Open RAIL-M Licenses: https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses

I do see non-open source licenses on SPDX, e.g. https://spdx.org/licenses/Hippocratic-2.1.html

danishcontractor avatar Sep 19 '22 22:09 danishcontractor

@danishcontractor you are correct, SPDX has many non-FOSS licenses on its license list, but my point is just that not every arbitrary license qualifies for inclusion based on the SPDX project's own license inclusion principles. I personally supported inclusion of Hippocratic-2.1 (if I recall correctly). One way in which I misspoke though: the "first plank" I spoke of is actually the first plank of the set of "Other factors". It is possible the Big Science OpenRAIL-M license satisfies all of the "Definitive factors" (apart of course from the fact that it is not OSI-approved). I do have a bias in that I would like to see the SPDX license list consist of mostly FOSS licenses. I am also generally concerned about the SPDX license list being used for political, promotional or marketing purposes.

richardfontana avatar Sep 20 '22 01:09 richardfontana

Any update on this?

ofek avatar Oct 04 '22 18:10 ofek

Hi all! Thanks for considering supporting the "BigScience OpenRAIL-M" with SPDX. I'm one of the drafters of the license.

Regarding the discussion above, here are several considerations that could help your decision-making process:

  1. The license satisfies the majority of the SPDX factors (both "definitive" and "others"), except for not being aligned with "open source" definitions;
  2. OpenRAIL-M challenges the status quo, as it is a license specifically designed for machine learning models (i.e. not code but weights/parameters; these are different artifacts despite being technically interdependent when the model is embedded in software, e.g. an ML app).
  3. OpenRAILs are a new licensing paradigm in AI spaces, already used by major projects such as BLOOM and Stable Diffusion (CreativeML OpenRAIL-M)
  4. Thus, we are responding to a new phenomenon happening right now: the need for open sharing and release of ML artifacts with a set of use restrictions stemming from the acknowledgment of the technical capabilities/limitations of the ML model (i.e. concerns from the licensor about how the ML model could be misused)
  5. IMO, for SPDX it is not just about whether to include a new license, but whether to be willing to adapt to new licensing trends informed by new technological phenomena (something, btw, which is currently under discussion even in OSI - e.g. AI Deep Dive project).

In any case, thank you in advance for taking the time to consider OpenRAILs and for having this discussion, much appreciated!

CarlosMFerr avatar Oct 05 '22 07:10 CarlosMFerr

@CarlosMFerr Since you mentioned you are one of the authors, could you confirm whether you are the steward? If you are, would you commit to version control of future versions, if any?

KIRWOG avatar Oct 06 '22 15:10 KIRWOG

@KIRWOG I am one of the stewards, but not the only one. I am tagging @danishcontractor, as he drafted it with me.

From my side, YES, I would commit to version control, if by that you mean being in charge of triggering the process for, implementing, and announcing future versions of the license (and controlling subsequent versions which do not stem from the BigScience community and its stewards, myself and Danish).

In any case, I also want to have @danishcontractor's approval on this (he will be available from next week on).

Thank you for your consideration:)

CarlosMFerr avatar Oct 06 '22 16:10 CarlosMFerr

Hi there, just to reiterate what I had said previously - we are focusing on some documentation related projects this release, so this is marked for review for the next release, hence no review yet.

As for the factors, the actual use factor seems significant here. It is not about whether SPDX is "willing to adapt to new licensing trends" in a proactive way - that is not the project's mission. We are trying to capture which licenses people actually find "in the wild" - notably when exchanging software.

jlovejoy avatar Oct 07 '22 15:10 jlovejoy

Will be working on it this week! Thanks!

Pizza-Ria avatar Oct 11 '22 09:10 Pizza-Ria

@CarlosMFerr & @KIRWOG I believe we are still waiting for feedback from @danishcontractor per above?

Pizza-Ria avatar Oct 13 '22 14:10 Pizza-Ria

Thanks @CarlosMFerr @Pizza-Ria. Yes, I agree with what Carlos stated, and I confirm that I am one of the stewards.

danishcontractor avatar Oct 14 '22 19:10 danishcontractor

New submission review

  • Has the license been approved by the OSI? All OSI-approved licenses will be included on the SPDX License List, regardless of other factors.
    • [ ] Yes
    • [x] No

Definitive Factors

These must all be satisfied to allow inclusion in the license list

  1. Is the submitted license unique, that is, it does not match another license already on the License List as per the matching guidelines?
    • [x] Yes
    • [ ] No
  2. If a software license, does it apply to source code and not only to executables?
    • [ ] Yes
    • [x] No
  3. Does the license have identifiable and stable text, and is not in the midst of drafting?
    • [x] Yes
    • [ ] No
  4. Has the license steward, if any, committed to versioning new versions and to not modify it after addition to the list?
    • [x] Yes
    • [ ] No

Other factors for inclusion

Roughly in order of descending importance

  1. Does the license substantially comply with one of the free/open content definitions? (examples include the Open Source Definition and the Debian Free Software Guidelines) (Approval by the organisation that publishes the definition is not required)
    • [ ] Yes
    • [x] No
  2. Is the license structured to be generally usable by anyone, and not specific to one organisation or project?
    • [x] Yes
    • [ ] No
  3. Does the license have substantial use such that it is likely to be encountered (ie. use in many projects, or in one significant project)? (For recently written licenses, definitive plans for it to be used in at least one or a few significant projects may satisfy this)
    • [x] Yes
    • [ ] No
  4. Is the license primarily intended to facilitate the free distribution of content with limited restrictions?
    • [ ] Yes
    • [x] No
  5. Does the license steward support this submission, or is at least aware of and not in opposition of it?
    • [x] Yes
    • [ ] No
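
The gating order of the checklist above can be sketched roughly as follows. This is an illustration only, not an SPDX tool: the names `Outcome` and `screen_submission` are hypothetical, and the sketch assumes each definitive factor has already been reduced to a plain satisfied/not-satisfied boolean. The template does not say how a "No" on a conditional question (such as "If a software license...") should be treated when the license is not a software license, so that judgment is left to the caller. The "other factors" are weighed by reviewers and are deliberately not reduced to a formula here.

```python
from enum import Enum


class Outcome(Enum):
    INCLUDE = "include"                      # OSI-approved: included regardless of other factors
    REJECT = "reject"                        # at least one definitive factor is not satisfied
    WEIGH_OTHER_FACTORS = "weigh other factors"  # passes the gate; reviewers still decide


def screen_submission(osi_approved: bool, definitive: dict[str, bool]) -> Outcome:
    """Apply the gating order of the review template (hypothetical helper).

    `definitive` maps a short factor name to True if the reviewer considers
    that factor satisfied.
    """
    if osi_approved:
        return Outcome.INCLUDE
    if not all(definitive.values()):
        return Outcome.REJECT
    return Outcome.WEIGH_OTHER_FACTORS


# Hypothetical answers, not the ones recorded in this thread.
example = {
    "unique": True,
    "applies_to_source": True,
    "stable_text": True,
    "steward_commits_to_versioning": True,
}
print(screen_submission(False, example).value)
```

Passing the definitive gate only moves a submission on to the judgment-based "other factors"; it does not by itself mean the license is accepted.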

Summary of factors, outcome, comments

  • The closest match to this license seems to be Educational Community License, Version 2.0 with a 47% match.
  • This license relates to a Model and the Complementary material that defines the Model. The license does not seem to indicate that the right to modify applies to the Model and the Complementary material.
  • Ethical licensing: the restrictions are intended to prevent unlawful activities and activities detrimental to the public interest, as well as activities that raise privacy concerns. This raises concerns about difficulty of interpretation and enforcement in various jurisdictions.
  • According to the license stewards, OpenRAIL licenses are widely used by the ML community. They are trying to keep up with, and stay in line with, the evolving regulatory framework. Note: This is as claimed by the license stewards on their blog https://www.licenses.ai/blog/2022/8/26/bigscience-open-rail-m-license
  • As per @richardfontana’s comment, this does not comply with any of the free/open content definitions.

Comments on the totality of these "other factors" in light of the SPDX License List's overall goals and objectives

KIRWOG avatar Oct 20 '22 08:10 KIRWOG

If a software license, does it apply to source code and not only to executables?

A model is closer to source code than an executable, imo

ofek avatar Oct 20 '22 14:10 ofek

@KIRWOG @ofek the license applies to both model and source code (complementary material) but the restrictions on use only apply to the model.

danishcontractor avatar Oct 20 '22 14:10 danishcontractor

@KIRWOG what is the conclusion of "yes" for "Does the license have substantial use such that it is likely to be encountered (ie. use in many projects, or in one significant project)?" based on? I don't have any knowledge of this one way or the other but it seems to me SPDX should do some independent investigation to ascertain whether this principle is met. If I understood one of the previous comments correctly, this particular license isn't being used yet.

richardfontana avatar Oct 22 '22 01:10 richardfontana

@ofek You cite https://huggingface.co/bigscience/bloom as an "example project", but that link says that the license used there is "bigscience-bloom-rail-1.0" which I assume refers to a different RAIL license. I looked for some examples of stuff on Hugging Face using "bigscience-openrail-m" via https://huggingface.co/models?license=license:bigscience-openrail-m&sort=downloads -- much of that seems to be spam or empty repositories.

As far as I can tell, the Hugging Face site does not provide reliable information on the licensing of anything. There doesn't seem to be any notion of a 'license file' as is conventional in open source software repositories for example.
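
For anyone who wants to repeat this check for other license tags, here is a minimal sketch that rebuilds the listing URL used above. It only mirrors the query pattern visible in that link (`license=license:<tag>&sort=downloads`); the helper name `license_search_url` is made up for illustration, and whether the site keeps supporting this query format is an assumption.

```python
from urllib.parse import urlencode

# Base of the model listing page, taken from the link above.
HUB_MODELS_URL = "https://huggingface.co/models"


def license_search_url(license_tag: str, sort: str = "downloads") -> str:
    """Build a model-listing URL filtered by a license tag (hypothetical helper)."""
    # safe=":" keeps the colon in "license:<tag>" unescaped, matching the link above.
    query = urlencode({"license": f"license:{license_tag}", "sort": sort}, safe=":")
    return f"{HUB_MODELS_URL}?{query}"


print(license_search_url("bigscience-openrail-m"))
print(license_search_url("creativeml-openrail-m"))
```

The resulting pages still have to be inspected by hand, since a license tag on a repository says nothing about whether the repository has real content.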

richardfontana avatar Oct 25 '22 17:10 richardfontana

First of all, there are 2 generally applicable licenses: CreativeML OpenRAIL-M (see license here: https://huggingface.co/spaces/CompVis/stable-diffusion-license ; see models in the hub here: https://huggingface.co/models?license=license:creativeml-openrail-m&sort=downloads ) and BigScience OpenRAIL-M (see models in the hub here: https://huggingface.co/models?license=license:bigscience-openrail-m&sort=downloads).

Yes, some users' files just include the model without any reference to the license (as is also the case for a lot of OSS projects, I believe; this is not something specific to RAILs, right?).

Also, we do specify what an OpenRAIL is. If you click on a model with an OpenRAIL license and you click on the license tag you'll have a summary of the license and a link for more reliable information.

CarlosMFerr avatar Oct 26 '22 09:10 CarlosMFerr

@CarlosMFerr you didn't respond to my assertion that most of the "uses" of BigScience OpenRAIL-M on Hugging Face are spam or empty repositories. From what I can see, the only one in https://huggingface.co/models?license=license:bigscience-openrail-m&sort=downloads that possibly isn't is something that is inaccessible without a login.

The SPDX inclusion guidelines include:

The license has actual, substantial use such that it is likely to be encountered. Substantial use may be demonstrated via use in many projects, or in one or a few significant projects. For new licenses, there are definitive plans for the license to be used in one or a few significant projects.

If the license does not substantively comply with one of the above open source definitions, then the license is primarily intended for free distribution of content (including, in the case of software, its source code) with limited restrictions, and meets other factors listed here.

The question I am raising is whether these are met. I am not seeing how you have demonstrated that there is "actual, substantial use" and I don't think you have provided evidence of "definitive plans" for the license to be used by a significant project. Separately, I am not sure I see how the license is intended for free distribution of content given that I don't see any example of content that is being "freely distributed" in the sense I think is meant in the guidelines (accessible on the public web without being behind a login wall, essentially).

richardfontana avatar Oct 26 '22 14:10 richardfontana

@KIRWOG what is the conclusion of "yes" for "Does the license have substantial use such that it is likely to be encountered (ie. use in many projects, or in one significant project)?" based on? I don't have any knowledge of this one way or the other but it seems to me SPDX should do some independent investigation to ascertain whether this principle is met. If I understood one of the previous comments correctly, this particular license isn't being used yet.

@richardfontana - We gave the benefit of the doubt on this factor based on the submitter's assertion but appreciate your additional digging into the actual uses on HuggingFace - good lesson for tackling submissions in the future.

Pizza-Ria avatar Oct 26 '22 17:10 Pizza-Ria

tldr: there are real users, but none of them are using the same text, so this is premature.

  1. There are several users: stable-diffusion (already noted) is a very significant one with a real, vibrant community; BLOOM is invested in by the French government and state of the art for some non-English languages; and the upcoming BigCode release will be under RAIL-M. So I think the usage is real and not promotional; Richard's concern is valid generally but not applicable here.

  2. But none of these uses are under identical terms. Carlos has already mentioned CreativeML v. BigScience; the BigCode license will also be different. Since there is no "one" RAIL-M license, this fails the "identifiable and stable text" plank of the definitive factors. I am talking with Carlos and Danish about maturing the licenses so that this will be possible in the future, but it's not there yet.

tieguy avatar Nov 14 '22 21:11 tieguy

thanks @tieguy - so, do I take it we should close this and it can be re-opened or a new issue opened when the text is stable?

jlovejoy avatar Dec 06 '22 06:12 jlovejoy

I'm inclined to close this one for now.

Pizza-Ria avatar Dec 07 '22 01:12 Pizza-Ria

Can someone enumerate what needs to be done?

ofek avatar Dec 07 '22 01:12 ofek