New license request: BigScience Open RAIL-M License
- License Name: BigScience Open RAIL-M License
- Short identifier: BigScience-OpenRAIL-M
- License Author or steward: Hugging Face
- Comments: This license does not match any currently on the list. It is the first Responsible AI License designed for broad usage.
- Standard License Header:
- License Request Url: Unknown
- URL(s): https://drive.google.com/file/d/16NqKiAkzyZ55NClubCIFup8pT2jnyVIo/view
- OSI Status: Unknown
- Example Projects: https://huggingface.co/bigscience/bloom
thanks for the submission, @ofek - we are in the midst of focusing on some documentation updates for the upcoming release, so I've marked this for the subsequent release.
In the meantime - this license looks a bit familiar, but it's dated quite recently - is this an iteration on a previous version? Am I correct that it has not seen much use yet (being so new)?
Thank you!
-
is this an iteration on a previous version?
Yes from BigScience RAIL License v1.0, they say:
In general terms, the license is almost the same, no critical modifications have been made. We adapted it to be applicable to other models, such as other NLP models or multimodal generative ones, and not just to the BLOOM set of models. We introduced the terms “information and/or content” along the definitions and use-based restrictions in Attachment A to emphasize the cases where the license may even be applied to multimodal generative models.
-
Am I correct that it has not seen much use yet (being so new)?
Yes, though I anticipate rapid adoption in the coming months. cc @CarlosMFerr
@jlovejoy we (@Pizza-Ria and I) are reading through this license text. A quick scan shows 47% similarity to the Educational Community License, Version 2.0, which is on the OSI list.
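For readers wondering what a "quick scan" similarity figure might mean, here is a rough sketch of a token-level comparison between two license texts. It is only an illustration: the thread does not say which tool produced the 47% figure, and this is not the SPDX matching-guidelines algorithm. The function names `normalize` and `similarity_percent` are made up for this example.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric tokens, so that case,
    # punctuation and whitespace differences do not affect the score.
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity_percent(text_a: str, text_b: str) -> float:
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matching tokens and T the total number of tokens in both inputs.
    matcher = SequenceMatcher(None, normalize(text_a), normalize(text_b), autojunk=False)
    return round(100 * matcher.ratio(), 1)


if __name__ == "__main__":
    # Toy inputs; real use would pass the full text of each license.
    print(similarity_percent("Grant of Copyright License.", "Grant of Patent License."))
```

A score computed this way depends heavily on normalization choices, so it is a hint about where to look rather than evidence that two licenses match.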
@swinslow @jlovejoy if you want to assign this one to @KIRWOG and myself that would be fine!
Excellent, thanks @Pizza-Ria and @KIRWOG!
For what it's worth as you're looking at this: I note that stable-diffusion, a project which has gotten a lot of recent attention, appears to use a similar but not-identical license.
From a 30,000 foot view the preambles are different, but the major structure of the licenses looks at least similar. I haven't compared the terms or done anything more to investigate it yet, but just flagging as another instance that you might want to look closely at as you're digging into this one.
I note that this license clearly fails to meet the first plank of the "Other factors" (the non-Definitive factors) in the SPDX License Inclusion Principles: "The license substantially complies with one of the following open source definitions (even if not submitted for approval or these organizations have not considered the license)". (Edited to clarify that this is not the first plank of the principles as a whole.)
It would be a shame if SPDX couldn't/didn't support what people use for responsible AI/ML releases.
Hi @richardfontana, this is not an open source license as per the OSD, but there is interest in adopting licenses of this type among some in the AI community. Could SPDX not consider including Open-RAIL-M licenses in general, and in this particular instance the BigScience OpenRAIL-M License?
Here's a bit more on Open RAIL-M Licenses: https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses
I do see non-open source licenses on SPDX: Eg: https://spdx.org/licenses/Hippocratic-2.1.html
@danishcontractor you are correct, SPDX has many non-FOSS licenses on its license list, but my point is just that not every arbitrary license qualifies for inclusion based on the SPDX project's own license inclusion principles. I personally supported inclusion of Hippocratic-2.1 (if I recall correctly). One way in which I misspoke though: the "first plank" I spoke of is actually the first plank of the set of "Other factors". It is possible the Big Science OpenRAIL-M license satisfies all of the "Definitive factors" (apart of course from the fact that it is not OSI-approved). I do have a bias in that I would like to see the SPDX license list consist of mostly FOSS licenses. I am also generally concerned about the SPDX license list being used for political, promotional or marketing purposes.
Any update on this?
Hi all! Thanks for considering supporting the "BigScience OpenRAIL-M" with SPDX. I'm one of the drafters of the license.
Regarding the discussion above, here are several considerations that could help your decision-making:
- The license satisfies the majority of the SPDX factors (both "definitive" and "others"), except for not being aligned with "open source" definitions;
- OpenRAIL-M challenges the status quo as it is a license specifically designed for machine learning models (i.e. not code, but weights/parameters; these are different artifacts despite being technically interdependent when the model is embedded in software, e.g. an ML app).
- OpenRAILs are a new licensing paradigm in AI spaces, already used by major projects such as BLOOM and Stable Diffusion (CreativeML OpenRAIL-M)
- Thus, we are responding to a new phenomenon happening right now: the need for open sharing and release of ML artifacts with a set of use restrictions stemming from the acknowledgment of the technical capabilities/limitations of the ML model (i.e. concerns from the licensor about how the ML model could be misused)
- IMO, for SPDX it is not just about whether to include a new license, but whether to be willing to adapt to new licensing trends informed by new technological phenomena (something, btw, which is currently under discussion even in OSI - e.g. AI Deep Dive project).
In any case, thank you for taking the time to consider OpenRAILs and for having this discussion, much appreciated!
@CarlosMFerr Since you mentioned you are one of the authors, could you confirm whether you are the steward? And if you are, would you commit to version control of future versions, if any?
@KIRWOG I am one of the stewards, but not the only one, I tag @danishcontractor as he drafted it with me.
From my side, YES, I would commit to version control if by it you understand being in charge of triggering the process of and implementing + announcing future subsequent versions of the license (and, controlling subsequent versions which do not stem from BigScience community and its stewards - myself and Danish).
In any case, I also want to have @danishcontractor approval on this (he will be available from next week on).
Thank you for your consideration:)
Hi there, just to reiterate what I had said previously - we are focusing on some documentation related projects this release, so this is marked for review for the next release, hence no review yet.
As for the factors, the actual use factor seems significant here. It is not about whether SPDX is "willing to adapt to new licensing trends" in a proactive way - that is not the project's mission. We are trying to capture the licenses people actually find "in the wild" - notably when exchanging software.
Will be working on it this week! Thanks!
@CarlosMFerr & @KIRWOG I believe we are still waiting for feedback from @danishcontractor per above?
Thanks @CarlosMFerr @Pizza-Ria. Yes, I agree with what Carlos stated, including being one of the stewards.
New submission review
-
Has the license been approved by the OSI? All OSI-approved licenses will be included on the SPDX License List, regardless of other factors
- [ ] Yes
- [x] No
Definitive Factors
These must all be satisfied to allow inclusion in the license list
-
Is the submitted license unique, that is, it does not match another license already on the License List as per the matching guidelines?
- [x] Yes
- [ ] No
-
If a software license, does it apply to source code and not only to executables?
- [ ] Yes
- [x] No
-
Does the license have identifiable and stable text, and is not in the midst of drafting?
- [x] Yes
- [ ] No
-
Has the license steward, if any, committed to versioning new versions and to not modify it after addition to the list?
- [x] Yes
- [ ] No
Other factors for inclusion
Roughly in order of descending importance
-
Does the license substantially comply with one of the free/open content definitions? (examples include the Open Source Definition and the Debian Free Software Guidelines) (Approval by the organisation that publishes the definition is not required)
- [ ] Yes
- [x] No
-
Is the license structured to be generally usable by anyone, and not specific to one organisation or project?
- [x] Yes
- [ ] No
-
Does the license have substantial use such that it is likely to be encountered (ie. use in many projects, or in one significant project)? (For recently written licenses, definitive plans for it to be used in at least one or a few significant projects may satisfy this)
- [x] Yes
- [ ] No
-
Is the license primarily intended to facilitate the free distribution of content with limited restrictions?
- [ ] Yes
- [x] No
-
Does the license steward support this submission, or is at least aware of and not in opposition of it?
- [x] Yes
- [ ] No
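The way the checklist above combines its answers can be sketched in a few lines, assuming a strict reading of the form: OSI approval means inclusion regardless of other factors, otherwise every definitive factor must be satisfied before the "other factors" are weighed at all. The `Review` class and `outcome` function are hypothetical names for this illustration, not part of any SPDX tooling, and the answers are the ones ticked in this review.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    osi_approved: bool
    # Definitive factors: name -> whether the reviewer ticked "Yes".
    definitive: dict[str, bool] = field(default_factory=dict)


def outcome(review: Review) -> str:
    # OSI-approved licenses are included regardless of other factors.
    if review.osi_approved:
        return "include (OSI-approved)"
    # Otherwise all definitive factors must be satisfied.
    failed = [name for name, ok in review.definitive.items() if not ok]
    if failed:
        return "fails definitive factors: " + ", ".join(failed)
    # Only then do the "other factors" come into play (a judgment call).
    return "weigh other factors"


# Answers as ticked in the checklist above.
as_ticked = Review(
    osi_approved=False,
    definitive={
        "unique text": True,
        "applies to source code": False,
        "stable text": True,
        "steward commits to versioning": True,
    },
)

if __name__ == "__main__":
    print(outcome(as_ticked))
```

Note that the source-code question is conditional ("If a software license"), and how it should be scored for a model license is itself debated in this thread, so the result for `as_ticked` reflects only the strict reading sketched here.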
Summary of factors, outcome, comments
- The closest match to this license seems to be Educational Community License, Version 2.0 with a 47% match.
- This license relates to a Model and the Complementary material that defines the Model. The license does not seem to indicate that the right to modify applies to the Model and the Complementary material.
- Ethical licensing. The restrictions are intended to prevent unlawful activities, activities detrimental to the public interest, and activities that raise privacy concerns. This raises concerns about difficulty of interpretation and enforcement across jurisdictions.
- According to the license stewards, OpenRAIL licenses are widely used by the ML community. The stewards say they are trying to keep up with, and stay in line with, the evolving regulatory framework. Note: this is as claimed by the license stewards on their blog https://www.licenses.ai/blog/2022/8/26/bigscience-open-rail-m-license
- As per @richardfontana’s comment, this does not comply with any of the free/open content definitions.
Comments on the totality of these "other factors" in light of the SPDX License List's overall goals and objectives
If a software license, does it apply to source code and not only to executables?
A model is closer to source code than an executable, imo
@KIRWOG @ofek the license applies to both model and source code (complementary material) but the restrictions on use only apply to the model.
@KIRWOG what is the conclusion of "yes" for "Does the license have substantial use such that it is likely to be encountered (ie. use in many projects, or in one significant project)?" based on? I don't have any knowledge of this one way or the other but it seems to me SPDX should do some independent investigation to ascertain whether this principle is met. If I understood one of the previous comments correctly, this particular license isn't being used yet.
@ofek You cite https://huggingface.co/bigscience/bloom as an "example project", but that link says that the license used there is "bigscience-bloom-rail-1.0" which I assume refers to a different RAIL license. I looked for some examples of stuff on Hugging Face using "bigscience-openrail-m" via https://huggingface.co/models?license=license:bigscience-openrail-m&sort=downloads -- much of that seems to be spam or empty repositories.
As far as I can tell, the Hugging Face site does not provide reliable information on the licensing of anything. There doesn't seem to be any notion of a 'license file' as is conventional in open source software repositories for example.
First of all, there are 2 generally applicable licenses: CreativeML OpenRAIL-M (see license here: https://huggingface.co/spaces/CompVis/stable-diffusion-license ; see models in the hub here: https://huggingface.co/models?license=license:creativeml-openrail-m&sort=downloads ) and BigScience OpenRAIL-M (see models in the hub here: https://huggingface.co/models?license=license:bigscience-openrail-m&sort=downloads).
Yes, some users' files just include the model without any reference to the license (as for a lot of OSS projects also, I believe, this is not something specific to RAILs, right?).
Also, we do specify what an OpenRAIL is. If you click on a model with an OpenRAIL license and you click on the license tag you'll have a summary of the license and a link for more reliable information.
@CarlosMFerr you didn't respond to my assertion that most of the "uses" of BigScience OpenRAIL-M on Hugging Face are spam or empty repositories. From what I can see, the only one in https://huggingface.co/models?license=license:bigscience-openrail-m&sort=downloads that possibly isn't is something that is inaccessible without a login.
The SPDX inclusion guidelines include:
The license has actual, substantial use such that it is likely to be encountered. Substantial use may be demonstrated via use in many projects, or in one or a few significant projects. For new licenses, there are definitive plans for the license to be used in one or a few significant projects.
If the license does not substantively comply with one of the above open source definitions, then the license is primarily intended for free distribution of content (including, in the case of software, its source code) with limited restrictions, and meets other factors listed here.
The question I am raising is whether these are met. I am not seeing how you have demonstrated that there is "actual, substantial use" and I don't think you have provided evidence of "definitive plans" for the license to be used by a significant project. Separately, I am not sure I see how the license is intended for free distribution of content given that I don't see any example of content that is being "freely distributed" in the sense I think is meant in the guidelines (accessible on the public web without being behind a login wall, essentially).
@KIRWOG what is the conclusion of "yes" for "Does the license have substantial use such that it is likely to be encountered (ie. use in many projects, or in one significant project)?" based on? I don't have any knowledge of this one way or the other but it seems to me SPDX should do some independent investigation to ascertain whether this principle is met. If I understood one of the previous comments correctly, this particular license isn't being used yet.
@richardfontana - We gave the benefit of the doubt on this factor based on the submitter's assertion but appreciate your additional digging into the actual uses on HuggingFace - good lesson for tackling submissions in the future.
tldr: there are real users, but none of them are using the same text, so this is premature.
-
There are several users: stable-diffusion (already noted) is a very significant one with a real, vibrant community; BLOOM is invested in by the French government and state of the art for some non-English languages; and the upcoming BigCode release will be under RAIL-M. So I think the usage is real and not promotional; Richard's concern is valid generally but not applicable here.
-
But none of these uses are under identical terms. Carlos has already mentioned CreativeML v. BigScience; the BigCode license will also be different. Since there is no "one" RAIL-M license, this fails the "identifiable and stable text" plank of the definitive factors. I am talking with Carlos and Danish about maturing the licenses so that this will be possible in the future, but it's not there yet.
thanks @tieguy - so, do I take it we should close this, and it can be re-opened or a new issue opened when the text is stable?
I'm inclined to close this one for now.
Can someone enumerate what needs to be done?