license-list-XML

Provide machine-readable info about which licenses an exception applies to

Open sschuberth opened this issue 1 year ago • 9 comments

It would be great if each exception would provide machine-readable metadata about which license it can be used with.

ORT currently maintains such an association manually over here, but we'd like to be able to generate / get that information automatically.

A typical use-case for such information is the semantic validation of SPDX expressions, like GPL-2.0-only WITH Classpath-exception-2.0. While that would be a valid example, MIT WITH Classpath-exception-2.0 would not be valid, as the Classpath-exception-2.0 cannot be applied to the MIT license.
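A minimal Python sketch of the kind of check such metadata would enable. The `APPLICABLE_LICENSES` mapping and the `check_with_expression` function are hypothetical illustrations: the data is hand-written for this example (it is neither SPDX-provided nor copied from ORT), and only a single `<license> WITH <exception>` pair is handled, not full expressions.

```python
from typing import Optional

# Hypothetical illustration data: exception ID -> license IDs it is assumed to apply to.
# This is NOT official SPDX metadata; a mapping like this is what the issue asks for.
APPLICABLE_LICENSES = {
    "Classpath-exception-2.0": {"GPL-2.0-only"},
}


def check_with_expression(expression: str) -> Optional[bool]:
    """Check a single '<license> WITH <exception>' pair against the mapping.

    Returns True or False if the mapping has an entry for the exception,
    or None if the exception is unknown (no opinion either way).
    """
    parts = expression.split()
    if len(parts) != 3 or parts[1] != "WITH":
        raise ValueError(f"expected '<license> WITH <exception>', got {expression!r}")
    license_id, _, exception_id = parts
    known = APPLICABLE_LICENSES.get(exception_id)
    if known is None:
        return None
    return license_id in known
```

Returning `None` for unknown exceptions keeps "no data" distinct from "does not make sense", so a tool could warn only in the latter case.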

sschuberth avatar Jan 24 '24 16:01 sschuberth

@sschuberth - as to your use-case for this, I think this concept of "incorrect" SPDX expressions was discussed back when we first adopted the license expression syntax and modifiers. There are many examples of SPDX license expressions, not just ones using WITH, that wouldn't make sense or that one could say are wrong, e.g., MIT+.

I think the general thinking was along the lines of that we cannot correct for every possible thing people might do that may be wrong, and some of it comes down to trust. That is, if you got something that looked super weird, how would that impact your impression and trust of the overall SPDX document?

As to coding this for exceptions in particular and having a quick look at the ORT list - how is that determined? Who decided that a given exception cannot be used with another license not listed there? While in some cases the exception names the license, or it's otherwise obvious which license it was intended to be used with, I don't think that necessarily means someone couldn't use it with a different license, and thus that such an SPDX expression would be wrong. There could be variations that aren't really practical or that we may never see, but I'm not sure it's wise to definitively state "this is invalid", which implies a legal determination about the license and exception being able to be used together. It's not for SPDX to make such legal determinations (not sure it should be for ORT either, to be honest).

Anyway, those are my gut thoughts, but others should weigh in and we can discuss further if I'm missing something!

jlovejoy avatar Feb 02 '24 22:02 jlovejoy

Who decided that a given exception cannot be used with another license not listed there?

Besides the obvious cases that you mention, where the associated license is part of the exception name, we've simply looked at the detailed description of an exception. E.g. Autoconf-exception-2.0 says "Typically used with GPL-2.0-only or GPL-2.0-or-later". "Typically" is enough here for us to limit the exception's use.

There could be variations that aren't really practical or we may never see

It's a good idea to not list the "valid" combinations, but maybe only the "invalid" ones, where "invalid" should mean something like "well, you can write that, but it does not make sense from a semantic point of view".

It's not for SPDX to make such legal determinations (not sure it should be for ORT either, to be honest).

We're not after making legal determinations at all... it's more about syntactic correctness of an SPDX expression, and maybe sometimes going a bit further into semantic correctness, but not making any legal determinations (where possible to separate this from semantic correctness).

sschuberth avatar Feb 06 '24 15:02 sschuberth

Hi @sschuberth, I agree with @jlovejoy that this is not something SPDX should host.

I have no issues with downstream libraries or tools such as ORT maintaining their own interpretations of what is a "valid" license combination, or a "valid" license-plus-exception combination.

But at the SPDX project level, @jlovejoy is right that the most we've gotten into handling has been to define what is a "syntactically" correct expression according to the SPDX License Expression syntax.

People are free to create syntactically-valid expressions like MIT+, or MIT AND MIT AND (MIT OR MIT), or MIT WITH Classpath-exception-2.0, or GPL-2.0-only AND GPL-3.0-or-later. That list in the previous sentence has a mix of expressions that are nonsensical, or that are unnecessarily complicated, or that seem contradictory, or that (in the last case) someone might say can't apply in practice without violating the license. At the SPDX level, I don't think we care. Trying to restrict the license expression syntax and grammar to only "correct" licenses is not beneficial as the upstream that SPDX tries to be.

Downstream tools can absolutely implement their own interpretations and warnings as they see fit, but I'm not in favor of trying to maintain anything like that in the License List itself, beyond the brief commentary in some of the notes indicating which licenses we may have anecdotally seen certain exceptions used with.

swinslow avatar Feb 07 '24 19:02 swinslow

While I accept what both of you are saying, I have to admit that esp. @swinslow's answer is quite disappointing. In real-world scenarios, beyond the theoretical legal world, people struggle a lot with adopting SPDX properly because of semantic ambiguities like the ones mentioned. All of @willebra, @pombredanne, @tsteenbe, @maxhbr and others can sing a song about how hard it is for users to find out whether syntactically different SPDX documents semantically express the same thing and / or what the differences are (e.g. when comparing SPDX documents created for the same package by different tools).

Pushing the responsibility for semantic canonicalization downstream to third-party tools is also not a good idea unless there is a clear specification of what the preferred canonical form of each entity is (also see the discussion about lowercase vs uppercase license IDs and expression operators).

So at a minimum, SPDX should spec out those canonicalizations and also provide the necessary tooling for each major programming language. However, given that we do not even have mature libraries for all major languages for SPDX expression parsing or document creation, I have doubts that we would see such tooling any time soon.
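To illustrate what a specified canonical form could make possible, here is a small Python sketch that normalizes the casing of operators and known IDs. The chosen form (uppercase operators, IDs in their list casing) is an assumption made for this example, not something the SPDX specification is claimed to mandate, and `KNOWN_IDS` and `canonicalize` are hypothetical names.

```python
import re

# Hypothetical subset of known IDs, used only to restore a preferred casing.
KNOWN_IDS = ["MIT", "GPL-2.0-only", "Classpath-exception-2.0"]
_PREFERRED = {known_id.lower(): known_id for known_id in KNOWN_IDS}
_OPERATORS = {"and", "or", "with"}


def canonicalize(expression: str) -> str:
    """Rewrite an expression into one (assumed) canonical casing and spacing."""
    # Split into parentheses and runs of non-space, non-parenthesis characters.
    tokens = re.findall(r"[()]|[^\s()]+", expression)
    out = []
    for token in tokens:
        lowered = token.lower()
        if token in ("(", ")"):
            out.append(token)
        elif lowered in _OPERATORS:
            out.append(lowered.upper())
        else:
            # Unknown IDs are passed through unchanged.
            out.append(_PREFERRED.get(lowered, token))
    # Join with single spaces, without padding inside parentheses.
    return " ".join(out).replace("( ", "(").replace(" )", ")")
```

Two tools that emit differently cased expressions for the same package could then compare the canonicalized strings instead of the raw ones. This only addresses casing and whitespace, not reordering or deduplication of operands.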

sschuberth avatar Feb 08 '24 10:02 sschuberth

However, the discussion has now digressed quite a bit from the original license-exception-association topic towards semantic canonicalization of SPDX expressions in general, so feel free to close this.

sschuberth avatar Feb 08 '24 10:02 sschuberth

@sschuberth - I think this is a topic that is probably better suited for the mailing list and even a discussion on an upcoming SPDX-legal call. We can then explain in more detail why making determinations as to what is or is not a valid license expression using WITH is not really within the scope of the SPDX project and amounts to making a legal determination (whoever makes this call).

I would also like to better understand what you mean by "people struggle a lot with adopting SPDX properly because of semantic ambiguities like the ones mentioned" - maybe some examples you've seen, etc. Again, this may be better via a discussion.

Also, the "legal world" is not theoretical ;)

The next SPDX-legal call is Feb 22nd - if that works for you to attend, perhaps you can send an email to the SPDX-legal mailing list with some thoughts on what the problem is or the challenges. Rather than suggesting a potential solution first, an understanding of the problem to be solved can help elicit ideas on solutions from the broader community.

jlovejoy avatar Feb 08 '24 17:02 jlovejoy

As part of the automated analysis of FOSS-related obligations, users need to form a view on e.g. licenses and the obligations in them. E.g. OSS Review Toolkit (ORT) has a configuration file where users can define their understanding of licenses. Licenses are typically keyed primarily by SPDX names. Some vendors and/or projects have published license classifications, e.g. https://github.com/doubleopen-project/policy-configuration/blob/main/license-classifications.yml by Double Open, but in the end it is the users who decide what their particular configuration is.

The question of semantic validation seems to me similar to classifying licenses. License classifications are based on the content of license texts. The semantic meaning of license combinations, or of licenses with exceptions, also depends on their content. There could be - and considering the variety of exceptions, it's likely - differences in how people understand exceptions. Some differences could also stem from varying appetite for risk. However, this is just initial thinking, and once someone has done the work to semantically understand certain combinations, it may be easier to actually judge this matter: whether there are areas that are clear-cut for everyone, and which areas are more subject to interpretation. The challenge is that this builds on understanding the licenses one by one, so if there is no common ground on what licenses or exceptions exactly mean, it may be difficult to build a semantic understanding of combinations. That again would put this into the hands of the users (or their suppliers).

I haven't been actively involved in the SPDX project for quite some time, but my understanding of the principles behind SPDX is similar to @jlovejoy @swinslow.

On the other hand, the practical solving of these questions is an ongoing matter, and likely for many. The more we work with automation, the more prevalent this question becomes. Not sure what the correct place to address this issue is, but at least the user should be able to make a choice. That could mean a configuration file in e.g. ORT, or perhaps on scanner level. I could imagine a way to configure "invalid license combinations" in tools. I don't care so much at which level, as long as I get triggered with "Hey, look at this."; then I could develop the configuration further.
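A rough Python sketch of such a user-controlled "Hey, look at this." check, assuming the user maintains their own set of license/exception pairs to flag. `FLAGGED_PAIRS` and `review_hints` are hypothetical names, and this is not how ORT or any scanner is known to implement it.

```python
import re
from typing import List

# Hypothetical user-maintained configuration: (license, exception) pairs to flag.
# What goes in here is the user's own judgment, not an SPDX determination.
FLAGGED_PAIRS = {
    ("MIT", "Classpath-exception-2.0"),
}

# Finds '<license> WITH <exception>' pairs anywhere inside a larger expression.
_WITH_PAIR = re.compile(r"([^\s()]+)\s+WITH\s+([^\s()]+)")


def review_hints(expression: str) -> List[str]:
    """Return a hint for each flagged pair; hints only ask for review."""
    hints = []
    for license_id, exception_id in _WITH_PAIR.findall(expression):
        if (license_id, exception_id) in FLAGGED_PAIRS:
            hints.append(f"Hey, look at this: {license_id} WITH {exception_id}")
    return hints
```

Because the function only returns hints and never rejects an expression, the decision about what a flagged combination means stays with the user.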

Regarding SPDX, not sure what the functionality above would require, but there could be areas where SPDX support of some aspect of this would be beneficial. But perhaps this is just a question of post-processing of SPDX license expressions.

willebra avatar Feb 09 '24 11:02 willebra

I think this is a topic that is probably better suited for the mailing list and even a discussion on an upcoming SPDX-legal call.

I strongly prefer publicly searchable and asynchronous discussions in written form about topics like this for transparency. So I'm afraid that I'll probably not join SPDX-legal calls.

I would also like to better understand what you mean by "people struggle a lot with adopting SPDX properly because of semantic ambiguities like the ones mentioned" - maybe some examples you've seen, etc.

It's probably indeed a good idea to collect such problems with ambiguities / unclear usage collaboratively in a common public place.

Just to mention one example that comes to my mind: When expressing relations between packages, you can either say A DEPENDS_ON B or B DEPENDENCY_OF A. There are several more of such examples that only seem to complicate things without a clear benefit.
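As a sketch of how a consumer might cope with this today, the following Python snippet folds the two spellings into one direction before comparing documents. The tuple layout `(source, type, target)` and the names `INVERSE_OF` and `normalize` are assumptions for illustration, and only the pair named above is covered.

```python
# Inverse relationship types to fold into one direction.
# Only the pair mentioned above is listed; others would need to be added.
INVERSE_OF = {"DEPENDENCY_OF": "DEPENDS_ON"}


def normalize(relationship):
    """Rewrite (source, type, target) so that inverse types use one direction."""
    source, rel_type, target = relationship
    if rel_type in INVERSE_OF:
        # 'B DEPENDENCY_OF A' becomes 'A DEPENDS_ON B'.
        return (target, INVERSE_OF[rel_type], source)
    return relationship
```

After normalization, `("B", "DEPENDENCY_OF", "A")` and `("A", "DEPENDS_ON", "B")` compare equal.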

sschuberth avatar Feb 12 '24 10:02 sschuberth

Just to mention one example that comes to my mind: When expressing relations between packages, you can either say A DEPENDS_ON B or B DEPENDENCY_OF A. There are several more of such examples that only seem to complicate things without a clear benefit.

@sschuberth - this looks like an independent issue from the machine readable license exceptions.

I completely agree we should work to clear up any ambiguities.

I believe we fixed your example above in 3.0 by removing the DEPENDENCY_OF relationship. We made a number of other changes to relationships which we believe remove some of these ambiguities.

If you run into any other similar issues, I would suggest opening a separate issue for discussion on the spdx-spec repo.

goneall avatar Feb 12 '24 19:02 goneall