licensee icon indicating copy to clipboard operation
licensee copied to clipboard

Dependencies with license name provided as `MIT` can't get identified

Open mervyn-mccreight opened this issue 2 years ago • 10 comments

Here is the output for using e.g. version 3.1.0, I assume it will be the same for 3.2.0 as well:

io.github.g0dkar:qrcode-kotlin-jvm:3.1.0
 - ERROR: Unknown license URL 'https://github.com/g0dkar/qrcode-kotlin/blob/main/LICENSE' is NOT allowed

According to https://mvnrepository.com/artifact/io.github.g0dkar/qrcode-kotlin-jvm/3.1.0 the license name is published as MIT, so I assume it should get identfied as MIT when I'm looking at the fallback logic in licenses.kt.

I used the version 1.6.0 of this plugin to get the described results.

mervyn-mccreight avatar Nov 14 '22 16:11 mervyn-mccreight

The same goes for https://mvnrepository.com/artifact/org.amshove.kluent/kluent/1.68

org.amshove.kluent:kluent-common:1.68
 - ERROR: Unknown license URL 'https://github.com/MarkusAmshove/Kluent/blob/master/LICENSE' is NOT allowed

mervyn-mccreight avatar Nov 14 '22 17:11 mervyn-mccreight

Name-based lookup is only applied if the URL is not present. In both of these cases the URL supersedes that and is (obviously) not a canonical URL for the MIT license.

When we built the library we decided to favor URL since the overwhelming majority linked to an external canonical license copy. I would attempt to reach out to those libraries about switching to the SPDX link or the opensource.org link for the license. In my experience library authors usually understand the argument for pointing at these rather than their own copy.

JakeWharton avatar Nov 14 '22 17:11 JakeWharton

Ah, I see.

IMO you should consider taking a look at the name even if it can't map the URL to a license, because referring to the maven documentation about the licenses part of the pom, nothing is really specified for the url-part, but for the name part it is at least recommended to use a SPDX-identifier.

What do you think?

mervyn-mccreight avatar Nov 14 '22 17:11 mervyn-mccreight

That recommendation is very new and when sampling a few thousand artifacts almost none matched. I suspect the fact that MIT matches is more luck than intent. The overwhelming majority were "Apache 2.0" instead of the intended "Apache-2.0", for example.

JakeWharton avatar Nov 14 '22 18:11 JakeWharton

I understand.

I did not want to criticize the way you came to the conclusion that looking for the license-url with priority makes sense. I don't mean to replace that - what I think of is more like:

  • look if the URL can be mapped to a license
  • if that does not work (either url is not present, or url is not mappable to a license on the spdx license list) try to map the name interpreted as a possible spdx-id
  • if that also does not work, identify the license as unknown

Its more like an additional fallback to try to increase the detection rate.

mervyn-mccreight avatar Nov 15 '22 09:11 mervyn-mccreight

I wouldn't say I'm 100% opposed to it, but I did give this a fair bit of thought when figuring out how the project was going to initially work.

My concern is always that the URL and the name may disagree on the license, and that the Maven POM specification offers no guidance on precedence. I wanted to ensure we never surprised someone so in most cases the code prefers to fail and have a human decide. This is why you can allow a specific URL if you trust it to be stable. Or you can allow a specific coordinate if it uses a custom URL that you do not trust to be stable.

Another thing that was considered early on is attempting to download unknown URLs and match their contents. It was ruled out because the number of custom URLs was very low and things like GitHub/GitLab links involve a ton of surrounding HTML.

JakeWharton avatar Nov 16 '22 02:11 JakeWharton

There is a problem with the url though: the official urls are often just a template. For a valid license, you need to adopt the owner and year. At least this was the feedback I got: https://github.com/pgjdbc/pgjdbc/pull/2514#issuecomment-1135073402

hfhbd avatar Nov 16 '22 18:11 hfhbd

There is a problem with the url though: the official urls are often just a template. For a valid license, you need to adopt the owner and year.

I would be interested in a version of this plugin that does match by identifier instead for this reason. If this requires more manual overrides, so be it. Without promising anything yet, would you accept a PR that makes the matching strategy configurable or do you consider this to be out-of-scope for this plugin, @JakeWharton?

TheMrMilchmann avatar Dec 15 '23 12:12 TheMrMilchmann

I think it still has a lot of behavioral questions to work out. It's not like we can just flip a conditional here. Do we fall back to the URL if the name does not match? What happens if the name and URL match different licenses? I simply have never considered matching on the name and the entire design of the existing logic is around the URL, so I'm not keen to rush into something here.

JakeWharton avatar Jan 30 '24 15:01 JakeWharton

First of all, sorry for going quiet for a while, but work and vacation kept me busy 😅. I fully agree that this is not something that should be rushed, but I did actually have an idea in mind already:

I propose to expose an API that the build author may use to tune the strategy. Specifically, we can mimic the API for configuring metadata sources for repositories. This could look as follows (exact names and syntax are tbd):

licensee {
    licenseSources {       // 1
        licenseName()      // 2
        licenseUrl()       // 3
    }
}

We introduce a LicenseSources interface that may be used to configure the sources from which to infer license information. In the DSL, this is exposed under the licensee extension (1). The interface defines several methods that may be used to enable certain sources of information. Initially, those would use be license name (2) and license URL (3), but additional sources could be added in the future without breaking configuration or introducing needless complexity.

licenseSources would be a regular function under the licensee extension and probably have the signature fun licenseSources(action: Action<LicenseSources>). If the function is not called, we resort to the current behavior as default.

As an added benefit, we could allow the build author to call licenseSources {} (without adding any sources) to indicate that all license mappings must be specified explicitly. However, I suggest deferring this decision and treating it as a configuration mistake for now.

TheMrMilchmann avatar Apr 05 '24 18:04 TheMrMilchmann