LicenseFinder
LicenseFinder cannot determine the license of some npm packages
I tried running LicenseFinder on a large npm project and noticed that it could not determine the license of some npm packages (21 of 384). This surprised me because all 21 packages name a license (like MIT or Apache-2.0 for example) in their package.json file.
I tried reading through the source code to better understand how LicenseFinder determines the license of an npm package. Is it correct that LicenseFinder looks for a LICENSE file inside the package? I checked the 21 packages and none included a LICENSE file (although some included a LICENSE.md file). One of these packages is vue-template-compiler, for example.
If that is the case, I would suggest falling back to the license field inside the package.json if no LICENSE file can be found.
We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.
The labels on this github issue will be updated when the story is started.
I guess I was wrong: hls.js contains both a LICENSE file and a license field in its package.json, but LicenseFinder cannot determine its license (it should be Apache 2.0).
Well, this is strange: I created a new npm project and installed all dependencies for which LicenseFinder could not determine the license in the previous project. In this new project LicenseFinder could determine all those licenses.
I debugged into it a bit. It turns out that npm list --json --long does not reliably include license fields in its output. In my bigger project it included the license field only for the root package, not for any of the dependencies. This then caused spec_licenses to be empty.
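To check whether a given npm list --json --long output has this problem, one can walk the tree and collect the dependencies that carry no license field. The following is only a minimal sketch: the helper name packages_missing_license is made up for illustration and is not part of LicenseFinder, and it assumes the usual shape of the output, where each entry has a dependencies hash keyed by package name.

```ruby
require 'json'

# Hypothetical helper (not part of LicenseFinder): recursively collects the
# names of dependencies whose entry has neither a "license" nor a "licenses" key.
def packages_missing_license(node, missing = [])
  (node['dependencies'] || {}).each do |name, dep|
    next unless dep.is_a?(Hash)

    missing << name unless dep.key?('license') || dep.key?('licenses')
    packages_missing_license(dep, missing)
  end
  missing.uniq
end

# Hand-written sample that mimics the structure of `npm list --json --long`;
# real output contains many more fields per package.
sample = JSON.parse(<<~SAMPLE)
  {
    "name": "root",
    "license": "MIT",
    "dependencies": {
      "a": {
        "version": "1.0.0",
        "license": "MIT",
        "dependencies": { "b": { "version": "2.0.0" } }
      },
      "c": { "version": "3.0.0" }
    }
  }
SAMPLE

puts packages_missing_license(sample)
```

For real output, the sample could be replaced by the parsed result of the npm command, for example JSON.parse(File.read('npm-list.json'), max_nesting: false) after redirecting the command's output to that (hypothetical) file.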
https://github.com/pivotal/LicenseFinder/blob/ced7de9f22a627cb7bd2f11f18e41ffb914ec0b0/lib/license_finder/packages/npm_package.rb#L67-L78
I think a more reliable approach would be to take the license information from the package.json file of the actual package. Here is something I quickly hacked together (first time writing ruby code):
def initialize(npm_json)
  install_path = npm_json['path']
  p install_path # debug output
  # Prefer the package.json inside the installed package; fall back to the
  # npm list entry when there is no install path or no package.json file.
  package_path = install_path.nil? ? nil : File.join(install_path, "package.json")
  package_json = !package_path.nil? && File.file?(package_path) ? JSON.parse(File.read(package_path), max_nesting: false) : npm_json
  spec_licenses = Package.license_names_from_standard_spec(package_json)
  p spec_licenses # debug output
  @json = npm_json
  @identifier = Identifier.from_hash(npm_json)
  @dependencies = deps_from_json
  super(@identifier.name,
        @identifier.version,
        description: npm_json['description'],
        homepage: npm_json['homepage'],
        spec_licenses: spec_licenses,
        install_path: install_path,
        children: @dependencies.map(&:name))
end
Hi @WIStudent - FWIW, license_finder does check for licenses in the package.json (via npm list) before checking for license files.
In the first instance, were the packages reported, but licenses not found? I noticed NPM v7 doesn't list packages beyond immediate dependencies by default. I opened an issue about it a while back. https://github.com/pivotal/LicenseFinder/issues/834
Is it possible this change in behaviour in NPM explains what you're seeing?
@timani I switched a lot between npm versions, so I am not sure anymore which one I used in the tests above. But I just noticed that the output of npm list --json --long depends on the current npm version, the npm version that was used to install the project, and whether the -a option was included.
Install with v6.14.13 / run with v6.14.13 / no -a option
- Transitive dependencies are included, packages inside the json file have the "license" field
- license_finder reports 25 of 2264 with unknown licenses
Install with v6.14.13 / run with v7.18.1 / no -a option
- No transitive dependencies are included, "license" field only exists for root package
- license_finder reports 1 of 132 with unknown licenses
Install with v6.14.13 / run with v7.18.1 / with -a option
- Transitive dependencies are included, "license" field only exists for root package
- license_finder reports 11 of 2459 with unknown licenses
Install with v7.18.1 / run with v6.14.13 / no -a option
- Transitive dependencies are included, packages inside the json file have the "license" field
- license_finder reports 16 of 740 with unknown licenses
Install with v7.18.1 / run with v7.18.1 / no -a option
- No transitive dependencies are included, "license" field only exists for root package
- license_finder reports 11 of 132 with unknown licenses
Install with v7.18.1 / run with v7.18.1 / with -a option
- Transitive dependencies are included, "license" field only exists for root package
- license_finder reports 146 of 2236 with unknown licenses
Sadly I don't know how I got my original 21 of 384 unknown licenses.
By the way, you can pass the -a flag to npm list using license_finder's --npm-options flag:
license_finder report --format=html --save=license-report.html --npm-options="\-a"
Are there actually over 2000 unique packages? I'd be interested in getting a (sanitized) copy of your package.json to have a play too.
@timhaines I cannot share the package.json unfortunately, it's a work-related project that's not open source. Basically it's an android/iOS app using capacitor + vue and aws-amplify for backend communication. The dev dependencies mostly consist of testing (jest and cypress), linting (eslint), packaging (webpack, babel, typescript) and some dependencies for our own build/deploy scripts.
I checked the package-lock.json. According to the docs, when using npm7 the packages field should contain every unique package. It contains 1377 prod and 2788 dev packages. The package.json contains 54 dependencies and 77 devDependencies (although I just noticed that 3 dependencies should actually be devDependencies).
A while ago I created a simple vue3 project using the vue-cli to check out some new vue3 features. Although this project only has a total of 15 dependencies in its package.json, according to Github's dependency graph its dependency tree contains 1018 dependencies.
Because everything gets bundled by webpack, almost nothing of the dependency tree lands in the final output. This just gave me the idea to further research if there are any webpack plugins that can list the licenses of any package that gets bundled into the final output. I stumbled across LicenseFinder because we are using Gitlab at work and Gitlab seems to use it as a base for their own integrated license scanning.
I don't see where the code is looking into a LICENSE file. As @timhaines pointed out, it is getting it from the package json itself. @WIStudent did your hack solution actually get you better output? If it did we can look at getting a PR through, but I think there may be NPM version issues that could be affecting this.
@xtreme-shane-lattanzio I think there are multiple issues with npm7:
- As already mentioned in https://github.com/pivotal/LicenseFinder/issues/834, npm list --json --long no longer returns transitive dependencies.
- At least in my case, the output of npm list --json --long also did not include the license field in any of the dependencies.
The second point is the reason why I initially thought LicenseFinder would only search for LICENSE files. Because the output of npm list does not include any licenses, the spec_licenses passed to the constructor of the superclass is always an empty array. And because install_path is set to the directory of the package, LicenseFinder will then search this directory for licenses. At least that's the behavior that is documented in the superclass.
https://github.com/pivotal/LicenseFinder/blob/8d221073b508ecbbb727648b0c682a29b60c37b3/lib/license_finder/package.rb#L8-L20
My hacky solution fixed the issue that the license was undefined for some of the found dependencies (but not the issue that not all dependencies were found). It did so by taking the install_path directory from the npm list output, looking for the package.json file inside that directory, and reading the license information from that package.json file.
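The same fallback can be tried outside of LicenseFinder with two small functions. This is a sketch under assumptions: manifest_for and license_names are hypothetical names, not LicenseFinder API, and the normalization only covers the common package.json shapes (a license string, a license object with a type key, and the legacy licenses array).

```ruby
require 'json'

# Hypothetical helper: returns the parsed package.json found at the entry's
# install path, or the `npm list` entry itself if that file is missing or
# cannot be parsed.
def manifest_for(npm_json)
  install_path = npm_json['path']
  return npm_json if install_path.nil?

  manifest = File.join(install_path, 'package.json')
  return npm_json unless File.file?(manifest)

  JSON.parse(File.read(manifest), max_nesting: false)
rescue JSON::ParserError
  npm_json
end

# Hypothetical helper: normalizes "license" (string or { "type" => ... }) and
# the legacy "licenses" array into a flat list of license names.
def license_names(package_json)
  entries = [package_json['license'], package_json['licenses']].flatten.compact
  entries.map { |entry| entry.is_a?(Hash) ? entry['type'] : entry }.compact.map(&:to_s)
end
```

Calling license_names(manifest_for(entry)) for each entry of the npm list output then yields the licenses declared by the installed package, even when npm list itself omitted them.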
Npm itself admits that the output of npm list is not the best and warns that there will probably be significant changes with npm8.
Starting with npm7, the package-lock.json lists every package in the dependency tree in a flat packages object keyed by install path. A lockfile written by npm7 can be recognized by "lockfileVersion": 2. I think instead of relying on npm list it would be much simpler to take the paths to every installed package from the package-lock.json, follow these paths to each package.json, read the license information from the package.json file, and additionally pass the path as install_path, so that LicenseFinder can also search for license files (or whatever it does when both spec_licenses and install_path are present).
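A rough sketch of that idea in Ruby, to make the proposal concrete. It is not LicenseFinder code: licenses_from_lockfile is a hypothetical name, and the sketch assumes a package-lock.json with lockfileVersion 2 or newer, where the packages object is keyed by the install path relative to the project root and the empty key stands for the root project.

```ruby
require 'json'

# Hypothetical helper (not LicenseFinder API): reads every installed package
# listed in package-lock.json and returns its name, version, declared licenses
# and install path.
def licenses_from_lockfile(project_dir)
  lock_path = File.join(project_dir, 'package-lock.json')
  lock = JSON.parse(File.read(lock_path), max_nesting: false)
  unless lock['lockfileVersion'].to_i >= 2 && lock['packages'].is_a?(Hash)
    raise ArgumentError, 'expected a package-lock.json with a "packages" field'
  end

  lock['packages'].keys.each_with_object([]) do |rel_path, found|
    next if rel_path.empty? # "" is the root project itself

    install_path = File.join(project_dir, rel_path)
    manifest = File.join(install_path, 'package.json')
    next unless File.file?(manifest) # listed in the lockfile but not installed

    pkg = JSON.parse(File.read(manifest), max_nesting: false)
    # "license" may be a string or { "type" => ... }; "licenses" is a legacy array.
    entries = [pkg['license'], pkg['licenses']].flatten.compact
    found << {
      name: pkg['name'],
      version: pkg['version'],
      licenses: entries.map { |e| e.is_a?(Hash) ? e['type'] : e }.compact.map(&:to_s),
      install_path: install_path
    }
  end
end
```

Each returned hash carries both the declared licenses and the install path, so a caller could pass them on as spec_licenses and install_path. Packages that appear in the lockfile but are not present on disk (optional or platform-specific dependencies, for instance) are skipped rather than reported as unknown, which is a design choice of this sketch.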
I think you have a good handle on this and I am not sure when we can prioritize this so please feel free to make a PR if you want to get this in!
Hi, any updates regarding this? npm v9 is already out, and now we get an empty licenses list.
@Mistic92 I stopped using LicenseFinder and am now using webpack/rollup plugins instead to determine the packages that get bundled into the final output and their licenses.
@WIStudent but webpack/rollup only covers a single language, and it won't work if we don't use these bundlers. Looks like Pivotal is not investing a lot of time in this tool anymore.
@Mistic92 There is another issue, https://github.com/pivotal/LicenseFinder/issues/916, that asks for support for npm 7 and newer, but there doesn't seem to be any progress either. I guess most people who need license detection in npm dependencies moved to other solutions, like I did with webpack/rollup plugins for example.