
License text extraction not working for Maven

Open rhuitl opened this issue 3 years ago • 3 comments

I would like to extract the license texts for Maven projects. This works for NPM but not for Maven: the texts field remains empty.

Steps to reproduce:

  1. Create a basic Maven project
mvn archetype:generate -DgroupId=com.mycompany.app -DartifactId=my-app -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4 -DinteractiveMode=false
  2. Run LicenseFinder
docker run -v "$PWD/my-app:/scan" -t licensefinder/license_finder /bin/bash -l -c "cd /scan && license_finder report --quiet --format json --columns name version authors licenses license_links approved summary description homepage install_path package_manager texts notice"

The output is:

{
	"dependencies": [{
		"name": "hamcrest-core",
		"version": "1.3",
		"authors": "",
		"licenses": ["New BSD"],
		"license_links": "http://opensource.org/licenses/BSD-3-Clause",
		"approved": "Not approved",
		"summary": "",
		"description": "",
		"homepage": "",
		"install_path": null,
		"package_manager": "Maven",
		"texts": "",
		"notice": ""
	}, {
		"name": "junit",
		"version": "4.11",
		"authors": "",
		"licenses": ["Common Public License Version 1.0"],
		"license_links": "",
		"approved": "Not approved",
		"summary": "",
		"description": "",
		"homepage": "",
		"install_path": null,
		"package_manager": "Maven",
		"texts": "",
		"notice": ""
	}]
}
  3. Compare with downloaded licenses
❯ find my-app/target/generated-resources/licenses
my-app/target/generated-resources/licenses
my-app/target/generated-resources/licenses/common public license version 1.0 - cpl1.0.html
my-app/target/generated-resources/licenses/new bsd license - bsd-license.html
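
The symptom can also be checked programmatically by filtering the JSON report for dependencies whose texts field is empty. This is only a minimal sketch: the helper name `packages_missing_texts` is hypothetical, and the embedded report is an abbreviated copy of the output shown in step 2.

```ruby
require 'json'

# Hypothetical helper (not part of LicenseFinder): returns the names of
# dependencies whose "texts" field is missing or blank in a JSON report.
def packages_missing_texts(report_json)
  report = JSON.parse(report_json)
  report.fetch('dependencies', [])
        .select { |dep| dep['texts'].to_s.strip.empty? }
        .map { |dep| dep['name'] }
end

# Abbreviated copy of the report from step 2.
SAMPLE_REPORT = <<~JSON
  {
    "dependencies": [
      { "name": "hamcrest-core", "version": "1.3", "package_manager": "Maven", "texts": "" },
      { "name": "junit", "version": "4.11", "package_manager": "Maven", "texts": "" }
    ]
  }
JSON

puts packages_missing_texts(SAMPLE_REPORT).inspect
```

For the sample above this lists both Maven dependencies, matching what the report shows.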

I was trying to understand how the license text extraction works for NPM, and it seems like it's based on the filename (like LICENSE or similar).
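
To make that guess concrete, here is a rough sketch of what a filename-based lookup could look like. It is an illustration of the idea only, not LicenseFinder's actual code: the list of name stems and the helper `license_text_from_install_path` are assumptions made for this example.

```ruby
# Assumed list of conventional license file name stems (illustrative only).
CANDIDATE_STEMS = %w[LICENSE LICENCE COPYING].freeze

# Hypothetical helper: returns the contents of the first file directly inside
# install_path whose name starts with one of the stems (case-insensitive),
# or nil when install_path is nil, missing, or contains no such file.
def license_text_from_install_path(install_path)
  return nil if install_path.nil? || !Dir.exist?(install_path)

  candidate = Dir.children(install_path).sort.find do |entry|
    File.file?(File.join(install_path, entry)) &&
      CANDIDATE_STEMS.any? { |stem| entry.upcase.start_with?(stem) }
  end
  candidate && File.read(File.join(install_path, candidate))
end
```

Under this assumption, a package whose install_path is null, as in the Maven report above, would give such a lookup nothing to search.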

Then I looked into how the Maven module for LicenseFinder works, but I lack the context and Ruby knowledge to really understand how it's supposed to work. After all, it is downloading the license files, so it seems feasible to read the texts from there.

Happy for any pointers :)

rhuitl, Apr 12 '21 20:04

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

cf-gitbot, Apr 12 '21 20:04

Hey @rhuitl! It looks like the texts field is only ever used for CocoaPods projects, seen here. For Maven, the code only ever returns the license name. If we want to fill in the full text, we will need to add it to maven_package.rb and add code to parse it out.

If you look in maven_dependency_finder.rb, we are only looking at the XML file and not the licenses folder. This means we would need to match the names in the XML file to the license files in the licenses folder and then fill the text in on the Maven package object.

Basically when the package manager calls the finder it needs to return more info to be passed back to the manager so it can be written on the package itself. Hope this makes sense. Feel free to make a PR or let me know if you need clarification!
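
As a starting point, the matching step could be sketched roughly like this. It is a standalone illustration, not code from maven_dependency_finder.rb or maven_package.rb: the helper `downloaded_license_text` is hypothetical, and it assumes the downloaded files are named "<license name> - <something>", as the find output earlier in this issue suggests.

```ruby
# Hypothetical helper: looks in licenses_dir for a file whose name starts with
# the given license name (case-insensitive) and returns its contents.
# Returns nil when the folder is missing or no file matches.
def downloaded_license_text(licenses_dir, license_name)
  return nil unless Dir.exist?(licenses_dir)

  prefix = license_name.downcase.strip
  return nil if prefix.empty?

  match = Dir.children(licenses_dir).sort.find do |entry|
    entry.downcase.start_with?(prefix)
  end
  match && File.read(File.join(licenses_dir, match))
end

# Example call using the folder and a license name from this issue.
text = downloaded_license_text('my-app/target/generated-resources/licenses', 'New BSD')
puts(text.nil? ? 'no matching license file' : text[0, 200])
```

Prefix matching is only a heuristic and may pick the wrong file when several license names share a prefix. Also note that the downloaded files in this issue are HTML, so the returned text would be markup rather than plain license text.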

xtreme-shane-lattanzio, Apr 15 '21 14:04

Thanks @xtreme-shane-lattanzio! You could indeed help me a bit by explaining how this works for NPM. I know it works, but I can't see any code that would do it, unlike CocoaPods, where it is obvious :thinking: Maybe it's hidden in this line? https://github.com/pivotal/LicenseFinder/blob/953b89d0c0bd75f66418a0637f0c879053e7d287/lib/license_finder/package_managers/npm.rb#L42

rhuitl, Apr 19 '21 09:04