Fix: map license URLs to SPDX IDs for machine-readable output
This PR fixes an issue in Syft where Java project licenses given as URLs were not mapped to SPDX license IDs. Previously, license URLs (whether a package declared one or several) were reported as LicenseRef-http---... instead of their proper SPDX identifiers, so the output was not machine-readable.
With this change:
- License URLs such as http://www.eclipse.org/legal/epl-v10.html are now correctly mapped to EPL-1.0.
- Deprecated or older license URLs like http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html are mapped to LGPL-2.1-only.
This ensures the licenseDeclared and licenseConcluded fields in SPDX and CycloneDX outputs are properly machine-readable.
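For reviewers, here is a minimal sketch of the kind of URL-to-ID lookup described above. It is not the code in this PR: the names `urlToLicense`, `normalizeURL`, and `lookupLicenseByURL` are hypothetical, and the normalization (lowercasing, dropping the scheme, a leading `www.`, and a trailing slash) is an assumption about how URL variants might be collapsed before lookup.

```go
package main

import (
	"fmt"
	"strings"
)

// urlToLicense is a hypothetical lookup table keyed by normalized URL.
// The two entries are the mappings named in this PR description.
var urlToLicense = map[string]string{
	"eclipse.org/legal/epl-v10.html":              "EPL-1.0",
	"gnu.org/licenses/old-licenses/lgpl-2.1.html": "LGPL-2.1-only",
}

// normalizeURL collapses common URL variants to one key.
// Assumption: scheme, "www." prefix, case, and trailing slash are irrelevant.
func normalizeURL(raw string) string {
	u := strings.ToLower(strings.TrimSpace(raw))
	u = strings.TrimPrefix(u, "https://")
	u = strings.TrimPrefix(u, "http://")
	u = strings.TrimPrefix(u, "www.")
	return strings.TrimSuffix(u, "/")
}

// lookupLicenseByURL returns the SPDX ID for a license URL, if one is known.
func lookupLicenseByURL(raw string) (string, bool) {
	id, ok := urlToLicense[normalizeURL(raw)]
	return id, ok
}

func main() {
	urls := []string{
		"http://www.eclipse.org/legal/epl-v10.html",
		"http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html",
		"http://example.com/unknown-license",
	}
	for _, u := range urls {
		if id, ok := lookupLicenseByURL(u); ok {
			fmt.Printf("%s -> %s\n", u, id)
		} else {
			fmt.Printf("%s -> no SPDX mapping\n", u)
		}
	}
}
```

Keying the table on a normalized form keeps it small, since `http`/`https` and `www` variants of the same URL share one entry.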
This addresses the issues reported when analyzing Java dependencies in projects such as spring-petclinic.
Fixes #4233
Type of change
Bug fix (non-breaking change which fixes an issue)
Checklist:
- [x] I have added unit tests for LicenseByURL covering the new URL mappings
- [x] I have tested the changes in common scenarios (Java Maven projects with single/multiple license URLs)
Thanks for the PR @Avadhut03! I think we need this to be in a separate area since internal/spdxlicense/license_list.go is a generated file.
I think I'm open to having two maps here. One generated from the official SPDX source and the other contributed by users who see areas where we can map the URL and get better license answers.
cc @wagoodman for when he gets back, to get a +1 on adding a maintainer map that we merge with the generated SPDX map on compile for one single lookup
Thanks for the feedback @spiffcs. That makes sense. I can update the PR to add a separate map for maintainer/user-contributed URLs and merge it with the generated SPDX map during compile time. Will wait for @wagoodman’s thoughts as well before making the changes.
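To make the two-map idea concrete, here is a rough sketch under some assumptions. All names (`generatedURLToLicense`, `maintainerURLToLicense`, `mergeLicenseMaps`) are hypothetical, the generated entry is a placeholder rather than actual content of the generated file, and the merge runs at package initialization instead of strictly at compile time. It also assumes generated entries win on conflict, which is a design choice still open for discussion.

```go
package main

import "fmt"

// generatedURLToLicense stands in for the map generated from the official
// SPDX source. The entry here is a placeholder for illustration only.
var generatedURLToLicense = map[string]string{
	"https://opensource.org/licenses/MIT": "MIT",
}

// maintainerURLToLicense holds hand-maintained mappings contributed by users.
// It lives outside the generated file so regeneration does not overwrite it.
var maintainerURLToLicense = map[string]string{
	"http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html": "LGPL-2.1-only",
}

// mergeLicenseMaps returns a new map containing both inputs.
// Assumption: generated entries take precedence over contributed ones.
func mergeLicenseMaps(generated, contributed map[string]string) map[string]string {
	merged := make(map[string]string, len(generated)+len(contributed))
	for url, id := range contributed {
		merged[url] = id
	}
	for url, id := range generated {
		merged[url] = id // overwrites any contributed entry for the same URL
	}
	return merged
}

// mergedURLToLicense is built once at package initialization, so callers
// perform a single lookup.
var mergedURLToLicense = mergeLicenseMaps(generatedURLToLicense, maintainerURLToLicense)

func main() {
	for _, u := range []string{
		"https://opensource.org/licenses/MIT",
		"http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html",
	} {
		fmt.Printf("%s -> %s\n", u, mergedURLToLicense[u])
	}
}
```

If maintainer entries should instead be able to correct the generated data, swapping the order of the two loops flips the precedence.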