cdxgen
Cdxgen reports incorrect package license
I would like to use cdxgen to monitor the licenses we use in our project. I generated an SBOM file and uploaded it to dependency tracker. Analyzing the results, we found several packages with an incorrectly assigned license. For example, cdxgen assigned an LGPL license to scipy, although it is BSD-licensed ( https://scipy.org/faq/ ). The same happens with numpy: LGPL is reported instead of BSD.
{ "author": "", "publisher": "", "group": "", "name": "scipy", "version": "1.10.1", "description": "Fundamental algorithms for scientific computing in Python", "hashes": [ { "alg": "SHA-256", "content": "e7354fd7527a4b0377ce55f286805b34e8c54b91be865bac273f527e1b839019" } ], "licenses": [ { "license": { "id": "LGPL-2.1-only", "url": "https://opensource.org/licenses/LGPL-2.1-only" } } ], "purl": "pkg:pypi/scipy@1.10.1", "externalReferences": [ { "type": "website", "url": "https://scipy.org/" } ], "type": "library", "bom-ref": "pkg:pypi/scipy@1.10.1", "evidence": { "identity": { "field": "purl", "confidence": 1, "methods": [ { "technique": "manifest-analysis", "confidence": 1, "value": "/home/karol/PycharmProjects/[...]/poetry.lock" } ] } }, "properties": [ { "name": "cdx:pypi:latest_version", "value": "1.11.2" } ] }
What's the cause of this incorrect assignment?
@kacq, could you investigate a bit further? The license is retrieved from pypi. Multiple licenses may be getting returned while cdxgen uses only one of them.
https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L2292
Thank you for the quick response. As far as I can see, you're mapping info > license. The end of that description contains "http://www.gnu.org/philosophy/why-not-lgpl.html>.", so that is probably why it returns LGPL.
Wouldn't it be better to search for the license in info > classifiers?
{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Operating System :: MacOS", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Operating System :: POSIX :: Linux", "Operating System :: Unix", "Programming Language :: C", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.9", "Topic :: Scientific/Engineering", "Topic :: Software Development :: Libraries" ],
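To illustrate the classifiers idea, here is a minimal sketch that pulls the license entries out of a classifiers list. The helper name `licenseNamesFromClassifiers` is hypothetical and not part of cdxgen, and the input is a shortened copy of the list above. Note that it yields a classifier label such as "BSD License", which would still need mapping to a license ID.

```javascript
// Shortened sample of the classifiers from the PyPI response above.
const classifiers = [
  "Development Status :: 5 - Production/Stable",
  "License :: OSI Approved :: BSD License",
  "Programming Language :: Python :: 3"
];

// Hypothetical helper: return the last segment of every "License ::" classifier.
function licenseNamesFromClassifiers(list) {
  return (list || [])
    .filter((c) => c.startsWith("License ::"))
    .map((c) => c.split("::").pop().trim());
}

const licenseNames = licenseNamesFromClassifiers(classifiers);
console.log(licenseNames); // [ 'BSD License' ]
```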
@kacq, I believe the correct behaviour is to retain the entire license text instead of trying to detect any IDs. setup.py classifiers do not include a valid SPDX identifier.
Also, looking at the Scipy license, it is clear that there are bundled binaries that have other license types, so I am not sure how the entire thing could be considered as "BSD-3-Clause."
@kacq, could you send a pull request based on the information below?
https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L2292
Replace p.license with
if (body.info.license.includes(" ")) {
p.licenses = [{license: {name: "CUSTOM", text: {content: body.info.license}}}];
} else {
p.license = findLicenseId(body.info.license);
}
The idea is that if the content includes a space (or perhaps more than 3 words), we store the entire text content.
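A self-contained sketch of that heuristic, for anyone who wants to try it outside cdxgen. Everything here is an assumption for illustration: `findLicenseId` is a tiny stand-in for the real lookup, `applyPypiLicense` is a hypothetical wrapper, and the sample license strings are made up. It also guards against a missing `info.license`, which the snippet above does not.

```javascript
// Stand-in for cdxgen's findLicenseId (assumption: the real lookup is more involved).
const KNOWN_IDS = { "mit": "MIT", "bsd-3-clause": "BSD-3-Clause", "apache-2.0": "Apache-2.0" };
function findLicenseId(name) {
  const key = name.trim().toLowerCase();
  return KNOWN_IDS[key] || name.trim();
}

// Hypothetical wrapper: record the license from a PyPI JSON response body on package p.
function applyPypiLicense(p, body) {
  const raw = (body && body.info && body.info.license) || "";
  if (raw.includes(" ")) {
    // Free-form text: keep it whole instead of guessing an ID from it.
    p.licenses = [{ license: { name: "CUSTOM", text: { content: raw } } }];
  } else if (raw) {
    // Single token: treat it as a candidate license ID.
    p.license = findLicenseId(raw);
  }
  return p;
}

// Made-up inputs: one free-form text, one single-token value, one missing value.
const longText = applyPypiLicense({}, { info: { license: "Example license text with several words." } });
const shortId = applyPypiLicense({}, { info: { license: "mit" } });
const missing = applyPypiLicense({}, { info: { license: null } });
console.log(JSON.stringify(longText), JSON.stringify(shortId), JSON.stringify(missing));
```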
Hi @prabhu , just observed this for a few other cases also as follows:
- language-subtag-registry shows up with license ODC-By-1.0 whereas it has license CC0-1.0
- javassist (https://github.com/jboss-javassist/javassist) shows up with license MPL-1.1 only, whereas it has multiple licenses.
Though I'm not well-versed with JavaScript, let me know if you need any help or a pull request to fix this once the required changes have been identified.
@cryptator, if you could triage further and send a PR based on the instructions below, that would be awesome! For node.js, it appears like the license was changed, so only version 0.3.22 is CC0-1.0. Could you check your version against the data from the API?
https://registry.npmjs.org/language-subtag-registry
Below is the line that tries to look up the license for a specific version and fall back to the repo license. https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L466
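The lookup-then-fallback logic can be sketched roughly as follows. This is not the code at the linked line; `licenseForVersion` is a hypothetical helper, and the registry document is a made-up, trimmed example that only assumes a top-level `license` field and a `versions` map keyed by version string.

```javascript
// Made-up, trimmed registry document; real responses carry many more fields.
const registryDoc = {
  name: "example-pkg",
  license: "Apache-2.0",
  versions: {
    "1.0.0": { license: "MIT" },
    "1.1.0": {}
  }
};

// Hypothetical helper: prefer the license recorded for the exact version,
// otherwise fall back to the package-level license.
function licenseForVersion(doc, version) {
  const entry = doc.versions && doc.versions[version];
  return (entry && entry.license) || doc.license || undefined;
}

console.log(licenseForVersion(registryDoc, "1.0.0")); // MIT
console.log(licenseForVersion(registryDoc, "1.1.0")); // Apache-2.0
```

Comparing the two results for the version in your lockfile should show whether the reported license comes from the version entry or from the fallback.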
For Java with maven, adding support for multiple licenses is a feature.
In the below line, we can see that only one license is getting set.
https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L2465
To support multiple licenses, this needs to be changed to an array containing license objects where each object has an ID.
Eg:
p.licenses = [{license: {id: "MIT"}}, {license: {id: "Apache-2.0"}}];
We might get some validation errors and test failures due to spec compatibility, which we can try to address together.
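As a starting point, building that array from a list of IDs could look like the sketch below. `toLicenseEntries` is a hypothetical helper, not an existing cdxgen function, and it assumes the IDs have already been resolved from the pom.

```javascript
// Hypothetical helper: turn a list of license IDs into an array of license objects,
// skipping blanks and duplicates while keeping the original order.
function toLicenseEntries(ids) {
  const seen = new Set();
  const entries = [];
  for (const id of ids || []) {
    const trimmed = String(id).trim();
    if (trimmed && !seen.has(trimmed)) {
      seen.add(trimmed);
      entries.push({ license: { id: trimmed } });
    }
  }
  return entries;
}

const p = {};
p.licenses = toLicenseEntries(["MIT", "Apache-2.0", " MIT "]);
console.log(JSON.stringify(p.licenses));
// [{"license":{"id":"MIT"}},{"license":{"id":"Apache-2.0"}}]
```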
Thanks @prabhu , I'll try to work on this sometime this week and update back here.