
Cdxgen reports incorrect package license

Open kacq opened this issue 1 year ago • 7 comments

I would like to use cdxgen to monitor the licenses used in our project. I generated an SBOM file and uploaded it to dependency tracker. Analyzing the results showed that some packages had an incorrectly assigned license. For example, cdxgen assigned an LGPL license to scipy, while it is actually BSD-licensed ( https://scipy.org/faq/ ). The same happens with numpy: LGPL is reported instead of BSD.

{ "author": "", "publisher": "", "group": "", "name": "scipy", "version": "1.10.1", "description": "Fundamental algorithms for scientific computing in Python", "hashes": [ { "alg": "SHA-256", "content": "e7354fd7527a4b0377ce55f286805b34e8c54b91be865bac273f527e1b839019" } ], "licenses": [ { "license": { "id": "LGPL-2.1-only", "url": "https://opensource.org/licenses/LGPL-2.1-only" } } ], "purl": "pkg:pypi/scipy@1.10.1", "externalReferences": [ { "type": "website", "url": "https://scipy.org/" } ], "type": "library", "bom-ref": "pkg:pypi/scipy@1.10.1", "evidence": { "identity": { "field": "purl", "confidence": 1, "methods": [ { "technique": "manifest-analysis", "confidence": 1, "value": "/home/karol/PycharmProjects/[...]/poetry.lock" } ] } }, "properties": [ { "name": "cdx:pypi:latest_version", "value": "1.11.2" } ] }

What's the cause of that incorrect assignment?

kacq avatar Sep 27 '23 06:09 kacq

@kacq, could you investigate a bit further? The license is retrieved from pypi. Multiple licenses may be getting returned while cdxgen uses only one of them.

https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L2292

prabhu avatar Sep 27 '23 07:09 prabhu

Thank you for the quick response. As far as I can see, you're mapping info > license. The end of that description contains "http://www.gnu.org/philosophy/why-not-lgpl.html>.", so that's probably why it returns LGPL.

Wouldn't it be better to search for the license in info > classifiers? { "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Operating System :: MacOS", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Operating System :: POSIX :: Linux", "Operating System :: Unix", "Programming Language :: C", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.9", "Topic :: Scientific/Engineering", "Topic :: Software Development :: Libraries" ],
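The classifier idea above could be sketched roughly as follows. This is only an illustration: `licensesFromClassifiers` is a hypothetical helper, not an existing cdxgen function, and the input is a shortened copy of the classifier list quoted above.

```javascript
// Hypothetical helper (not part of cdxgen): collect the license names
// advertised in PyPI trove classifiers of the form "License :: ...".
function licensesFromClassifiers(classifiers) {
  return (classifiers || [])
    .filter((c) => c.startsWith("License ::"))
    // Keep only the last segment, e.g. "BSD License".
    .map((c) => c.split("::").pop().trim());
}

// Shortened classifier list taken from the PyPI response quoted above.
const sampleClassifiers = [
  "Development Status :: 5 - Production/Stable",
  "License :: OSI Approved :: BSD License",
  "Programming Language :: Python :: 3"
];

console.log(licensesFromClassifiers(sampleClassifiers)); // [ 'BSD License' ]
```

Note that the result is a free-form name such as "BSD License" rather than an SPDX identifier, so it would still need to be mapped to an ID in a separate step.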

kacq avatar Sep 27 '23 08:09 kacq

@kacq, I believe the correct behaviour is to retain the entire license text instead of trying to detect any IDs. setup.py classifiers do not include a valid SPDX identifier.

Also, looking at the Scipy license, it is clear that there are bundled binaries that have other license types, so I am not sure how the entire thing could be considered as "BSD-3-Clause."

prabhu avatar Sep 27 '23 08:09 prabhu

@kacq, could you send a pull request based on the information below?

https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L2292

Replace p.license with

if (body.info.license.includes(" ")) {
  p.licenses = [{license: {name: "CUSTOM", text: {content: body.info.license}}}];
} else {
  p.license = findLicenseId(body.info.license);
}

The idea is that if the content includes a space (or perhaps more than 3 words), we store the entire text content.
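A minimal sketch of the "more than 3 words" variant of this idea is shown below. It is an assumption-laden illustration: `toLicenses` and its `maxWords` parameter are hypothetical, and `findLicenseId` is replaced by a trivial stand-in, since the real cdxgen lookup is not reproduced here.

```javascript
// Stand-in for cdxgen's findLicenseId (assumption for illustration only):
// it just returns the trimmed value instead of doing a real lookup.
function findLicenseId(name) {
  return name.trim();
}

// Hypothetical helper: choose between a short license ID and the full text.
function toLicenses(rawLicense, maxWords = 3) {
  const text = (rawLicense || "").trim();
  if (!text) {
    return [];
  }
  const wordCount = text.split(/\s+/).length;
  if (wordCount > maxWords) {
    // Long free-form text: keep it verbatim instead of guessing an ID.
    return [{ license: { name: "CUSTOM", text: { content: text } } }];
  }
  return [{ license: { id: findLicenseId(text) } }];
}

console.log(JSON.stringify(toLicenses("MIT")));
// [{"license":{"id":"MIT"}}]
console.log(JSON.stringify(toLicenses(
  "Redistribution and use in source and binary forms, with or without modification, are permitted"
)));
```

With a word-count threshold, short values such as "MIT" or "Apache-2.0" still go through the ID lookup, while a pasted license text that merely mentions another license in a URL is stored as-is.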

prabhu avatar Sep 27 '23 18:09 prabhu

Hi @prabhu, I just observed this in a few other cases as well, as follows:

Though I'm not well-versed with JavaScript, let me know if you need any help or a pull request to fix this once the required changes have been identified.

cryptator avatar Jan 08 '24 10:01 cryptator

@cryptator, if you could triage further and send a PR based on the instructions below, that would be awesome! For node.js, it appears that the license was changed, so only version 0.3.22 is CC0-1.0. Could you check your version against the data from the API?

https://registry.npmjs.org/language-subtag-registry

Below is the line that tries to look up the license for a specific version and fall back to the repo license. https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L466
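The version-specific lookup with a fallback might look roughly like this. It is a sketch under assumptions: `licenseForVersion` is a hypothetical helper, the fallback is taken to be the package-level `license` field of the registry document, and the sample data is made up rather than copied from the registry.

```javascript
// Hypothetical helper: prefer the license declared for the exact version,
// then fall back to the package-level license field.
function licenseForVersion(packument, version) {
  const versionMeta = packument.versions && packument.versions[version];
  if (versionMeta && versionMeta.license) {
    return versionMeta.license;
  }
  return packument.license;
}

// Made-up, trimmed-down registry document for illustration.
const samplePackument = {
  name: "example-pkg",
  license: "MIT",
  versions: {
    "1.0.0": { license: "ISC" },
    "2.0.0": {}
  }
};

console.log(licenseForVersion(samplePackument, "1.0.0")); // ISC
console.log(licenseForVersion(samplePackument, "2.0.0")); // MIT
```

Checking the version entry first matters when a package changes its license between releases, because the package-level field only reflects one of them.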

For Java with maven, adding support for multiple licenses is a feature.

In the below line, we can see that only one license is getting set.

https://github.com/CycloneDX/cdxgen/blob/master/utils.js#L2465

To support multiple licenses, this needs to be changed to an array containing license objects where each object has an ID.

Eg:

p.licenses = [{license: {id: "MIT"}}, {license: {id: "Apache-2.0"}}];

We might get some validation errors and test failures due to spec compatibility, which we can try to address together.
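Building such an array from a list of license IDs could be sketched like this. `toLicenseObjects` is a hypothetical helper and the component object is made up; how the IDs are extracted from the Maven metadata is left out.

```javascript
// Hypothetical helper: wrap each license ID in its own license object,
// dropping duplicates while preserving the original order.
function toLicenseObjects(ids) {
  return [...new Set(ids)].map((id) => ({ license: { id } }));
}

// Made-up component for illustration.
const component = { name: "example-artifact" };
component.licenses = toLicenseObjects(["MIT", "Apache-2.0", "MIT"]);

console.log(JSON.stringify(component.licenses));
// [{"license":{"id":"MIT"}},{"license":{"id":"Apache-2.0"}}]
```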

prabhu avatar Jan 08 '24 11:01 prabhu

Thanks @prabhu , I'll try to work on this sometime this week and update back here.

cryptator avatar Jan 08 '24 11:01 cryptator