[Python] License scanning for Python projects changes components in specific circumstances

Status: Open • johennin opened this issue 11 months ago • 1 comment

During internal testing, we discovered that running cdxgen with license scanning enabled, for example:

FETCH_LICENSE=true cdxgen --type python --output sbom.json /path/to/project

behaves differently from a run without license scanning when the project has the same name as an existing Python package.

For example, naming a local Python project after an existing Python package and running cdxgen without license scanning with the following pyproject.toml file:

[project]
name = "typing-extensions"
...

gives the following component:

...
"components": [
        {
            "group": "",
            "name": "typing-extensions",
            "version": "latest",
            "purl": "pkg:pypi/typing-extensions@latest",
            "type": "library",
            "bom-ref": "pkg:pypi/typing-extensions@latest",
            "evidence": {
                "identity": {
                    "field": "purl",
                    "confidence": 1,
                    "methods": [
                        {
                            "technique": "instrumentation",
                            "confidence": 1,
                            "value": "/tmp/cdxgen-venv-2GQ1QR"
                        }
                    ]
                }
            }
        },
...

and a different component with license scanning activated:

...
"components": [
        {
            "author": "\"Guido van Rossum, Jukka Lehtosalo, Łukasz Langa, Michael Lee\" <[email protected]>",
            "group": "",
            "name": "typing-extensions",
            "version": "latest",
            "description": "Backported and Experimental Type Hints for Python 3.8+",
            "licenses": [
                {
                    "license": {
                        "id": "PSF-2.0",
                        "url": "https://opensource.org/licenses/PSF-2.0"
                    }
                }
            ],
            "purl": "pkg:pypi/typing-extensions@latest",
            "type": "library",
            "bom-ref": "pkg:pypi/typing-extensions@latest",
            "evidence": {
                "identity": {
                    "field": "purl",
                    "confidence": 1,
                    "methods": [
                        {
                            "technique": "instrumentation",
                            "confidence": 1,
                            "value": "/tmp/cdxgen-venv-FnAVau"
                        }
                    ]
                }
            },
...

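I would say this is problematic because license scanning changes the component that the SBOM describes. The local project is enriched with the author, description and license of the unrelated PyPI package of the same name, while its purl and bom-ref stay identical.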
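To see exactly what license scanning changes, the two SBOMs can be compared component by component. The following is a minimal Python sketch and is not part of cdxgen. It assumes both files are CycloneDX JSON with a top-level `components` array whose entries carry a `bom-ref`, as in the excerpts above. It reports only top-level fields that appear in the license-scanned run but not in the other run. Changed values, such as the temporary venv path under `evidence`, are ignored.

```python
from typing import Any


def index_components(bom: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each component's bom-ref to the component itself."""
    return {c["bom-ref"]: c for c in bom.get("components", [])}


def added_fields(without_scan: dict[str, Any], with_scan: dict[str, Any]) -> dict[str, list[str]]:
    """For every bom-ref present in both SBOMs, list the top-level fields
    that only appear in the license-scanned SBOM."""
    before = index_components(without_scan)
    after = index_components(with_scan)
    changes: dict[str, list[str]] = {}
    for ref in before.keys() & after.keys():
        # Fields present after the license scan but absent without it.
        extra = sorted(set(after[ref]) - set(before[ref]))
        if extra:
            changes[ref] = extra
    return changes
```

Load the two outputs with `json.load` and pass them to `added_fields`. For the excerpts shown above, it should report `author`, `description` and `licenses` for `pkg:pypi/typing-extensions@latest`.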

Not reusing the name of an existing Python package avoids the problem. However, the behavior can also be abused. An attacker who knows the names of local projects that a company produces SBOMs for could publish a Python package with the same name. That package would then inject false information into the components of those SBOMs.
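A consumer could spot this situation by flagging PyPI components that share the local project's normalized name and also carry registry metadata. The sketch below is an illustrative heuristic, not a cdxgen feature. The function name `enriched_self_references` is hypothetical, and the caller supplies the project name (for example, read from `[project].name` in pyproject.toml). Names are normalized following the PEP 503 rule. A local project that declares its own license or description could trigger a false positive.

```python
import re
from typing import Any

# Fields that, in the excerpts above, only appear once license scanning
# has pulled metadata from the registry.
REGISTRY_FIELDS = ("author", "description", "licenses")


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_' and '.' into '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def enriched_self_references(project_name: str, bom: dict[str, Any]) -> list[str]:
    """Hypothetical check: return bom-refs of pkg:pypi components that share the
    local project's name and carry registry-style metadata."""
    wanted = normalize(project_name)
    flagged = []
    for comp in bom.get("components", []):
        if not comp.get("purl", "").startswith("pkg:pypi/"):
            continue  # only PyPI components can collide with a PyPI package name
        if normalize(comp.get("name", "")) != wanted:
            continue
        if any(field in comp for field in REGISTRY_FIELDS):
            flagged.append(comp.get("bom-ref", comp["purl"]))
    return flagged
```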

Thank you in advance!

johennin, Mar 21 '24 16:03

@johennin, I will keep this issue open. I do not agree that cdxgen must deal with dependency confusion attacks. However, it could at least add more properties describing the source file it started the analysis from.

prabhu, Mar 21 '24 17:03
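The properties suggested in the reply could be expressed as CycloneDX component `properties`, which are name/value pairs. The sketch below shows one way to record the manifest that the analysis started from on a component. It is an illustration only: the property name `SrcFile` and the helper `tag_source_file` are assumptions for this example, not a confirmed cdxgen output format.

```python
from typing import Any


def tag_source_file(bom: dict[str, Any], bom_ref: str, src_file: str) -> bool:
    """Hypothetical helper: append a CycloneDX name/value property recording the
    file the analysis started from. Returns False if no component has that bom-ref."""
    for comp in bom.get("components", []):
        if comp.get("bom-ref") == bom_ref:
            # CycloneDX components accept an optional list of {"name", "value"} properties.
            comp.setdefault("properties", []).append({"name": "SrcFile", "value": src_file})
            return True
    return False
```

With such a property, a reviewer could see that `pkg:pypi/typing-extensions@latest` was derived from a local pyproject.toml rather than from a package resolved from the registry.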