License scanner fails on Python packages without the License metadata key.
I ran osv-scanner scan --licenses=MIT . on a simple project to test it out.
As you can see in the report below, well-known libraries are reported as "UNKNOWN":
╭──────────────────────────────────────────────────────────────┬───────────┬───────────────────────────┬─────────────────┬─────────╮
│ LICENSE VIOLATION │ ECOSYSTEM │ PACKAGE │ VERSION │ SOURCE │
├──────────────────────────────────────────────────────────────┼───────────┼───────────────────────────┼─────────────────┼─────────┤
│ UNKNOWN │ PyPI │ attrs │ 25.3.0 │ uv.lock │
│ non-standard │ PyPI │ binaryornot │ 0.4.4 │ uv.lock │
│ BSD-2-Clause │ PyPI │ boolean-py │ 4.0 │ uv.lock │
│ non-standard │ PyPI │ chardet │ 5.2.0 │ uv.lock │
│ UNKNOWN │ PyPI │ click │ 8.1.8 │ uv.lock │
│ UNKNOWN │ PyPI │ colorama │ 0.4.6 │ uv.lock │
│ Apache-2.0 │ PyPI │ coverage │ 7.7.0 │ uv.lock │
│ UNKNOWN │ PyPI │ foss-flame │ 0.21.1 │ uv.lock │
│ UNKNOWN │ PyPI │ iniconfig │ 2.1.0 │ uv.lock │
│ UNKNOWN │ PyPI │ jinja2 │ 3.1.6 │ uv.lock │
│ UNKNOWN │ PyPI │ jsonschema-specifications │ 2024.10.1 │ uv.lock │
│ Apache-2.0 │ PyPI │ license-expression │ 30.4.1 │ uv.lock │
│ non-standard │ PyPI │ markupsafe │ 3.0.2 │ uv.lock │
│ UNKNOWN │ PyPI │ osadl-matrix │ 2024.5.22.10535 │ uv.lock │
│ UNKNOWN │ PyPI │ packaging │ 24.2 │ uv.lock │
│ BSD-3-Clause │ PyPI │ psutil │ 7.0.0 │ uv.lock │
│ GPL-2.0-or-later │ PyPI │ python-debian │ 1.0.1 │ uv.lock │
│ UNKNOWN │ PyPI │ referencing │ 0.36.2 │ uv.lock │
│ Apache-2.0 AND CC-BY-SA-4.0 AND CC0-1.0 AND GPL-3.0-or-later │ PyPI │ reuse │ 5.0.2 │ uv.lock │
│ UNKNOWN │ PyPI │ tomli │ 2.2.1 │ uv.lock │
│ UNKNOWN │ PyPI │ typing-extensions │ 4.12.2 │ uv.lock │
│ UNKNOWN │ PyPI │ utools │ 0.1.0 │ uv.lock │
╰──────────────────────────────────────────────────────────────┴───────────┴───────────────────────────┴─────────────────┴─────────╯
I did a little digging into why this may be happening, and I think it is related to how osv-scanner reads the licenses.
My understanding is that osv-scanner reads package information from PyPI, which would explain why utools, my own package, is reported as unknown: it is not published.
Checking the output of https://pypi.org/pypi/attrs/json, I found that attrs does not use the info.license field but info.license_expression instead. According to https://packaging.python.org/en/latest/specifications/core-metadata/#license, it should take priority when present.
The package click provides its license as a classifier, which is the oldest method. According to PyPA (again), when the project's license is already registered as a valid classifier, the classifier must be used, and the info.license field should be used for variations when needed.
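To make the lookup order described above concrete, here is a minimal sketch of how a license could be resolved from the info object of a PyPI JSON response. It is not how osv-scanner or deps.dev work internally. The function name resolve_license, the fallback string, and the sample dictionaries are hypothetical and only mirror the shapes described in this issue (license_expression set, or only a classifier present). The order follows my reading of the spec: license_expression first, then License classifiers, then the free-text license field.

```python
from typing import Any, Mapping

LICENSE_CLASSIFIER_PREFIX = "License :: "


def resolve_license(info: Mapping[str, Any]) -> str:
    """Return the best available license string from a PyPI `info` mapping.

    Hypothetical helper: the precedence below is an assumption based on the
    reading of the core metadata spec given in this issue.
    """
    # 1. Prefer the SPDX expression when the package provides one.
    expression = (info.get("license_expression") or "").strip()
    if expression:
        return expression

    # 2. Fall back to "License :: ..." trove classifiers, keeping only the
    #    last segment (e.g. "BSD License").
    classifiers = info.get("classifiers") or []
    names = [
        c.split(" :: ")[-1]
        for c in classifiers
        if c.startswith(LICENSE_CLASSIFIER_PREFIX)
    ]
    if names:
        return ", ".join(names)

    # 3. Finally, try the legacy free-text license field.
    legacy = (info.get("license") or "").strip()
    if legacy:
        return legacy

    return "UNKNOWN"


# Illustrative data only, not copied from real PyPI responses.
expression_only = {"license": None, "license_expression": "MIT", "classifiers": []}
classifier_only = {
    "license": None,
    "license_expression": None,
    "classifiers": [
        "License :: OSI Approved :: BSD License",
        "Programming Language :: Python :: 3",
    ],
}

print(resolve_license(expression_only))
print(resolve_license(classifier_only))
```

With this order, a package that only sets license_expression (like attrs) or only a classifier (like click) would still resolve to something other than UNKNOWN. Note that classifier names are not SPDX identifiers, so a real implementation would still need a mapping step before comparing against an allow list such as --licenses=MIT.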
Hi @pcastellazzi, we depend on deps.dev for license data, and https://deps.dev/pypi/attrs indicates that attrs has an unknown license, so that's why we display UNKNOWN here.
Can you open a bug with deps.dev for this? I also found this related bug: https://github.com/google/deps.dev/issues/94
This issue has not had any activity for 60 days and will be automatically closed in two weeks
See https://github.com/google/osv-scanner/blob/main/CONTRIBUTING.md for how to contribute a PR if you're interested in helping out.
Automatically closing stale issue