
Support for PEP 770 (SBOMs)

Open rgommers opened this issue 8 months ago • 8 comments

PEP 770 is accepted, and specifies how wheels can start incorporating SBOMs as metadata under .dist-info/sboms/.

PEP 770 does not provide metadata in the [project] table for SBOMs, for reasons discussed in the PEP (a mix of static and dynamic metadata is expected to be common). An earlier version did use [project], and https://github.com/pypa/pyproject-metadata/pull/225 prototyped support for that in pyproject-metadata. That part is no longer needed, but support in the build backend will be, since .dist-info is generated by the build backend.

Technically it's possible to do something hacky like this today in a meson.build file:

py = import('python').find_installation()
install_data(
    'sboms/an_sbom_filename.spdx.json',
    install_dir: py.get_install_dir() / (meson.project_name() + '-' + meson.project_version() + '.dist-info') / 'sboms',
)

But obviously that isn't recommended.
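One concrete reason the hack is fragile: the wheel format requires the `.dist-info` directory to use the escaped distribution name (runs of `-`, `_` and `.` collapsed to a single `_`, lowercased), while `meson.project_name()` returns the name exactly as written in meson.build, and the Meson project name need not match the Python distribution name at all. A minimal sketch of the expected directory name, assuming the version is already a normalized PEP 440 version; the helper names are hypothetical:

```python
import re


def dist_info_dirname(name: str, version: str) -> str:
    """Return the `.dist-info` directory name used inside a wheel.

    Applies the binary distribution format escaping rule: runs of '-', '_'
    and '.' become a single '_', and the name is lowercased. The version is
    assumed to already be normalized.
    """
    escaped = re.sub(r"[-_.]+", "_", name).lower()
    return f"{escaped}-{version}.dist-info"


def sbom_dir(name: str, version: str) -> str:
    # PEP 770 places SBOM files in the `sboms/` subdirectory of `.dist-info`.
    return f"{dist_info_dirname(name, version)}/sboms"


print(sbom_dir("scikit-learn", "1.7.0"))  # scikit_learn-1.7.0.dist-info/sboms
```

For a Meson project named `scikit-learn`, the meson.build snippet would instead produce `scikit-learn-1.7.0.dist-info`, which does not match the directory the backend writes.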

What we need instead is this in pyproject.toml:

# static SBOM files that go into all wheels
[tool.meson-python.sboms]
sbom-files = [
    "sboms/component1.spdx.json",
    "sboms/component2.spdx.json",
]

That's the basic support. There are also possible cases where a vendored component only gets included in wheels for, say, one platform, or only when a particular build option is given. That's a lot harder to deal with, and could be done either in [tool.meson-python] or through some mechanism with data files in meson.build files (e.g., install_data(..., install_tag: 'sbom')). There are lots of options and the needs are more limited, so I'd say let's leave that for the future.
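If SBOMs were instead marked in meson.build with a custom `install_tag: 'sbom'`, as floated above, a backend could select them from Meson's install plan by tag. The dict layout below is modelled loosely on the install plan Meson writes to `meson-info/intro-install_plan.json` (sections keyed by source path, each entry carrying `destination` and `tag`); treat the exact layout, the `sbom` tag and the helper name as assumptions:

```python
def files_with_tag(install_plan: dict, tag: str = "sbom") -> list[str]:
    """Return the source paths of all installed files that carry `tag`.

    Assumed layout: {section: {source_path: {"destination": ..., "tag": ...}}};
    anything that does not match that shape is ignored.
    """
    selected = []
    for section in install_plan.values():
        if not isinstance(section, dict):
            continue
        for src, entry in section.items():
            if isinstance(entry, dict) and entry.get("tag") == tag:
                selected.append(src)
    return sorted(selected)


# Illustrative plan; all paths and the "sbom" tag are made up for this example.
plan = {
    "data": {
        "/src/sboms/component1.spdx.json": {
            "destination": "{datadir}/sboms/component1.spdx.json",
            "tag": "sbom",
        },
        "/src/README.md": {"destination": "{datadir}/README.md", "tag": None},
    },
    "targets": {
        "/build/_core.so": {"destination": "{py_platlib}/pkg/_core.so", "tag": "runtime"},
    },
}
print(files_with_tag(plan))  # ['/src/sboms/component1.spdx.json']
```

Because the tag travels with each install() call, platform- or option-dependent SBOMs would only appear in the plan when Meson actually installs the corresponding component.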

It'd be nice to align this with other backends, so the mechanism looks similar.

  • scikit-build-core: I don't see an issue yet, Cc @henryiii for thoughts
  • maturin: open feature request at https://github.com/PyO3/maturin/issues/2554

rgommers, Jun 10 '25 13:06

Do we know of any project that is ready to add SBOMs? It would be nice to collect a bit more information on the use cases before designing this feature.

I don't know much about SBOMs, but I have the impression that SBOMs are expected to be dynamic and automatically generated, not something edited by hand (and the chosen formats for these files strengthen this impression: I don't think anyone wants to edit JSON by hand).

Therefore, maybe we should plan from the start for the SBOMs to be automatically generated, and have them defined in meson.build rather than in pyproject.toml. The install_data() hack can be made to work more nicely by simply defining that everything installed under a particular installation prefix is added to the wheel as an SBOM. A prefix like {datadir}/.dist-info/sboms could work.

dnicolodi, Jun 10 '25 14:06

Thinking about it, I like the install_data() solution also for adding licenses to .dist-info/licenses, although that would not solve the issue of combining the components' licenses in the License-Expression metadata field... maybe it is not a very clever idea after all.
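To illustrate what "combining the components' licenses" involves: when every component's code ships in the wheel, the per-component SPDX expressions are typically joined with AND, and because AND binds more tightly than OR in SPDX expressions, any part containing OR must be parenthesized. A minimal sketch, assuming uppercase operators and doing no further simplification; the helper name is hypothetical:

```python
def combine_license_expressions(expressions: list[str]) -> str:
    """Join per-component SPDX license expressions into a single expression.

    Parts containing OR are parenthesized so the AND join does not change
    their meaning; exact duplicates are dropped. Operators are assumed to be
    uppercase, and no other simplification is attempted.
    """
    parts: list[str] = []
    for expr in expressions:
        expr = expr.strip()
        if " OR " in expr:
            expr = f"({expr})"
        if expr not in parts:
            parts.append(expr)
    return " AND ".join(parts)


print(combine_license_expressions(["BSD-3-Clause", "MIT OR Apache-2.0", "BSD-3-Clause"]))
# BSD-3-Clause AND (MIT OR Apache-2.0)
```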

dnicolodi, Jun 10 '25 14:06

If the package uses Meson's license fields for project(), then you can set up Meson with --licensedir and it will install both the license files and a Meson-specific JSON manifest of the licenses/subprojects used.

The PEP doesn't describe what the contents of an SBOM are, only that it describes how the package was put together. The one Meson produces isn't very elaborate, but it does record information about subproject versions etc.

eli-schwartz, Jun 10 '25 15:06

> I don't know much about SBOMs, but I have the impression that SBOMs are expected to be dynamic and automatically generated, not something edited by hand

It's definitely both. The PEP talks about this. Things that come in through auditwheel et al. are dynamic, but for those the plan is for auditwheel itself to gain the ability to amend .dist-info/sboms, since that's the only way to really get it right.

There are also static SBOMs, for components that are vendored at the source level. E.g., pip vendors packaging, cpython vendors pip, etc.

> Do we know of any project that is ready to add SBOMs?

I'm thinking about it for NumPy and SciPy. Scikit-learn has already spent a bit more time thinking about it I believe (Cc @ogrisel, xref https://github.com/scikit-learn/scikit-learn/issues/28151#issuecomment-1900095402).

A project like SciPy vendors O(10) other libraries, in whole or in part. Here is a partial list: https://scipy.github.io/devdocs/dev/core-dev/index.html#vendored-code. I think hand-writing SPDX SBOMs once is the way to go for most of those components, but I haven't actually tried that, so I could be wrong.
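For a sense of what "hand-writing an SPDX SBOM once" involves, here is a sketch that builds a minimal SPDX 2.3 JSON document for a single vendored component. The component name, download URL, `example.invalid` namespace and tool name are placeholders, and the field selection is a small subset chosen for illustration, not a recommended profile:

```python
import json
import uuid
from datetime import datetime, timezone


def minimal_spdx_document(name: str, version: str, download_location: str, license_id: str) -> dict:
    """Build a minimal SPDX 2.3 JSON document describing one vendored component."""
    # SPDX identifiers may only contain ASCII letters, digits, '.' and '-'.
    safe = "".join(c if (c.isascii() and c.isalnum()) or c in ".-" else "-" for c in name)
    package_id = f"SPDXRef-PACKAGE-{safe}"
    return {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": f"{name}-{version}",
        # Must be a unique URI; example.invalid is a placeholder domain.
        "documentNamespace": f"https://example.invalid/spdx/{safe}-{version}-{uuid.uuid4()}",
        "creationInfo": {
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "creators": ["Tool: example-sbom-script"],  # hypothetical tool name
        },
        "packages": [
            {
                "name": name,
                "SPDXID": package_id,
                "versionInfo": version,
                "downloadLocation": download_location,
                "filesAnalyzed": False,  # no per-file hashes or verification code
                "licenseConcluded": license_id,
                "licenseDeclared": license_id,
            }
        ],
        "relationships": [
            {
                "spdxElementId": "SPDXRef-DOCUMENT",
                "relationshipType": "DESCRIBES",
                "relatedSpdxElement": package_id,
            }
        ],
    }


doc = minimal_spdx_document(
    "example-lib", "1.2.3", "https://example.invalid/example-lib-1.2.3.tar.gz", "BSD-3-Clause"
)
print(json.dumps(doc, indent=2))
```

For a file that is committed to the repository once, the timestamp and random UUID would likely be replaced with fixed values so the file does not change on every regeneration.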

@sethmlarson as the PEP author, you may have things to add here or may know of other projects in the process of starting adoption?

rgommers, Jun 10 '25 15:06

As linked above, for the scikit-learn case, the plan is to automatically generate SBOMs whenever we bump the version of a fully vendored, Python-only dependency such as array-api-extra or array-api-compat. But there are also manually edited backward-compatibility Python files that sometimes include code snippets copied from upstream, and some forked C++ code that we maintain more or less independently of the original C++ project. For those files, we have no automated tooling to synchronize with upstream or to generate SBOM files. I did not plan to manually maintain SBOM files for those. I am not sure if they would really qualify as vendored files since they are often significantly edited when included in scikit-learn. This is a bit of a gray area.

That being said, +1 for introducing a directive to simplify the inclusion of static SBOM files into the right location under the .dist-info folder generated by the build backend.
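For the automated case described above (regenerating the SBOM entry whenever a vendored Python-only dependency such as array-api-compat is bumped), the vendoring script would typically also record a Package URL (purl) so that vulnerability scanners can match the component. A minimal sketch; the helper names are hypothetical and the version string is a placeholder:

```python
import re


def pypi_purl(name: str, version: str) -> str:
    """Return a Package URL (purl) identifying a PyPI distribution.

    The purl spec lowercases PyPI names and replaces '_' with '-'; this sketch
    applies full PEP 503 normalization (runs of '-', '_', '.' become '-').
    Versions needing percent-encoding (e.g. a '+local' suffix) are not handled.
    """
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    return f"pkg:pypi/{normalized}@{version}"


def purl_external_ref(name: str, version: str) -> dict:
    # SPDX 2.3 `externalRefs` entry for a package, pointing at its purl.
    return {
        "referenceCategory": "PACKAGE-MANAGER",
        "referenceType": "purl",
        "referenceLocator": pypi_purl(name, version),
    }


# "1.0.0" is a placeholder, not the version scikit-learn actually vendors.
print(purl_external_ref("array_api_compat", "1.0.0"))
```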

We could have an explicit way to distinguish SBOM files that should be present on all platforms from SBOM files that are expected to be missing in some builds. For instance, there could be SBOM files that are generated by a CI-specific preprocessor script just before calling python -m build, to track build-time information (e.g., the compiler version or the components of the GitHub Actions workflow used to generate the release).

[tool.meson-python.sboms]
# Expected to be present for all platforms. Should trigger a build error if missing.
sbom-files = [
    "sboms/component1.spdx.json",
    "sboms/component2.spdx.json",
]
# Can be present for some builds / platforms. Should not trigger a build error when missing.
optional-sbom-files = [
    "sboms/ci-build-spec.spdx.json",
]

We still want to allow end-users to build locally without errors if the CI generated SBOM files are missing.
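The required/optional split could be resolved at build time roughly as follows. `resolve_sbom_files` is a hypothetical helper, and its `required` and `optional` arguments stand in for the proposed `sbom-files` and `optional-sbom-files` lists:

```python
import tempfile
from pathlib import Path


def resolve_sbom_files(root: Path, required: list[str], optional: list[str]) -> list[Path]:
    """Return the SBOM files to ship.

    Missing required files are a build error; missing optional files
    (e.g. ones only generated on CI) are skipped so local builds still work.
    """
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError("required SBOM files not found: " + ", ".join(missing))
    found = [root / name for name in required]
    found += [root / name for name in optional if (root / name).is_file()]
    return found


# Usage: only the required file exists, as in a local (non-CI) build.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sboms").mkdir()
    (root / "sboms" / "component1.spdx.json").write_text("{}")
    shipped = resolve_sbom_files(
        root, ["sboms/component1.spdx.json"], ["sboms/ci-build-spec.spdx.json"]
    )
    shipped_names = [p.name for p in shipped]
print(shipped_names)  # ['component1.spdx.json']
```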

ogrisel, Jun 13 '25 10:06

Linking to how CPython deals with SBOMs for vendored code:

  • Dev guide docs: https://devguide.python.org/developer-workflow/sbom/
  • Script to generate and validate SBOM entries: https://github.com/python/cpython/blob/main/Tools/build/generate_sbom.py
  • Static SBOM in the source tree: https://github.com/python/cpython/blob/main/Misc/sbom.spdx.json

> For instance, there could be SBOM files that are generated by a CI-specific preprocessor script just before calling python -m build, to track build-time information

Thanks for your thoughts @ogrisel. I think we may want to deal with that through a dynamic mechanism in meson.build files rather than an optional-sbom-files key in pyproject.toml. That would be more general, and it avoids assumptions like the whole file having to be written into the source tree before the build starts. Your "sboms/ci-build-spec.spdx.json" probably won't exist, or will be incomplete, before the wheel build is invoked, because the compiler name and version can only reliably come from Meson itself.
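As an illustration of the dynamic route: Meson's compiler object (`cc = meson.get_compiler('c')`) exposes `cc.get_id()` and `cc.version()`, which could be passed as arguments to a script run from a `custom_target`. The Python side of such a script might look like the sketch below; the script name `write_build_sbom.py`, its argument order and the SPDX modelling (a compiler package linked with `BUILD_TOOL_OF`) are all assumptions:

```python
import json
import os
import sys
import uuid
from datetime import datetime, timezone


def _timestamp() -> str:
    # Honour SOURCE_DATE_EPOCH (reproducible-builds convention) when it is set.
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    moment = datetime.fromtimestamp(int(epoch), timezone.utc) if epoch else datetime.now(timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_info_sbom(package: str, version: str, compiler_id: str, compiler_version: str) -> dict:
    """SPDX 2.3 document stating which compiler built `package`."""
    # Deterministic namespace: identical inputs give an identical document URI.
    ns = uuid.uuid5(uuid.NAMESPACE_URL, f"{package}-{version}-{compiler_id}-{compiler_version}")
    return {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": f"{package}-{version}-build-info",
        "documentNamespace": f"https://example.invalid/spdx/{ns}",  # placeholder domain
        "creationInfo": {"created": _timestamp(), "creators": ["Tool: write_build_sbom.py"]},
        "packages": [
            {"name": package, "SPDXID": "SPDXRef-PACKAGE-built", "versionInfo": version,
             "downloadLocation": "NOASSERTION", "filesAnalyzed": False},
            {"name": compiler_id, "SPDXID": "SPDXRef-PACKAGE-compiler",
             "versionInfo": compiler_version, "downloadLocation": "NOASSERTION",
             "filesAnalyzed": False},
        ],
        "relationships": [
            {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
             "relatedSpdxElement": "SPDXRef-PACKAGE-built"},
            {"spdxElementId": "SPDXRef-PACKAGE-compiler", "relationshipType": "BUILD_TOOL_OF",
             "relatedSpdxElement": "SPDXRef-PACKAGE-built"},
        ],
    }


if __name__ == "__main__" and len(sys.argv) == 6:
    # Hypothetical call: write_build_sbom.py OUTPUT NAME VERSION COMPILER_ID COMPILER_VERSION
    out, name, ver, cc_id, cc_ver = sys.argv[1:6]
    with open(out, "w", encoding="utf-8") as f:
        json.dump(build_info_sbom(name, ver, cc_id, cc_ver), f, indent=2)
```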

rgommers, Jun 14 '25 09:06

Having an explicit callback mechanism to generate dynamic SBOMs at the end of the Meson build would indeed be ideal.

ogrisel, Jun 14 '25 09:06

Thanks @rgommers for opening this issue and for all of your interest in adopting PEP 770!

> There are also static SBOMs, for components that are vendored at the source level. E.g., pip vendors packaging, cpython vendors pip, etc.

Source-level non-Python SBOMs are definitely more likely to be closer to "hand-authored" until better tooling emerges for managing these types of SBOMs/dependency relationships. Like you said, these dependencies don't change often so I'm hopeful it's not too heavy a lift for most projects.

There are two approaches that can be taken: one uses the SBOM as a configuration file for which version of an upstream library is vendored into the source; the other "generates" the SBOM from the script that does the vendoring (CPython takes the second approach).
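In the first approach the SBOM is the source of truth, so the vendoring script reads it to decide what to fetch. A minimal sketch, assuming an SPDX 2.3 JSON SBOM; `vendoring_plan` is a hypothetical helper and the example document is trimmed to the fields it uses:

```python
import json


def vendoring_plan(sbom_text: str) -> list[tuple[str, str, str]]:
    """Return (name, version, download URL) for each fetchable package in an SPDX 2.3 JSON SBOM."""
    doc = json.loads(sbom_text)
    plan = []
    for pkg in doc.get("packages", []):
        location = pkg.get("downloadLocation", "NOASSERTION")
        # SPDX uses NONE / NOASSERTION when there is no usable download location.
        if location in ("NONE", "NOASSERTION"):
            continue
        plan.append((pkg["name"], pkg.get("versionInfo", ""), location))
    return plan


# Trimmed example; a real SPDX document has more required fields.
example_sbom = json.dumps({
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {"name": "libfoo", "versionInfo": "2.1",
         "downloadLocation": "https://example.invalid/libfoo-2.1.tar.gz"},
        {"name": "copied-snippet", "downloadLocation": "NOASSERTION"},
    ],
})
for name, version, url in vendoring_plan(example_sbom):
    print(f"vendor {name} {version} from {url}")
```

With the second approach the data flows the other way: the vendoring script knows what it fetched and writes the SBOM, as in CPython's generate_sbom.py.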

> I did not plan to manually maintain SBOM files for those. I am not sure if they would really qualify as vendored files since they are often significantly edited when included in scikit-learn. This is a bit of a gray area.

If the code is licensed differently than the project, I would recommend including the project ID and project license in the SBOM. If the project would consider a vulnerability in the included code to be a vulnerability not in the project itself but in the vendored project, I would include the project ID in your SBOM. If this code snippet/module is one that often has vulnerabilities, or you could imagine a user patching the code themselves, then I would recommend including the project ID in the SBOM.

Hopefully these help guide the decision of whether or not to include a project in an SBOM. Happy to chat more about this topic (it's a tough one!)
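The three rules of thumb above can be condensed into a small predicate: any one of them is enough to list the upstream project. This is only a sketch; the `VendoredCode` fields are hypothetical inputs a maintainer would fill in by judgment, and comparing license strings for equality is a simplification:

```python
from dataclasses import dataclass


@dataclass
class VendoredCode:
    # Hypothetical description of a piece of vendored or copied code.
    name: str
    license: str
    vulnerabilities_belong_upstream: bool  # would a CVE be attributed to the upstream project?
    vulnerable_or_patchable: bool  # history of vulnerabilities, or users might patch it locally


def should_list_in_sbom(code: VendoredCode, project_license: str) -> bool:
    """Return True if any of the three rules of thumb applies."""
    return (
        code.license != project_license
        or code.vulnerabilities_belong_upstream
        or code.vulnerable_or_patchable
    )


snippet = VendoredCode("copied-helper", "BSD-3-Clause", False, False)
print(should_list_in_sbom(snippet, "BSD-3-Clause"))  # False
```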

sethmlarson, Jun 15 '25 16:06