
Inject package_metadata into module repos

Open. fmeum opened this issue 3 weeks ago • 18 comments

For supply chain security tooling to work well with Bazel, all third-party (Bazel and non-Bazel) deps must be annotated with package_metadata targets. While it's usually easy for a module extension wrapping a package manager to do this, Bazel modules have so far had no way to automate this tagging.

With this PR, Bazel itself will create a package_metadata target for all registry modules and register it as the default via REPO.bazel if the module doesn't already contain this file. The following alternatives were considered but dismissed:

  1. Expecting each module to register module metadata by hand. This is infeasible, as many modules don't have Bazel-aware upstream projects, and not all contributors can be expected to be aware of (and to accept) the required boilerplate.
  2. Adding new functionality to the Publish to BCR workflow to automatically add the required targets and files. Many modules can't use the workflow. Furthermore, static metadata isn't necessarily appropriate: If a module is patched, it should not carry the same PURL as the unmodified upstream module.

RELNOTES: Bazel modules from a registry that don't include a REPO.bazel file now automatically have a package_metadata target with their PURL injected and registered as the default for all targets in the module repo. Any module that is patched locally via a single_version_override will receive a deterministic version modifier that is unique on a best-effort basis.
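The two decisions in the description and release note (which PURL a registry module gets, and whether Bazel injects a default at all) can be sketched in a few lines. This is a Python illustration, not Bazel's implementation: the `pkg:bazel/<name>@<version>` PURL shape and the helper names `module_purl` and `should_inject_default` are assumptions made for the example.

```python
from typing import Iterable
from urllib.parse import quote


def module_purl(name: str, version: str) -> str:
    """Returns a PURL for a registry module.

    Assumption: a 'bazel' PURL type shaped like pkg:bazel/<name>@<version>.
    """
    # Bazel module names only use lowercase letters, digits, '.', '-' and '_',
    # so they need no escaping; the version is percent-encoded so that
    # characters such as '+' stay unambiguous.
    return f"pkg:bazel/{name}@{quote(version, safe='')}"


def should_inject_default(root_files: Iterable[str]) -> bool:
    """Hypothetical helper: True if the module repo ships no REPO.bazel,
    i.e. the case in which the PR registers a default package_metadata."""
    return "REPO.bazel" not in set(root_files)


if __name__ == "__main__":
    print(module_purl("rules_cc", "0.1.1"))  # pkg:bazel/rules_cc@0.1.1
    print(should_inject_default(["MODULE.bazel", "BUILD.bazel"]))  # True
    print(should_inject_default(["MODULE.bazel", "REPO.bazel"]))  # False
```

The "module -> PURL" step is purely mechanical here, which is why it lends itself to automation; anything requiring judgment, such as licenses, is left out.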

fmeum, Dec 01 '25 14:12

FYI @mzeren-vmw @Yannic

fmeum, Dec 02 '25 09:12

Does the PURL fragment already provide license info?

FYI @kotlaja this might be useful for the GDC Bzlmod migration.

meteorcloudy, Dec 02 '25 12:12

> Does the PURL fragment already provide license info?

It doesn't, but it uniquely identifies the source for that info. My personal opinion is that "Bazel module <-> Bazel PURL" is a trivial one-to-one mapping and "Bazel module -> license" is a function that can only be evaluated by a lawyer, so it makes sense to automate the "Bazel module -> PURL" part and let users figure out the "PURL -> license" part, perhaps even out of band.

fmeum, Dec 02 '25 12:12

> It doesn't, but it uniquely identifies the source for that info. My personal opinion is that "Bazel module <-> Bazel PURL" is a trivial one-to-one mapping

Well... not really. With module overrides, you can get local patches that make my code labeled as purl=X different from your code labeled as purl=X. PURLs are like any other URL: they are more of a hint than anything precise. OTOH, we recognize this in the supply chain tools and will let you override licenses and other attestations by PURL or target.
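To make the "override by PURL or target" idea concrete, here is a hypothetical lookup. It assumes, without confirmation from the thread, that a per-target override takes precedence over a per-PURL override, since a target label is more specific than a PURL that patched and unpatched copies may share. The function name and the plain license strings are illustrative, not the supply chain tools' actual API.

```python
from typing import Mapping, Optional


def resolve_license(
    target: str,
    purl: str,
    by_target: Mapping[str, str],
    by_purl: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical override resolution: target first, then PURL, then default."""
    # Most specific first: a label identifies exactly one build target.
    if target in by_target:
        return by_target[target]
    # A PURL may cover several (possibly differently patched) copies of a module.
    if purl in by_purl:
        return by_purl[purl]
    return default
```

With this ordering, a user who knows a locally patched target differs from upstream can pin its attestation at the target level without affecting every other consumer of the same PURL.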

aiuto, Dec 02 '25 21:12

@fweikert Please also take a look

meteorcloudy, Dec 03 '25 13:12

I can send a follow-up PR to address @aiuto's concern: in the case of a single_version_override, we can append a hash of all patches to the version number as a new segment. This ensures that patches are always visible at the SBOM level.
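The proposal above can be sketched roughly as follows. This is not the follow-up PR's code: the SHA-256 digest, the 8-hex-digit truncation, the `.patches-` separator and the name `patched_version` are illustrative assumptions. The only properties taken from the discussion are that the suffix is deterministic and depends on every patch.

```python
import hashlib
from typing import Sequence


def patched_version(version: str, patches: Sequence[bytes], digits: int = 8) -> str:
    """Hypothetical: appends a short digest of all patch contents as a new segment."""
    if not patches:
        return version  # unpatched modules keep their upstream version
    h = hashlib.sha256()
    for patch in patches:
        # Patches are applied in order, so order is hashed too. Each patch is
        # length-prefixed so that (b"ab", b"c") and (b"a", b"bc") differ.
        h.update(len(patch).to_bytes(8, "big"))
        h.update(patch)
    # The '.patches-' separator is purely illustrative.
    return f"{version}.patches-{h.hexdigest()[:digits]}"
```

Truncating the digest is what makes the result unique only on a best-effort basis, as the release note puts it; a patched module's PURL then differs from the unmodified upstream one.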

fmeum, Dec 03 '25 15:12

@bazel-io fork 9.0.0

fmeum, Dec 03 '25 20:12

@fmeum We have some test failures on RBE during import: https://buildkite.com/bazel/google-bazel-presubmit/builds/98041

Maybe try rebasing onto HEAD and see if it reproduces on the PR?

meteorcloudy, Dec 04 '25 09:12

I will take a look and address @aiuto's comment along the way.

fmeum, Dec 04 '25 09:12

Oh, postsubmit is red since https://buildkite.com/bazel/bazel-bazel/builds/34042

meteorcloudy, Dec 04 '25 09:12

The postsubmit failure is due to an RBE infra change; we are fixing it.

meteorcloudy, Dec 04 '25 10:12

@aiuto Could you take another look? I hope I addressed your concern.

fmeum, Dec 04 '25 21:12

Is there more background to this? A tracking issue or a doc explaining the motivation and use case? Or maybe just a more detailed PR description.

Wyverald, Dec 08 '25 20:12

> Is there more background to this?

Does a video count? https://youtu.be/Q4p-I9TsUnA?si=GDFYEhqULCdmKb_m basically says "package_metadata all the things" so that the SCS tooling in development doesn't have any blind spots.

@mzeren-vmw @Yannic Do you have non-video references I could add to the PR description?

fmeum, Dec 08 '25 20:12

Thanks for the link -- I missed this talk. Will watch it when I get a chance to breathe :P

I'm just being a bit cautious since this PR is adding a new built-in dependency (package_metadata) to MODULE.tools, a new undocumented purl_fragments attribute to http_archive and co., and even adds package_metadata as a new "well-known module" (which I was hoping to limit to just bazel_tools). That makes me just a tiny bit uncomfortable as I don't really understand what this is doing, and the information in the PR is a bit scarce.

For one example, what if the module has a REPO.bazel for completely unrelated reasons (such as ignore_directories or some default feature set)? Is it expected that the module would suddenly not get any PURL information?

Wyverald, Dec 09 '25 00:12

@Wyverald That kind of context I can provide, I amended the PR description.

> For one example, what if the module has a REPO.bazel for completely unrelated reasons (such as ignore_directories or some default feature set)? Is it expected that the module would suddenly not get any PURL information?

We can't really do better than that without buildozer, which we likely don't want to add to the well-known modules (folks may legitimately want to m_v_o it). It's not too bad though since projects that maintain a REPO.bazel are 1) already aware of its existence and 2) already need to add less boilerplate to set up a default package_metadata.

My main goal is to get package_metadata wired up for, say, 95% of all module versions without increased effort for BCR contributors. Everything that remains will be much more amenable to case-by-case solutions.

fmeum, Dec 09 '25 14:12

I'm not sure whether this is the right step: we (the supply-chain WG) are currently working on allowing users to inject package_metadata targets into other modules without requiring them to patch literally every module.

Further, the package_metadata target alone isn't that useful. The interesting data is in its attributes, not in the PURL, and Bazel doesn't really have sufficient info to generate any attributes (e.g. license).

I think it would be useful for Bazel to provide data about (1) itself (think version), (2) which archives were downloaded by a module extension (and where they were extracted to), and (3) which bzl files were involved in evaluating a module extension or rule/aspect/macro, to get a paper trail of what's involved in creating a binary artifact and its transitive dependencies.

@fmeum if you could join the WG meeting to discuss what you have in mind here, that'd be great!

Yannic, Dec 09 '25 19:12

cc @aiuto @thegrizzlydev

Yannic, Dec 09 '25 19:12