Expose a `rules_license` `PackageInfo` from imported dependencies

Open shs96c opened this issue 1 year ago • 13 comments

pip_parse allows us to import third party dependencies, but the imports lack enough information for us to generate an SBOM. It would be useful if targets imported from third party python deps were annotated with a PackageInfo from rules_license (notably, the purl is incredibly useful for generating CycloneDX format SBOMs).

While it might be possible to add this information in a custom way, adopting rules_license allows SBOMs to be generated without adding special logic to each ruleset.

shs96c avatar Jul 10 '24 10:07 shs96c

I've been keeping an eye on this too. It will likely be easier for Python once licensing is standardized by PEP 639, which will support SPDX expressions as part of Core Metadata 2.4.

groodt avatar Jul 10 '24 10:07 groodt

+1 for this getting added to rules_python. If anyone wants to take a stab at it, I can answer questions about pip machinery and help in this way.

aignas avatar Jul 10 '24 10:07 aignas

Assuming we had the license information, does this boil down to adding `load(<rules_license>, "licenses"); licenses(...)` to the pip-generated BUILD file?

rickeylev avatar Jul 10 '24 16:07 rickeylev

And being able to determine which license is appropriate from package metadata -- I think so. I was thinking a while back about whether this would be a useful addition, and the only constraint I can think of is rules_license stability. We aren't using SBOMs today, and I have my own automation that looks at package metadata to, say, prohibit GPL licenses, but this feature feels worthwhile.
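As a rough illustration of that kind of metadata check (not arrdem's actual automation), here is a minimal sketch that pulls the license-related fields out of a wheel's `METADATA` text. The function names `license_fields` and `mentions_gpl` are hypothetical, and the substring match is deliberately naive: it also flags LGPL and AGPL.

```python
from email.parser import Parser


def license_fields(metadata_text: str) -> dict:
    """Collect the license-related fields from a METADATA / PKG-INFO document."""
    # Core metadata uses an RFC 822 style header format, so the email parser can read it.
    msg = Parser().parsestr(metadata_text)
    classifiers = msg.get_all("Classifier") or []
    return {
        # License-Expression is the SPDX field introduced by PEP 639; often absent.
        "license_expression": msg.get("License-Expression"),
        # License is the older free-text field.
        "license": msg.get("License"),
        "license_classifiers": [c for c in classifiers if c.startswith("License ::")],
    }


def mentions_gpl(fields: dict) -> bool:
    """Naive policy check: True if any license field contains the text 'GPL'."""
    haystack = [
        fields["license_expression"] or "",
        fields["license"] or "",
        *fields["license_classifiers"],
    ]
    return any("GPL" in text.upper() for text in haystack)
```

Because the three fields can disagree or be missing entirely, a real policy check would probably need per-package overrides.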

arrdem avatar Jul 10 '24 22:07 arrdem

Yeah, +1 to the overall feature. It should be easy to add the loads for rules_license.

From what groodt said, it sounds like the hard part will be getting the license info from whatever artifact was downloaded from pypi?

rickeylev avatar Jul 11 '24 03:07 rickeylev

From what groodt said, it sounds like the hard part will be getting the license info from whatever artifact was downloaded from pypi?

Yes. It's quite messy at the moment. You can grab some license info, but it's inconsistent and scattered all over the place. From the PEP:

""" This has triggered a number of license-related discussions and issues, including on outdated and ambiguous PyPI classifiers, license interoperability with other ecosystems, too many confusing license metadata options, limited support for license files in the Wheel project, and the lack of precise license metadata.

As a result, on average, Python packages tend to have more ambiguous and missing license information than other common ecosystems. """

As is typical for Python, due to its age, a lot of it is messier than some of the other language ecosystems. I think the PEP is going to be accepted though.

groodt avatar Jul 11 '24 04:07 groodt

This request is for the PackageInfo, which doesn't need the license at all, even though the provider comes from rules_license. Hopefully we can expose the PackageInfo before we need to expose the rest of the information?

shs96c avatar Jul 11 '24 13:07 shs96c

Ah, got it. Is there an example from other rules? My understanding of purl spec is that you can easily build it from the discrete components.

pkg:pypi/<name>@<version>

https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst

It's unclear to me what should happen if a mirror or local/fork of a package is used instead of a canonical package from pypi for example.

groodt avatar Jul 12 '24 12:07 groodt

The purl is easy to construct, and all the required information should be available to the repo rules. rules_rust adds PackageInfo to the targets it generates, but it's somewhat obfuscated: https://github.com/bazelbuild/rules_rust/blob/c177ccc1a75b11badd984c01e51e61c840c572d8/crate_universe/src/rendering.rs#L347

I plan on adding this to rules_jvm_external soon too.
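To make the "easy to construct" point concrete, here is a minimal sketch of building a `pkg:pypi/` purl from a name and version. The helper name `pypi_purl` is hypothetical, and the normalization (lowercase, `_` replaced by `-`) reflects my reading of the pypi entry in the purl types document linked above rather than anything rules_python does today.

```python
from urllib.parse import quote


def pypi_purl(name: str, version: str) -> str:
    """Build a pkg:pypi purl from a distribution name and version."""
    # Assumption: the pypi purl type wants a lowercased name with "_" replaced by "-".
    normalized = name.lower().replace("_", "-")
    # Percent-encode both components so characters such as "+" in local versions are safe.
    return f"pkg:pypi/{quote(normalized, safe='')}@{quote(version, safe='')}"
```

For example, `pypi_purl("typing_extensions", "4.12.2")` gives `pkg:pypi/typing-extensions@4.12.2`.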

shs96c avatar Jul 12 '24 13:07 shs96c

Is the prefix for python packages always pkg:pypi/?

It seems the only necessary metadata to be added here are name and version?

Is py_library the appropriate place for the metadata? Does it only apply to dependencies fetched from an index? What about vendored libraries?

groodt avatar Jul 12 '24 23:07 groodt

Is py_library the appropriate place for the metadata?

I don't think so. From what I understand, the way the license stuff works is you specify a package-level value, e.g. package(default_applicable_licenses=[":license"])[1], and the :license target has various license info. Targets in the package automatically inherit the settings.

[1] Though I swear I thought they changed this name to something like "default_metadata" or something

rickeylev avatar Jul 12 '24 23:07 rickeylev

When constructing an SBOM, having the PackageInfo be on the py_library would be incredibly helpful, as it would avoid the need to use an aspect to go and tie the PackageInfo to the library.

You can add additional information to a purl to specify things like the repository_url and checksum, both of which can be handy.
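A minimal sketch of attaching those qualifiers to an existing purl follows. The helper name `purl_with_qualifiers` and the mirror URL in the usage note are hypothetical. Leaving `:` and `/` unencoded in values is an assumption based on the examples in the purl spec, not a full implementation of its encoding rules.

```python
from urllib.parse import quote


def purl_with_qualifiers(base_purl: str, qualifiers: dict[str, str]) -> str:
    """Append purl qualifiers (key=value pairs) to an existing purl string."""
    # Keys are lowercased and emitted in sorted order; empty values are dropped.
    pairs = sorted((key.lower(), value) for key, value in qualifiers.items() if value)
    if not pairs:
        return base_purl
    # Assumption: ":" and "/" stay unencoded in values, as in the purl spec examples.
    encoded = [f"{key}={quote(value, safe=':/')}" for key, value in pairs]
    return base_purl + "?" + "&".join(encoded)
```

Calling it with `{"repository_url": "https://pypi.example.com/simple", "checksum": "sha256:abc123"}` on `pkg:pypi/django@1.11.1` yields `pkg:pypi/django@1.11.1?checksum=sha256:abc123&repository_url=https://pypi.example.com/simple`, which is one way a mirror or fork could be distinguished from the canonical index.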

shs96c avatar Jul 13 '24 08:07 shs96c

note 1: We learned today that rules_license isn't actively maintained and the package_metadata project is supposed to replace it

note 2: we're fine with changing the pypi generation code to add a package() call that sets package_metadata stuff. That is pretty non-invasive, easy, and overall cheap.

rickeylev avatar Jun 12 '25 02:06 rickeylev

It seems the latest effort is captured at https://github.com/bazel-contrib/supply-chain

albertocavalcante avatar Jul 21 '25 02:07 albertocavalcante

It seems that PEP 770 has been accepted and SBOMs are being standardized in the Python ecosystem. The following may be useful: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#the-dist-info-sboms-directory
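As a small sketch of how a tool might discover those files, the function below lists everything under the `.dist-info/sboms/` directory of a wheel. The name `list_wheel_sboms` is hypothetical, and the code only enumerates paths; it does not parse or validate the SBOM documents.

```python
import zipfile


def list_wheel_sboms(wheel) -> list[str]:
    """Return archive paths under <name>-<version>.dist-info/sboms/ in a wheel.

    `wheel` may be a filesystem path or a binary file-like object.
    """
    found = []
    with zipfile.ZipFile(wheel) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Match "<dist>.dist-info/sboms/<file...>" and skip bare directory entries.
            if (
                len(parts) >= 3
                and parts[0].endswith(".dist-info")
                and parts[1] == "sboms"
                and parts[-1]
            ):
                found.append(name)
    return sorted(found)
```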

aignas avatar Aug 11 '25 14:08 aignas