cargo-dist
bundle dependency license info
https://github.com/sstadick/cargo-bundle-licenses
tl;dr generate a file that lists all the licenses of deps.
There are so many third-party tools that do this that, ideally, we could leverage the best-in-class ones to get full coverage for all project types. That said, it may be "easy enough" to reimplement: we are already excellent at finding and reading project manifest files, and feeding that graph into third-party tools might be harder than just reimplementing.
wrt doing this package-manager-independently, in an ideal world this can be broken down into several separable concerns:
- compute the dependency graph (language-specific, although we could potentially stitch results together into an agnostic structure)
- lookup license of a package in insert-packagemanager-here (necessarily language-specific, but produces agnostic output)
- merge per-dep information into whatever format is needed (this part can be totally language agnostic)
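To make the split above concrete, here is a minimal sketch of the three concerns as Rust traits plus one agnostic merge function. All names here (`PackageId`, `DependencyResolver`, `LicenseLookup`, `merge_licenses`, `StaticResolver`, `TableLookup`) are hypothetical and not part of dist; the in-memory resolver and lookup only stand in for real manifest readers and registry lookups.

```rust
use std::collections::BTreeMap;

/// A package in the dependency graph, independent of any package manager.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PackageId {
    pub ecosystem: String, // e.g. "cargo" or "npm"
    pub name: String,
    pub version: String,
}

/// Concern 1: compute the dependency graph (language-specific).
pub trait DependencyResolver {
    fn resolve(&self) -> Vec<PackageId>;
}

/// Concern 2: look up the license of one package (language-specific lookup,
/// agnostic output). `None` means nothing was found.
pub trait LicenseLookup {
    fn license_of(&self, package: &PackageId) -> Option<String>;
}

/// Concern 3: merge per-dependency information (fully language-agnostic).
/// Output shape chosen for illustration: license -> sorted "name version" list.
pub fn merge_licenses(
    resolvers: &[&dyn DependencyResolver],
    lookup: &dyn LicenseLookup,
) -> BTreeMap<String, Vec<String>> {
    let mut report: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for resolver in resolvers {
        for package in resolver.resolve() {
            let license = lookup
                .license_of(&package)
                .unwrap_or_else(|| "UNKNOWN".to_string());
            report
                .entry(license)
                .or_default()
                .push(format!("{} {}", package.name, package.version));
        }
    }
    // Several resolvers may report the same package; keep each entry once.
    for packages in report.values_mut() {
        packages.sort();
        packages.dedup();
    }
    report
}

/// Hypothetical resolver that returns a fixed list of packages.
pub struct StaticResolver(pub Vec<PackageId>);

impl DependencyResolver for StaticResolver {
    fn resolve(&self) -> Vec<PackageId> {
        self.0.clone()
    }
}

/// Hypothetical lookup backed by a package-name -> license table.
pub struct TableLookup(pub BTreeMap<String, String>);

impl LicenseLookup for TableLookup {
    fn license_of(&self, package: &PackageId) -> Option<String> {
        self.0.get(&package.name).cloned()
    }
}
```

Only `merge_licenses` would be shared across ecosystems in this layout; each package manager would bring its own resolver and lookup.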
When digging into this I discovered that, on the Rust side of things:
- there's a Cargo RFC for an intermediate SBOM format (for cargo-auditable, cargo-cyclonedx, etc to hopefully build on): https://github.com/rust-lang/rfcs/pull/3553
- which may wind up including license information: https://github.com/rust-lang/rfcs/pull/3553
- but, either way, cargo metadata seems to include this information
E.g., for dist itself, cargo metadata --format-version 1 | jq '.packages[] | {"name","version","id","license"}' returns something like this:
[
{
"name": "addr2line",
"version": "0.24.2",
"id": "registry+https://github.com/rust-lang/crates.io-index#addr2line@0.24.2",
"license": "Apache-2.0 OR MIT"
},
{
"name": "adler2",
"version": "2.0.0",
"id": "registry+https://github.com/rust-lang/crates.io-index#adler2@2.0.0",
"license": "0BSD OR MIT OR Apache-2.0"
},
{
"name": "aes",
"version": "0.8.4",
"id": "registry+https://github.com/rust-lang/crates.io-index#aes@0.8.4",
"license": "MIT OR Apache-2.0"
},
...
]
I guess an important question is: do we just want to know what licenses are used, or do we also want the license text?
In Rust, the license name for dependencies that use a standard license (setting license= to an SPDX license expression in Cargo.toml) can be found from dist metadata. For nonstandard licenses, however, the license is specified via license-file=, and you'd need to access the source code to get that file.
In JavaScript, package.json has a "license" which can be an SPDX expression or an arbitrary string. (They say "a string value like" "SEE LICENSE IN <filename>" but it's unclear if the text is significant.)
In Python, pyproject.toml has project.license, which can be {file = "LICENSE_FILE_PATH"} (referring to a file) or {text = "LICENSE NAME"} (referring by name). Alternatively (and preferred for well-known licenses), project.classifiers can include any of the 89 License :: ... classifiers.
In some languages, like C or C++, I'm not aware of any proper standard, so there may need to be a fallback in dist's metadata.
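One way to hold these per-ecosystem differences in a single agnostic shape is a small enum. The sketch below is only an illustration: `DeclaredLicense` and its constructors are hypothetical names, it does not validate SPDX, and treating npm's "SEE LICENSE IN <filename>" as a file reference is an assumption, given that it's unclear whether that text is significant.

```rust
use std::path::PathBuf;

/// How a dependency declares its license in its manifest, normalized across
/// ecosystems. Hypothetical type for illustration, not an existing dist type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DeclaredLicense {
    /// A string expected to be an SPDX expression (not validated here),
    /// e.g. Cargo's `license` or npm's `"license"`.
    Expression(String),
    /// A path to a license file inside the package source, e.g. Cargo's
    /// `license-file` or pyproject's `{file = "..."}`.
    File(PathBuf),
    /// A license referred to by a non-SPDX name, e.g. pyproject's
    /// `{text = "..."}` or a `License :: ...` classifier.
    Named(String),
    /// Nothing declared; a fallback (such as dist's own metadata) is needed.
    Undeclared,
}

impl DeclaredLicense {
    /// Cargo.toml: `license` (SPDX expression) or `license-file` (path).
    /// When both are set, this sketch prefers `license`.
    pub fn from_cargo(license: Option<&str>, license_file: Option<&str>) -> Self {
        match (license, license_file) {
            (Some(expr), _) => Self::Expression(expr.to_string()),
            (None, Some(path)) => Self::File(PathBuf::from(path)),
            (None, None) => Self::Undeclared,
        }
    }

    /// package.json: the `"license"` string.
    /// Assumption: "SEE LICENSE IN <filename>" points at a file.
    pub fn from_npm(license: Option<&str>) -> Self {
        match license {
            None => Self::Undeclared,
            Some(value) => match value.strip_prefix("SEE LICENSE IN ") {
                Some(path) => Self::File(PathBuf::from(path.trim())),
                None => Self::Expression(value.to_string()),
            },
        }
    }

    /// True when the manifest alone is not enough and the package source
    /// has to be read to get the license text.
    pub fn needs_source(&self) -> bool {
        matches!(self, Self::File(_))
    }
}
```

For example, `DeclaredLicense::from_cargo(None, Some("LICENSE.txt"))` yields a `File` variant, which signals that the source has to be fetched before anything can be reported about that dependency.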
I think that, to start on this feature, simply collecting the SPDX identifiers and being able to call out the presence of a "custom" license would work. Most people using a feature like this want data to check whether the tool contains any "blocklisted/sus" licenses (usually GPL+other). What're your thoughts?
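As a rough sketch of that starting point: split each SPDX expression into the identifiers it mentions, record packages with no expression as "custom", and flag any identifier on a blocklist. The names `spdx_identifiers`, `LicenseSummary` and `summarize` are hypothetical, the tokenizer is not a validating SPDX parser, and the blocklist contents are up to the caller.

```rust
use std::collections::BTreeSet;

/// Collect the license identifiers mentioned in an SPDX expression.
/// Parentheses and the AND/OR operators are dropped, and the token after
/// WITH is skipped because it names an exception, not a license.
/// A legacy Cargo `/` separator is treated like whitespace.
pub fn spdx_identifiers(expression: &str) -> BTreeSet<String> {
    let spaced = expression.replace(['(', ')', '/'], " ");
    let mut ids = BTreeSet::new();
    let mut skip_next = false;
    for token in spaced.split_whitespace() {
        if skip_next {
            skip_next = false;
            continue;
        }
        match token.to_ascii_uppercase().as_str() {
            "AND" | "OR" => {}
            "WITH" => skip_next = true,
            _ => {
                ids.insert(token.to_string());
            }
        }
    }
    ids
}

#[derive(Debug, Default)]
pub struct LicenseSummary {
    /// Every SPDX identifier seen across all packages.
    pub identifiers: BTreeSet<String>,
    /// Packages without an SPDX expression, i.e. a "custom" license.
    pub custom: Vec<String>,
    /// (package, identifier) pairs that hit the blocklist.
    pub blocked: Vec<(String, String)>,
}

/// `packages` pairs a package name with its SPDX expression, if it has one.
/// Conservative on purpose: an identifier is flagged even when it is only
/// one alternative in an OR expression.
pub fn summarize(packages: &[(&str, Option<&str>)], blocklist: &[&str]) -> LicenseSummary {
    let mut summary = LicenseSummary::default();
    for &(name, expression) in packages {
        let Some(expression) = expression else {
            summary.custom.push(name.to_string());
            continue;
        };
        for id in spdx_identifiers(expression) {
            if blocklist.iter().any(|b| b.eq_ignore_ascii_case(&id)) {
                summary.blocked.push((name.to_string(), id.clone()));
            }
            summary.identifiers.insert(id);
        }
    }
    summary
}
```

Fed the three packages from the cargo metadata output above, `summarize` would report the identifiers 0BSD, Apache-2.0 and MIT, with nothing marked custom.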