conda-forge.github.io
Rust/Go packages license issues
A typical Rust package uses dozens of packages, which have different licenses and requirements. A Rust package and its dependencies are usually compiled into one library or executable. For example, https://github.com/conda-forge/staged-recipes/pull/11315 has a Rust package with 91 dependencies under various MIT/BSD-3-Clause/Apache-2.0 licenses, and maybe others.
This implies that the licenses and copyrights of the dependencies need to be distributed with the package. There are some tools to help do this like https://github.com/maghoff/cargo-license-hound, https://github.com/onur/cargo-license.
I'm opening this issue so that @conda-forge/staged-recipes and @conda-forge/core know about this when reviewing Rust recipes.
cc @andfoy, @mingwandroid
What I'm doing in particular is taking the JSON output produced by cargo-license, grabbing the repository URLs across GitHub, BitBucket and GitLab, and calling their respective APIs to locate and download all the licenses. However, some libraries still need a manual license download.
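A minimal sketch of the first half of that workflow, grouping crates by hosting service so each group can be sent to the matching API. The JSON shape (`name`, `version`, `license`, `repository` keys) is an assumption about what cargo-license emits, and the sample data, `KNOWN_HOSTS`, and `group_by_host` are hypothetical names for illustration. The download step itself is left out.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical sample, in the shape assumed for cargo-license's JSON output.
SAMPLE = """
[
  {"name": "serde", "version": "1.0.0", "license": "MIT OR Apache-2.0",
   "repository": "https://github.com/serde-rs/serde"},
  {"name": "example-crate", "version": "0.1.0", "license": "MIT",
   "repository": "https://gitlab.com/example/example-crate"},
  {"name": "no-repo", "version": "0.2.0", "license": "BSD-3-Clause",
   "repository": null}
]
"""

# Hosts whose APIs could be queried for license files (assumed list).
KNOWN_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def group_by_host(entries):
    """Split crates into per-host groups and a list needing a manual lookup."""
    by_host = defaultdict(list)
    manual = []
    for entry in entries:
        repo = entry.get("repository") or ""
        host = urlparse(repo).netloc.lower()
        if host in KNOWN_HOSTS:
            by_host[host].append((entry["name"], entry["version"], repo))
        else:
            # No repository URL, or a host we do not know how to query.
            manual.append((entry["name"], entry["version"]))
    return dict(by_host), manual


by_host, manual = group_by_host(json.loads(SAMPLE))
```

Crates that end up in `manual` correspond to the libraries that still need their license tracked down by hand.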
Doesn't the same concern apply to go packages?
To not re-invent the wheel here, how are other packaging eco systems solving that e.g. linux distributions like debian or homebrew?
Yes, the same concern applies to Go packages. See also https://github.com/google/go-licenses
I've no idea how others fix this.
I am not sure how you want to address that, but it does not seem straightforward. We could use a script that goes over all the dependencies, parses their licenses, and lists the licenses per dependency in the conda package?
Also, at what level should this script be run? conda or conda-forge?
@hadim, what @andfoy did for rust was to use a script to download licenses and put them in the recipe (and manually add licenses for packages that the script failed). He also added a check in build.sh to check that each dependency had a license file in the recipe. Same can be done for Go.
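The build-time check described here could look roughly like the sketch below. It is not the script used in the actual feedstock. The directory layout (one `<name>-<version>` folder per dependency) and the function name `missing_licenses` are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def missing_licenses(dependencies, license_dir):
    """Return the (name, version) pairs that have no license file.

    Assumes a hypothetical layout: one sub-directory per dependency,
    named "<name>-<version>", containing at least one file.
    """
    missing = []
    for name, version in dependencies:
        dep_dir = Path(license_dir) / f"{name}-{version}"
        has_file = dep_dir.is_dir() and any(p.is_file() for p in dep_dir.iterdir())
        if not has_file:
            missing.append((name, version))
    return missing


# Small demonstration with a throwaway directory and made-up dependencies.
with tempfile.TemporaryDirectory() as tmp:
    ok_dir = Path(tmp) / "serde-1.0.0"
    ok_dir.mkdir()
    (ok_dir / "LICENSE-MIT").write_text("MIT license text")
    demo_missing = missing_licenses([("serde", "1.0.0"), ("libc", "0.2.0")], tmp)
```

A build script could call this and exit with a non-zero status whenever the returned list is non-empty, which stops the build before an incomplete package is produced.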
It makes sense.
That being said I probably don't have the bandwidth at the moment to do that for https://github.com/conda-forge/staged-recipes/pull/11799
For go, it's simple. See https://github.com/google/go-licenses#complying-with-license-terms
Quick thought, this also applies to C++ packages when you link statically with your dependencies.
Should this be extended to header only dependencies as well? For example, if you use pybind11, boost, etc, do you need to package the license file used by them as well? Because that's as good as statically linking parts of them.
Perhaps there needs to be a licence_exports field in the conda build metadata.
> Should this be extended to header only dependencies as well?
Depends on the license.
> For example, if you use pybind11, boost, etc, do you need to package the license file used by them as well?
pybind11: yes. boost: no.
Thanks for the guidance here on this topic: texlab-feedstock is now using the same approach as pysyntect-feedstock, and "only" required manually hunting down 20 licenses (of 200+). Perhaps we should package cargo-license... seems to cost a couple minutes per build.
As this has come up again for @conda-forge/cryptography:
I wonder if we should start curating a community package, e.g. conda-forge-rust-licenses and conda-forge-go-licenses (or just lump them together under conda-forge-license-library) which has some automation to at least allow centralizing the list of known/used <thing>/<version>/(UN)LICEN(S|CE(-.*)(.(txt|md))? (oh and don't forget COPYRIGHT.*). Then packages can demand said package during builds, copying the assets from a well-known location to wherever their license_file points... now that we can use folders, that's much easier. If a new crate/mod shows up, the build would fail, but might suggest...
Some wild crates and mods approach!
- <crate>@<version> <url>
- <mod>@<version> <url>
From inspection, I've found the below licenses. Please visit the upstream repos and verify, then
make a pull request to https://github.com/conda-forge/conda-forge-license-library adding the lines:
### recipe/licenses/cargo.txt
<repo>@<tag>/LICENSE-MIT
<repo>@<tag>/LICENSE-APACHE
### recipe/licenses/go-mod.txt
<repo>@<tag>/LICENSE-ZLIB-WITH-FREAKY-SPEC
this would in turn update the recipe (once) so we actually have the licenses sha256sums.
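For illustration, one possible reading of the file-name pattern mentioned above, written as a Python regular expression. The pattern in the thread is informal, so the exact expression here (including case-insensitivity and the accepted separators) is an assumption, and `LICENSE_NAME` and `license_like` are hypothetical names.

```python
import re

# Matches names such as LICENSE, LICENCE.md, UNLICENSE, LICENSE-MIT, COPYRIGHT.txt.
# This is a guess at the intent of the informal pattern, not a vetted rule.
LICENSE_NAME = re.compile(
    r"^((UN)?LICEN[SC]E([-_.].*)?|COPYRIGHT.*)$",
    re.IGNORECASE,
)


def license_like(filenames):
    """Return the file names that look like license or copyright files."""
    return [name for name in filenames if LICENSE_NAME.match(name)]


candidates = license_like(
    ["LICENSE-MIT", "UNLICENSE", "LICENCE.md", "COPYRIGHT", "README.md", "Cargo.toml"]
)
```

A pattern like this only finds candidates. Someone would still need to check them against the upstream repositories before they go into a shared library of licenses.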
So would a conda-incubator/* be the right path? I'm imagining a small (potentially single file) python package with a simple in-build CLI like cargo-licenses | dmv -o $SRC_DIR/third-party-licenses. The JSON/CSV file with, at the very least, the couple hundred license URLs/SHAs, would then live in the feedstock... but could contain the actual license texts themselves.
Hello! I've been working on a tool to hopefully mitigate this issue / make it less painful to publish rust tools on conda-forge. It can be found here.
In short, it crawls the package dependencies and searches out the license files that correspond to what is in the Cargo.toml. If a license isn't found or looks suspicious, it will write a warning message. It also provides a "check" flag that takes a previous version of a THIRDPARTY file and compares that against the new one, failing if they are different.
The idea is that the workflow would go as follows:
- Run cargo bundle licenses once, address all warnings by manually finding licenses where needed and copy-pasting them into the generated file. Check that file into version control and include it in your manifest.
- Include cargo bundle licenses --output CI-THIRDPARTY --previous THIRDPARTY --check-previous in your CI. This will carry forward any manually changed entries for you, then do a whole file check for sameness, so if a version changed it would fail and force you back to step 1.
Currently this tool supports three formats: yaml, json, and toml. See the above repo for an example yaml THIRDPARTY file.
In the view of conda-forge maintainers, would this satisfy the requirement of licenses and copyrights of the dependencies need to be distributed with the package?
Looks good! Really anything that moves things forward sounds great to me... I'm wagering if:
- the proposed tool (and/or cargo-licenses, if not superseded) is packaged (dogfooding itself) through staged-recipes
  - so that we can just add it to requirements/build and/or test/requires, and call it, simply, in build.sh|bld.bat
- its use is demonstrated on a PR to a "tent pole" package like ripgrep
  - so that we have something to point to on other staged-recipe PRs/a knowledge base text chunk
... I don't see what complaints there would/could be.
From a KISS perspective, and as I don't really want to hand edit this file, I'd see JSON being the preferable serialization format... to that end, now that SPDX 2.2.1 is ISO5962, I'd really hope we start seeing it adopted more broadly (and provided by upstream packagers) and can stop needing to re-implement clever stopgaps.
@bollwyvl, thanks for the feedback!
Here is a PR for adding cargo-bundle-licenses to staged-recipes. To be clear, this would supersede cargo-license. The sole purpose of this tool is to satisfy the requirements of conda-forge packaging and make it less onerous to publish Rust packages here.
I have two PRs dogfooding it right now: https://github.com/conda-forge/staged-recipes/pull/16110 and https://github.com/conda-forge/staged-recipes/pull/16111. I'll update them to pull in cargo-bundle-licenses via build requirements once / if the cargo-bundle-licenses PR can be merged.
That's great progress! Good luck! Once again, I'd prioritize the initial staged-recipes PR for the tool itself, and then ensure it meets the needs of at least one known-important, but presently hand-curated, package, as they are the most likely to have been reviewed. Ensuing new packages will then be an easier pitch, as we'll be more confident.
By the by: I can't merge anything, don't really do rust (or go) dev, and am actually super constrained on community time right now anyway, so really I'm just selfishly looking forward to having some tools like this to ease my personal maintenance burden. God- (or -spirit-or-principle-or-animus-or-whatever-) speed!
@bollwyvl I appreciate the guidance on this!
Thanks @sstadick! I merged the tool recipe.
Both https://github.com/conda-forge/staged-recipes/pull/16110 and https://github.com/conda-forge/staged-recipes/pull/16111 are now using the conda-forge cargo-bundle-licenses package to check that all thirdparty licenses are present.
PR Adding cargo-bundle-licenses to ripgrep-feedstock https://github.com/conda-forge/ripgrep-feedstock/pull/17
Looks good to me. What happens when cargo-bundle-licenses can't find a license/copyright for a package?
If run without --check-previous, it will just write a warning saying it couldn't find the license, and put NOT FOUND for the license text in the THIRDPARTY.yml file. The idea is that a user would then go find the license and add it manually, so that the next run with --previous pulls the manually found license forward if the tool still can't find it.
If running with --check-previous, as in the PRs above and in CI generally, the tool will exit 1 if anything is still NOT FOUND or differs from the --previous license set, failing loudly to get someone's attention. Hopefully this means the THIRDPARTY file will actually stay up to date as deps change, instead of being made once and forgotten.
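The carry-forward and check behaviour described above can be sketched as follows. This is not the implementation inside cargo-bundle-licenses. It assumes a simplified data model (a dict mapping a hypothetical `name@version` key to license text), and the function names are made up for illustration. Only the NOT FOUND placeholder comes from the description above.

```python
NOT_FOUND = "NOT FOUND"


def carry_forward(new, previous):
    """Fill entries the fresh scan could not find from a previous, hand-edited set."""
    merged = dict(new)
    for key, text in new.items():
        if text == NOT_FOUND and previous.get(key, NOT_FOUND) != NOT_FOUND:
            merged[key] = previous[key]
    return merged


def check(new, previous):
    """Return a list of problems; an empty list means the check passes."""
    merged = carry_forward(new, previous)
    problems = [
        f"{key}: license still not found"
        for key, text in sorted(merged.items())
        if text == NOT_FOUND
    ]
    # Any added, removed, or changed entry counts as a difference.
    for key in sorted(set(merged) | set(previous)):
        if merged.get(key) != previous.get(key):
            problems.append(f"{key}: differs from previous")
    return problems


# Made-up data: "b" was found by hand earlier, and the fresh scan misses it again.
previous = {"a@1.0": "MIT text", "b@1.0": "manually added Apache text"}
unchanged_scan = {"a@1.0": "MIT text", "b@1.0": NOT_FOUND}
bumped_scan = {"a@1.1": "MIT text", "b@1.0": NOT_FOUND}

unchanged_problems = check(unchanged_scan, previous)
bumped_problems = check(bumped_scan, previous)
```

A CI wrapper would exit with status 1 whenever the returned list is non-empty, so a dependency bump such as the one in `bumped_scan` forces the checked-in file to be regenerated.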
Perfect. Thanks for working on this
If this is good to go, I'd love to get these two PRs merged: conda-forge/staged-recipes#16110 and conda-forge/staged-recipes#16111.
I'm sure there will be rough edges with cargo-bundle-licenses. I'm more than happy to resolve issues as they come up and help Rust packages get into conda.
@sstadick thanks for the great tool! I've also used it in https://github.com/conda-forge/staged-recipes/pull/16252. I can see it being useful in other projects too.
Thanks @sstadick! 😄
It would be great to integrate this strategy into grayskull, which we use to create/update recipes.