conda-forge.github.io

Rust/Go packages license issues

Open isuruf opened this issue 5 years ago • 49 comments

A typical Rust package uses dozens of packages that have different licenses and requirements. A Rust package and its dependencies are usually compiled into one library or executable. For example, https://github.com/conda-forge/staged-recipes/pull/11315 has a Rust package with 91 dependencies under various MIT/BSD-3-Clause/Apache-2.0 licenses, and maybe others.

This implies that the licenses and copyrights of the dependencies need to be distributed with the package. There are some tools to help with this, such as https://github.com/maghoff/cargo-license-hound and https://github.com/onur/cargo-license.

I'm opening this issue so that @conda-forge/staged-recipes and @conda-forge/core know about this when reviewing Rust recipes.

cc @andfoy, @mingwandroid

isuruf avatar Apr 30 '20 01:04 isuruf

What I'm doing in particular is using the JSON output produced by cargo-license and then grabbing the repository URLs across GitHub, BitBucket and GitLab to call their respective APIs to locate and download all the licenses. However, some libraries still need a manual license download.
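A minimal sketch of the first half of that approach, assuming the cargo-license JSON output is a list of objects with `name` and `repository` fields (the field names and the sample data here are assumptions for illustration, not taken from the actual script). It only sorts crates by hosting service so that each group can be handed to the matching API; the download step is left out.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse

# Hosts named in the comment above; anything else is flagged for manual handling.
KNOWN_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def group_by_host(cargo_license_json):
    """Group (crate name, repository URL) pairs by hosting service."""
    groups = defaultdict(list)
    for entry in json.loads(cargo_license_json):
        repo = entry.get("repository") or ""
        host = urlparse(repo).netloc.lower()
        key = host if host in KNOWN_HOSTS else "manual"
        groups[key].append((entry["name"], repo))
    return dict(groups)


# Hypothetical sample input, shaped like the assumed JSON output.
sample = json.dumps([
    {"name": "crate-a", "repository": "https://github.com/example/crate-a"},
    {"name": "crate-b", "repository": "https://gitlab.com/example/crate-b"},
    {"name": "crate-c", "repository": None},
])

for host, crates in sorted(group_by_host(sample).items()):
    print(host, crates)
```

Crates that land in the `manual` group correspond to the libraries that still need a manual license download.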

andfoy avatar Apr 30 '20 01:04 andfoy

Doesn't the same concern apply to go packages?

nehaljwani avatar May 02 '20 17:05 nehaljwani

To not re-invent the wheel here, how are other packaging ecosystems solving this, e.g. Linux distributions like Debian, or Homebrew?

dbast avatar May 09 '20 15:05 dbast

Yes, the same concern applies to Go packages. See also https://github.com/google/go-licenses

I've no idea how others fix this.

isuruf avatar May 15 '20 01:05 isuruf

I am not sure how you want to address that, but it does not seem straightforward. We could use a script that goes over all the dependencies, parses the licenses, and lists all the licenses per dependency in the conda package?

hadim avatar Jun 05 '20 15:06 hadim

Also, at what level should this script be run? conda or conda-forge?

hadim avatar Jun 05 '20 15:06 hadim

@hadim, what @andfoy did for Rust was to use a script to download licenses and put them in the recipe (and manually add licenses for packages where the script failed). He also added a check in build.sh to verify that each dependency has a license file in the recipe. The same can be done for Go.
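The kind of check described above could look roughly like the following. This is a sketch under assumptions, not the actual build.sh logic: it assumes a hypothetical layout with one subdirectory per dependency under a licenses directory in the recipe, and the dependency names are made up.

```python
import tempfile
from pathlib import Path


def missing_licenses(dependencies, license_dir):
    """Return the dependencies that have no license file under license_dir/<name>/."""
    root = Path(license_dir)
    missing = []
    for dep in dependencies:
        dep_dir = root / dep
        has_file = dep_dir.is_dir() and any(p.is_file() for p in dep_dir.iterdir())
        if not has_file:
            missing.append(dep)
    return sorted(missing)


# Demonstration with a temporary directory standing in for the recipe.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "crate-a").mkdir()
    (Path(tmp) / "crate-a" / "LICENSE-MIT").write_text("license text")
    gaps = missing_licenses(["crate-a", "crate-b"], tmp)
    print("missing:", gaps)
    # In a build script, a non-empty list would be the signal to fail the build.
```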

isuruf avatar Jun 05 '20 15:06 isuruf

It makes sense.

That being said I probably don't have the bandwidth at the moment to do that for https://github.com/conda-forge/staged-recipes/pull/11799

hadim avatar Jun 05 '20 15:06 hadim

For Go, it's simple. See https://github.com/google/go-licenses#complying-with-license-terms

isuruf avatar Jun 05 '20 15:06 isuruf

Quick thought, this also applies to C++ packages when you link statically with your dependencies.

SylvainCorlay avatar Jun 05 '20 20:06 SylvainCorlay

Should this be extended to header only dependencies as well? For example, if you use pybind11, boost, etc, do you need to package the license file used by them as well? Because that's as good as statically linking parts of them.

nehaljwani avatar Jun 06 '20 11:06 nehaljwani

Perhaps there needs to be a licence_exports field in the conda build metadata.

chrisburr avatar Jun 06 '20 11:06 chrisburr

Should this be extended to header only dependencies as well?

Depends on the license.

For example, if you use pybind11, boost, etc, do you need to package the license file used by them as well?

pybind11: yes. boost: no.

isuruf avatar Jun 06 '20 15:06 isuruf

Thanks for the guidance here on this topic: texlab-feedstock is now using the same approach as pysyntect-feedstock, and "only" required manually hunting down 20 licenses (of 200+). Perhaps we should package cargo-license... seems to cost a couple minutes per build.

bollwyvl avatar Jun 30 '20 01:06 bollwyvl

As this has come up again for @conda-forge/cryptography:

I wonder if we should start curating a community package, e.g. conda-forge-rust-licenses and conda-forge-go-licenses (or just lump them together under conda-forge-license-library), which has some automation to at least allow centralizing the list of known/used <thing>/<version>/(UN)?LICEN(S|C)E(-.*)?(\.(txt|md))? (oh, and don't forget COPYRIGHT.*). Then packages can demand said package during builds, copying the assets from a well-known location to wherever their license_file points... now that we can use folders, that's much easier. If a new crate/mod shows up, the build would fail, but might suggest...

Some wild crates and mods approach!

- <crate>@<version> <url>
- <mod>@<version> <url>

From inspection, I've found the below licenses. Please visit the upstream repos and verify, then 
make a pull request to https://github.com/conda-forge/conda-forge-license-library adding the lines:

### recipe/licenses/cargo.txt

<repo>@<tag>/LICENSE-MIT
<repo>@<tag>/LICENSE-APACHE

### recipe/licenses/go-mod.txt

<repo>@<tag>/LICENSE-ZLIB-WITH-FREAKY-SPEC

This would in turn update the recipe (once) so we actually have the licenses' sha256sums.
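For illustration, one possible reading of the filename pattern from this comment as a Python regular expression. The exact pattern is an assumption (the one written in the comment is informal), and matching is made case-insensitive here as a design choice.

```python
import re

# (UN)LICENSE / LICENCE with an optional "-suffix" and optional .txt/.md
# extension, or anything starting with COPYRIGHT.
LICENSE_NAME = re.compile(
    r"(UN)?LICEN[SC]E(-.*)?(\.(txt|md))?|COPYRIGHT.*",
    re.IGNORECASE,
)


def is_license_file(name):
    """Return True if a bare file name looks like a license or copyright file."""
    return LICENSE_NAME.fullmatch(name) is not None


for candidate in ["LICENSE-MIT", "UNLICENSE", "LICENCE.md", "COPYRIGHT.txt", "README.md"]:
    print(candidate, is_license_file(candidate))
```

A pattern like this will still miss files such as `COPYING` or licenses embedded in a README, which is where the manual verification step in the message above comes in.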

bollwyvl avatar Mar 01 '21 12:03 bollwyvl

So would a conda-incubator/* be the right path? I'm imagining a small (potentially single-file) Python package with a simple in-build CLI like cargo-licenses | dmv -o $SRC_DIR/third-party-licenses. The JSON/CSV file with, at the very least, the couple hundred license URLs/SHAs would then live in the feedstock... but could contain the actual license texts themselves.
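A rough sketch of what such a JSON file of license URLs and SHAs might contain. The manifest shape, the `crate@version` keys, and the example URL are all hypothetical; only the hashing uses a real standard-library API (`hashlib.sha256`).

```python
import hashlib
import json


def build_manifest(licenses):
    """Serialize a mapping of 'name@version' -> (url, license text) as JSON.

    Each entry records the source URL and the SHA-256 of the license text,
    so a later build can detect when an upstream license changes.
    """
    manifest = {}
    for key in sorted(licenses):
        url, text = licenses[key]
        manifest[key] = {
            "url": url,
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        }
    return json.dumps(manifest, indent=2, sort_keys=True)


# Hypothetical entry; the URL is a placeholder, not a real license location.
example = {
    "crate-a@1.0.0": ("https://example.invalid/crate-a/LICENSE-MIT", "MIT License text"),
}
print(build_manifest(example))
```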

bollwyvl avatar Mar 02 '21 03:03 bollwyvl

Hello! I've been working on a tool to hopefully mitigate this issue / make it less painful to publish rust tools on conda-forge. It can be found here.

In short, it crawls the package dependencies and searches out the license files that correspond to what is in the Cargo.toml. If a license isn't found or looks suspicious, it will write a warning message. It also provides a "check" flag that takes a previous version of a THIRDPARTY file and compares it against the new one, failing if they are different.

The idea is that the workflow would go as follows:

  1. Run cargo bundle-licenses once, address all warnings by manually finding licenses where needed and copy-pasting them into the generated file. Check that file into version control and include it in your manifest.
  2. Include cargo bundle-licenses --output CI-THIRDPARTY --previous THIRDPARTY --check-previous in your CI. This will carry forward any manually changed entries for you, then do a whole-file check for sameness, so if a version changed it would fail and force you back to step 1.

Currently this tool supports three formats: yaml, json, and toml. See the above repo for an example yaml THIRDPARTY file.

In the view of conda-forge maintainers, would this satisfy the requirement that the licenses and copyrights of the dependencies be distributed with the package?

sstadick avatar Sep 19 '21 14:09 sstadick

Looks good! Really anything that moves things forward sounds great to me... I'm wagering if:

  • the proposed tool (and/or cargo-license, if not superseded) is packaged (dogfooding itself) through staged-recipes
    • so that we can just add it to requirements/build
    • and/or test/requires, and call it, simply, in build.sh|bld.bat
  • its use is demonstrated on a PR to a "tent pole" package like ripgrep
    • so that we have something to point to on other staged-recipe PRs/a knowledge base text chunk

... I don't see what complaints there would/could be.

From a KISS perspective, and as I don't really want to hand-edit this file, I'd see JSON as the preferable serialization format... to that end, now that SPDX 2.2.1 is ISO/IEC 5962:2021, I'd really hope we start seeing it adopted more broadly (and provided by upstream packagers) and can stop needing to re-implement clever stopgaps.

bollwyvl avatar Sep 20 '21 15:09 bollwyvl

@bollwyvl, thanks for the feedback!

Here is a PR for adding cargo-bundle-licenses to staged-recipes. To be clear, this would supersede cargo-license. The sole purpose of this tool is to satisfy the requirements of conda-forge packaging and make it less onerous to publish Rust packages here.

I have two PRs dogfooding it right now: https://github.com/conda-forge/staged-recipes/pull/16110 and https://github.com/conda-forge/staged-recipes/pull/16111. I'll update them to pull in cargo-bundle-licenses via build requirements once (and if) the cargo-bundle-licenses PR is merged.

sstadick avatar Sep 20 '21 15:09 sstadick

That's great progress! Good luck! Once again, I'd prioritize the initial staged-recipes PR for the tool itself, and then ensure it meets the needs of at least one known-important, but presently hand-curated, package, as they are the most likely to have been reviewed. Ensuing new packages will then be an easier pitch, as we'll be more confident.

By the by: I can't merge anything, don't really do Rust (or Go) dev, and am actually super constrained on community time right now anyway, so really I'm just selfishly looking forward to having some tools like this to ease my personal maintenance burden. God- (or -spirit-or-principle-or-animus-or-whatever-) speed!

bollwyvl avatar Sep 20 '21 15:09 bollwyvl

@bollwyvl I appreciate the guidance on this!

sstadick avatar Sep 20 '21 15:09 sstadick

Thanks @sstadick! I merged the tool recipe.

pkgw avatar Sep 20 '21 19:09 pkgw

Both https://github.com/conda-forge/staged-recipes/pull/16110 and https://github.com/conda-forge/staged-recipes/pull/16111 are now using the conda-forge cargo-bundle-licenses package to check that all third-party licenses are present.

sstadick avatar Sep 21 '21 12:09 sstadick

PR Adding cargo-bundle-licenses to ripgrep-feedstock https://github.com/conda-forge/ripgrep-feedstock/pull/17

sstadick avatar Sep 22 '21 16:09 sstadick

Looks good to me. What happens when cargo-bundle-licenses can't find a license/copyright for a package?

isuruf avatar Sep 22 '21 18:09 isuruf

If run without --check-previous, it will just write a warning saying it couldn't find the license, and in the THIRDPARTY.yml file it will put NOT FOUND for the license text. The idea is that a user would then go find it and add it manually, so that the next time you run it with --previous it will pull the manually found license forward for you if it still can't find it.

If running with --check-previous, as in the PRs above and in CI generally, the tool will exit 1 if something is still NOT FOUND or differs from the --previous license set, failing the build to get someone's attention. Hopefully this means the THIRDPARTY file will actually stay up to date as deps change, instead of being made once and forgotten.
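The behavior described in these two paragraphs can be modeled in a few lines. This is only an illustrative sketch of the logic as described here, not the tool's actual implementation; the dictionary representation (package and version mapped to license text) and the function names are assumptions.

```python
NOT_FOUND = "NOT FOUND"


def carry_forward(current, previous):
    """Fill entries the crawler could not find with text from the previous file."""
    merged = dict(current)
    for pkg, text in current.items():
        if text == NOT_FOUND and previous.get(pkg, NOT_FOUND) != NOT_FOUND:
            merged[pkg] = previous[pkg]
    return merged


def check_previous(current, previous):
    """Return an exit code: 0 if nothing is missing or changed, otherwise 1."""
    merged = carry_forward(current, previous)
    if NOT_FOUND in merged.values():
        return 1
    return 0 if merged == previous else 1


# The previous file contains a manually added license for crate-b.
previous = {"crate-a 1.0.0": "MIT text", "crate-b 2.0.0": "manually added text"}

# Same dependency set; crate-b is still not found automatically: passes.
print(check_previous({"crate-a 1.0.0": "MIT text", "crate-b 2.0.0": NOT_FOUND}, previous))

# crate-b was bumped to a new version with no known license: fails.
print(check_previous({"crate-a 1.0.0": "MIT text", "crate-b 2.1.0": NOT_FOUND}, previous))
```

Keying entries by both name and version is what makes a dependency bump show up as a difference and send the maintainer back to regenerate the file.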

sstadick avatar Sep 22 '21 18:09 sstadick

Perfect. Thanks for working on this

isuruf avatar Sep 22 '21 18:09 isuruf

If this is good to go, I'd love to get these two PRs merged: conda-forge/staged-recipes#16110 and conda-forge/staged-recipes#16111.

I'm sure there will be rough edges with cargo-bundle-licenses; I'm more than happy to resolve issues as they come up and to help Rust packages get into conda.

sstadick avatar Sep 22 '21 21:09 sstadick

@sstadick thanks for the great tool! I've also used it in https://github.com/conda-forge/staged-recipes/pull/16252. I can see it being useful in other projects too.

kellpossible avatar Sep 23 '21 17:09 kellpossible

Thanks @sstadick! 😄

It would be great to integrate this strategy into grayskull, which we use to create/update recipes

jakirkham avatar Sep 24 '21 17:09 jakirkham