tonic icon indicating copy to clipboard operation
tonic copied to clipboard

License correction and on-the-fly codegen

Open LecrisUT opened this issue 1 year ago • 12 comments

Motivation

This PR tries to address licensing issue in the Cargo.toml metadata. Both the source files and the generated files contain Apache-2.0 licensed files, and as such the Cargo.toml file should reflect that. This is not a issue because prost is already licensed as Apache-2.0 and as such any dependency would have to comply with MIT AND Apache-2.0 implicitly.

A more nuanced file are the .bin files which further includes BSD-3-Clause metadata. To me it is unclear what the license under Cargo.toml is meant to reflect and if it would include the generated artifacts at the end as well, and it is also ambiguous who should be carrying the additional BSD-3-Clause license information (tonic, prost or end-user?). At the very least including the .bin files in VCS and crates is problematic.

Solution

  • Add the Apache-2.0 license file and metadata
  • Migrate the codegen to build.rs files

Unfortunately I am quite a novice on writing rust code, so there is quite a duplication of the build.rs files, but I hope you can alter it to be more appropriate.

Closes #1939

PS: the Apache-2.0 copyright should be updated to include tonic as the copyright holder, unless it is not meant to be altered from upstream.

LecrisUT avatar Sep 16 '24 08:09 LecrisUT

This PR tries to address licensing issue in the Cargo.toml metadata. Both the source files and the generated files contain Apache-2.0 licensed files,

How did you come to this conclusion? Also it's not clear to me why the Apache-2.0 licensing needs to affect all the crates that you've added here -- presumably that's not just because they depend on another crate that has Apache-2.0-licensed contents?

I think we should keep the generated files versioned -- build scripts impose a slowdown/extra compile time on all downstream users, and in this case all the inputs are known at development time anyway.

djc avatar Sep 16 '24 08:09 djc

This PR tries to address licensing issue in the Cargo.toml metadata. Both the source files and the generated files contain Apache-2.0 licensed files,

How did you come to this conclusion? Also it's not clear to me why the Apache-2.0 licensing needs to affect all the crates that you've added here -- presumably that's not just because they depend on another crate that has Apache-2.0-licensed contents?

Because the source files themselves have a Apache-2.0 copyright notice: https://github.com/hyperium/tonic/blob/626b094754cc52973adbfc8d2b9790701d54ef84/tonic-types/proto/status.proto#L1-L13

And we have investigated the .bin files where we found both BSD-3-Clause license metadata and 2 Apache-2.0 license metadata that seem to be from the .proto files.

I think we should keep the generated files versioned -- build scripts impose a slowdown/extra compile time on all downstream users, and in this case all the inputs are known at development time anyway.

At the same time, when I run the original codegen with tonic 0.12.2 I have unstaged diffs compared, so there are potential differences that people can encounter. If you would like to have it under VCS as a cached version, that could also work as long as the build script is available in the crate. But then you might need to review the .bin files if newly licensed files are included or if non-free files are being used. When Fedora regenerates the code it is on our duty to do this review instead.

LecrisUT avatar Sep 16 '24 09:09 LecrisUT

At the same time, when I run the original codegen with tonic 0.12.2 I have unstaged diffs compared, so there are potential differences that people can encounter. If you would like to have it under VCS as a cached version, that could also work as long as the build script is available in the crate. But then you might need to review the .bin files if newly licensed files are included or if non-free files are being used. When Fedora regenerates the code it is on our duty to do this review instead.

I think the unstaged diffs happen because you have are using a different version of prost. If you want to make changes to include the codegen code in the crate, that seems okay.

djc avatar Sep 16 '24 09:09 djc

I think the unstaged diffs happen because you have are using a different version of prost

It seems more related to protoc version and the libraries it picks up from there

$ protoc --version
libprotoc 3.19.6

When I run codegen against master the same unstaged diffs appear and checking the CI, prost versions are identical. To me this seems dubious, but I am not familiar with these projects to know if using artifacts from other protoc is supported or not.

LecrisUT avatar Sep 16 '24 09:09 LecrisUT

This PR tries to address licensing issue in the Cargo.toml metadata. Both the source files and the generated files contain Apache-2.0 licensed files,

How did you come to this conclusion? Also it's not clear to me why the Apache-2.0 licensing needs to affect all the crates that you've added here -- presumably that's not just because they depend on another crate that has Apache-2.0-licensed contents?

The tonic-health, tonic-reflection, and tonic-types crates all contain .proto files that are explicitly Apache-2.0 licensed, but they currently don't contain the Apache-2.0 license text (which is a requirement to satisfy the terms of this license), and don't mention the Apache-2.0 license in their crate metadata.

The fact that this change was applied to tonic-interop too is probably a mistake.

I think we should keep the generated files versioned -- build scripts impose a slowdown/extra compile time on all downstream users, and in this case all the inputs are known at development time anyway.

The problem is that we cannot ship binary blobs like this in packages in Fedora Linux, especially if they're not reproducible - both because it violates rules we have for packaging binary artifacts, and because it's a possible xz-utils-like security risk.

It would be OK for us if the regeneration of the *.bin files was introduced behind and off-by-default feature flag, but I don't think build.rs can be made entirely optional, depending on a feature flag. If this is not something that you want to do in tonic upstream, then we'll have to carry this as a downstream patch in Fedora.

decathorpe avatar Sep 16 '24 11:09 decathorpe

The fact that this change was applied to tonic-interop too is probably a mistake.

interop, examples and codegen are non-public, so it doesn't matter much. Thought I would just be completionist since it does have .proto files with Apache-2.0 header in the test files. Similarly with tonic crate and benches-disabled/proto/helloworld/helloworld.proto. I am still confused on what license is supposed to represent, source or artifact? Would be nice if they had the new Fedora distinction of SourceLicense vs License.

If this is not something that you want to do in tonic upstream, then we'll have to carry this as a downstream patch in Fedora.

Would we have to "sanitize" the crate before uploading in this case, or would deleting in the spec file be sufficient?

LecrisUT avatar Sep 16 '24 12:09 LecrisUT

If this is not something that you want to do in tonic upstream, then we'll have to carry this as a downstream patch in Fedora.

Would we have to "sanitize" the crate before uploading in this case, or would deleting in the spec file be sufficient?

Doing deletion of pre-compiled artifacts is OK to be done in packaging, no need to make "cleaned" sources out-of-band.

decathorpe avatar Sep 16 '24 13:09 decathorpe

Hmm, one design I've seen in blosc2-sys is to use features e.g.:

    #[cfg(feature = "regenerate-bindings")]

Maybe something similar is ok here?

LecrisUT avatar Sep 17 '24 08:09 LecrisUT

If the approach with building the .bin files in build.rs is not an acceptable solution for you, can we at least get the license corrected to include AND Apache-2.0 and include the Apache-2.0 license texts?

I think that part should be uncontroversial, given that these crates obviously do ship files that are covered by the Apache-2.0 license.

decathorpe avatar Sep 19 '24 10:09 decathorpe

If the approach with building the .bin files in build.rs is not an acceptable solution for you, can we at least get the license corrected to include AND Apache-2.0 and include the Apache-2.0 license texts?

I think that part should be uncontroversial, given that these crates obviously do ship files that are covered by the Apache-2.0 license.

Sure, happy to take a PR for that. We could probably also figure out a way to make the pre-generated code optional? Not sure how it would work with optional binary artifacts.

djc avatar Sep 19 '24 11:09 djc

If the approach with building the .bin files in build.rs is not an acceptable solution for you, can we at least get the license corrected to include AND Apache-2.0 and include the Apache-2.0 license texts? I think that part should be uncontroversial, given that these crates obviously do ship files that are covered by the Apache-2.0 license.

Sure, happy to take a PR for that.

That's why I had it as separate commits ;). Cherry-picked it out in https://github.com/hyperium/tonic/pull/1946

We could probably also figure out a way to make the pre-generated code optional? Not sure how it would work with optional binary artifacts.

What do you think about: https://github.com/milesgranger/blosc2-rs/blob/97a31e5a3b6161bae0468558913f346399999de8/blosc2-sys/build.rs#L132-L133

If the logic is inversed, as #[cfg(not(feature = "pre-built-binaries"))] than that would be even greater for us. In either case the artifacts would be written in the same location.

LecrisUT avatar Sep 19 '24 11:09 LecrisUT

What do you think about: https://github.com/milesgranger/blosc2-rs/blob/97a31e5a3b6161bae0468558913f346399999de8/blosc2-sys/build.rs#L132-L133

This doesn't help because downstream still needs to block builds on build.rs compilation.

djc avatar Sep 19 '24 12:09 djc

Closing this for now, if you want to pick the work back up you can re-open.

LucioFranco avatar Jun 20 '25 15:06 LucioFranco