Bazel: provide pre-built binary toolchain for protoc

Open alexeagle opened this issue 1 year ago • 20 comments

From https://protobuf.dev/news/2024-10-01/#end-goal:

Once the rules are in the protobuf repo, we intend to address common user requests, such as using prebuilts for the proto compiler where possible.

This is that request.

What language does this apply to? All

Describe the problem you are trying to solve.

All Bazel users are expected to build protoc from source as a cc_binary. This leads to problems which are often reported on the Bazel Slack:

  1. Bazel doesn't include a hermetic C++ toolchain, so the compilation fails for a subset of developers due to the host toolchain on their computer. This can be easily reproduced by registering a non-functional toolchain. Many users have no C++ code, so the only benefit they get from fixing this hermeticity failure is a working protoc.
  2. protoc frequently gets recompiled rather than being a cache hit - example report, issue. This makes Bazel builds slow.

Describe the solution you'd like

Bazel's toolchain feature allows it to download the pre-built binaries from protobuf releases.

https://github.com/bazelbuild/rules_proto/pull/205 was part of my earlier work to provide this capability. The ruleset mirrors its own integrity hashes as part of each release.

https://github.com/aspect-build/toolchains_protoc/ is a user-land implementation of this proposal; however, it was broken by changes in Bazel 8 and rules_proto described by https://protobuf.dev/news/2024-10-01/.
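
To make the toolchain idea concrete, here is a minimal Python sketch of the selection step: given a target platform, choose which release archive to fetch. The asset naming pattern and URL layout are assumptions modeled on how protobuf's GitHub releases have been named, and `protoc_asset_url`, the platform keys, and the version string are hypothetical. A real Bazel toolchain would express this in Starlark with platform constraints rather than in Python.

```python
# Hypothetical sketch: pick a prebuilt protoc archive for a platform.
# The asset suffixes and URL layout below are ASSUMPTIONS modeled on
# protobuf's GitHub release pages; check them against the actual release.

ASSET_SUFFIX = {
    ("linux", "x86_64"): "linux-x86_64",
    ("linux", "aarch64"): "linux-aarch_64",
    ("macos", "x86_64"): "osx-x86_64",
    ("macos", "aarch64"): "osx-aarch_64",
    ("windows", "x86_64"): "win64",
}


def protoc_asset_url(version: str, os_name: str, cpu: str) -> str:
    """Return the assumed download URL for a prebuilt protoc archive."""
    try:
        suffix = ASSET_SUFFIX[(os_name, cpu)]
    except KeyError:
        # No prebuilt binary known for this platform: the caller would
        # fall back to building protoc from source.
        raise ValueError(f"no prebuilt protoc for {os_name}/{cpu}") from None
    return (
        "https://github.com/protocolbuffers/protobuf/releases/download/"
        f"v{version}/protoc-{version}-{suffix}.zip"
    )
```

The fallback on an unknown platform mirrors the point raised later in the thread: prebuilt binaries can be the fast path while building from source stays available.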

Additional context

Some user reports:

  • https://fzakaria.com/2024/11/28/bazel-knowledge-protobuf-is-the-worst-when-it-should-be-the-best.html

I describe some GitHub Actions workflows for automating this pattern on https://blog.aspect.build/releasing-bazel-rulesets-rust (and the earlier https://blog.aspect.build/releasing-bazel-rulesets)

alexeagle avatar Dec 06 '24 23:12 alexeagle

Yes, we have changes pending to provide prebuilt protocs in the open source realm. I don't have an exact timeline, though; we're currently expecting H1 2025.

shaod2 avatar Dec 09 '24 17:12 shaod2

Having to constantly rebuild protoc from scratch everywhere causes all sorts of weird surprises; e.g. https://github.com/grpc/grpc-java/issues/11790!

It also makes builds more "heavy" than they could otherwise be, which is a problem in certain more "light" (lite?) build environments; note e.g. https://github.com/jitpack/jitpack.io/issues/3129.

It would be really cool if using Protocol Buffers with Bazel would no longer require building protoc.

vorburger avatar Dec 29 '24 21:12 vorburger

In setups with RBE available, this is not a problem since it is mostly just downloaded from the remote cache anyways. In fact, in such setups it is usually preferred to build from source instead of downloading random binaries from the internet for compliance reasons. So building from source should definitely be kept as an option and downloading pre built binaries should be optional.

mering avatar Dec 30 '24 09:12 mering

In setups with RBE available, this is not a problem since it is mostly just downloaded from the remote cache anyways.

Sure, but not every user of Protobuf with Bazel has RBE set up for every project.

In fact, in such setups it is usually preferred to build from source instead of downloading random binaries from the internet for compliance reasons.

This can of course easily be solved with some sort of checksum / hash that's verified on the download; à la http_archive's sha256 or the HTML's Subresource Integrity (SRI) or that ?hl= "standard" idea for Cryptographic Hyperlinks from draft-sporny-hashlink-07 (what a shame that never gained more widespread traction) et al.

So building from source should definitely be kept as an option and downloading pre built binaries should be optional.

An "opt in" flag (?) would already be huge progress over the current situation!

vorburger avatar Dec 30 '24 16:12 vorburger
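
The checksum approach mentioned in the comment above can be sketched in a few lines of Python. This is an illustrative sketch, not code from any of the linked projects: `sha256_hex` produces the hex form that http_archive's sha256 attribute takes, and `sri_sha256` produces an SRI-style string (`sha256-` followed by the base64-encoded digest). All function names are hypothetical.

```python
import base64
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256, the form used by a pinned sha256 value."""
    return hashlib.sha256(data).hexdigest()


def sri_sha256(data: bytes) -> str:
    """SRI-style integrity string: 'sha256-' + base64 of the raw digest."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def verify_download(data: bytes, expected_sha256: str) -> None:
    """Raise ValueError if the downloaded bytes do not match the pin."""
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )
```

A check like this only proves that the bytes match what the pin's author saw. It says nothing about what the binary contains, which is the objection raised in the next comment.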

This can of course easily be solved with some sort of checksum / hash that's verified on the download; à la http_archive's sha256 or the HTML's Subresource Integrity (SRI) or that ?hl= "standard" idea for Cryptographic Hyperlinks from draft-sporny-hashlink-07 (what a shame that never gained more widespread traction) et al.

This is only a small part of the story. How do you verify that the binary doesn't contain malicious code? You need reproducible builds first. Then you need to build from source and verify that the binary you download is actually the artifact built from the source you are expecting. You need to do this for every version as inspecting binary diffs is not handy. When you build from source, you only need to check the source diff which is much easier to review. Dependency attacks are a thing. This is how companies get hacked.

mering avatar Dec 31 '24 08:12 mering
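
The verification step described in the comment above (rebuild from source, then confirm the download is byte-identical) might look like the following Python sketch. It assumes the build is bit-for-bit reproducible, which the comment names as a prerequisite. The paths and function names are hypothetical.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large binaries need not fit in RAM."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_reproduced(built_path: str, downloaded_path: str) -> bool:
    """True only if the locally built and downloaded artifacts are identical."""
    return file_sha256(built_path) == file_sha256(downloaded_path)
```

Without reproducible builds the comparison fails even for an honest binary (embedded timestamps or paths are enough to change the digest), so a mismatch alone does not show tampering.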

@mering yes, supply chain security is an important consideration here, both under Bazel and any other build system.

I think you're pointing out https://github.com/protocolbuffers/protobuf/issues/16165 again - since protoc downloads aren't provided along with a checksum, users have to compute one themselves. Any Bazel rules that fetch should include a checksum - this is true whether they fetch sources and then compile them, or a binary that's compiled by someone else. https://github.com/aspect-build/toolchains_protoc/blob/main/protoc/private/versions.bzl#L54 for example.

Then yes, it would be nice to have a proof of provenance, via some attestation published on protoc releases. We are adding these right now to modules on the BCR. It will be trivial for tools like tar where we use GitHub Actions and they have a feature for this.

setups with RBE available, this is not a problem

This isn't true for most Bazel users, since they ship Macs to their developers and the remote cache contains Linux binaries only.

alexeagle avatar Jan 04 '25 15:01 alexeagle

@mering yes, supply chain security is an important consideration here, both under Bazel and any other build system.

I think you're pointing out #16165 again - since protoc downloads aren't provided along with a checksum, users have to compute one themselves. Any Bazel rules that fetch should include a checksum - this is true whether they fetch sources and then compile them, or a binary that's compiled by someone else. https://github.com/aspect-build/toolchains_protoc/blob/main/protoc/private/versions.bzl#L54 for example.

Then yes, it would be nice to have a proof of provenance, via some attestation published on protoc releases. We are adding these right now to modules on the BCR. It will be trivial for tools like tar where we use GitHub Actions and they have a feature for this.

This requires trusting whoever is specifying the checksums. If you don't want to (or are not allowed to) blindly trust someone to provide correct checksums and would rather review the code yourself, this is much easier when you build from source than when you compare binaries (which usually also requires building the code in the first place).

setups with RBE available, this is not a problem

This isn't true for most Bazel users, since they ship Macs to their developers and the remote cache contains Linux binaries only.

Why are they not just using Linux for development in the first place if this is what they test and ship with their CI/CD? It's usually a bad idea to test something different from what you ship...

mering avatar Jan 07 '25 08:01 mering

Why are they not just using Linux for development in the first place if this is what they test and ship with their CI/CD? It's usually a bad idea to test something different from what you ship...

Most of the time, before Bazel is even introduced to the environment, you have existing machines, and at least for the last 10 years of my career those have been Macs. I personally would like a Linux machine that matches my deploy environment; however, that is just not the reality for most workplaces.

michaelschuett-tomtom avatar Jan 07 '25 19:01 michaelschuett-tomtom

Why are they not just using Linux for development in the first place if this is what they test and ship with their CI/CD? It's usually a bad idea to test something different from what you ship...

Most of the time, before Bazel is even introduced to the environment, you have existing machines, and at least for the last 10 years of my career those have been Macs. I personally would like a Linux machine that matches my deploy environment; however, that is just not the reality for most workplaces.

Interesting. At the various companies I have worked for across different industries, I have seen more Linux machines than Macs in the past 15 years (mostly without using Bazel).

mering avatar Jan 07 '25 19:01 mering

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Apr 08 '25 10:04 github-actions[bot]

Still active.

jschaf avatar Apr 08 '25 18:04 jschaf

This has been prioritized. We will be working with @alexeagle and Aspect.build to complete this by EOY 2025.

jguamie avatar Jun 11 '25 17:06 jguamie

@jguamie @alexeagle will there continue to be an option (flag/transition) to build protoc from source? We need to keep using the current setup.

mering avatar Jun 11 '25 17:06 mering

@mering yes, this won't impact existing setups.

jguamie avatar Jun 11 '25 23:06 jguamie

This has been prioritized. We will be working with @alexeagle and Aspect.build to complete this by EOY 2025.

Thanks for prioritizing this! Let's make sure we add support for multiple protoc versions, like what rules_python did. Here is the use case I discussed with @alexeagle

gfrankliu avatar Jul 04 '25 19:07 gfrankliu

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Oct 03 '25 10:10 github-actions[bot]

.

aaliddell avatar Oct 03 '25 10:10 aaliddell

Any progress on this? It would be nicer if Bazel could just execute the installed protoc version. Since Bazel is not hermetic, as it depends on the installed C++ toolchain, why not depend on the installed protoc directly? Or is it possible to create a genrule that can use the installed protoc version? Those who want hermetic builds should use an external build service or a dedicated host for their builds anyway.

jayaprabhakar avatar Oct 21 '25 20:10 jayaprabhakar

Yes, there is progress: the most recent protobuf release includes "Feat: update bazel central registry publish workflow" (https://github.com/protocolbuffers/protobuf/commit/d5217fd016809b3436c30a780216ebd34f4bed2d), which adds a prebuilt_tool_integrity.bzl file inside the distribution. Next I need to change the toolchain registrations to use it.

alexeagle avatar Oct 21 '25 23:10 alexeagle