Remove `protoc` requirement
Trying to add lance as a dependency currently results in a build error:
error: failed to run custom build command for `lance-encoding v0.18.2`
Caused by:
process didn't exit successfully: `lance-encoding-b1cb34df4c6a6aa9/build-script-build` (exit status: 1)
--- stdout
cargo:rerun-if-changed=protos
--- stderr
Error: Custom { kind: NotFound, error: "Could not find `protoc`. If `protoc` is installed, try setting the `PROTOC` environment variable to the path of the `protoc` binary. To install it on Debian, run `apt-get install protobuf-compiler`. It is also available at https://github.com/protocolbuffers/protobuf/releases For more information: https://docs.rs/prost-build/#sourcing-protoc" }
Instead, you should check in the generated code and include it in the crate. The `build.rs` should then do nothing (or, ideally, not exist).
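Here is a minimal sketch of what "do nothing by default" could look like. It assumes a hypothetical opt-in Cargo feature named `regenerate-protos`, which is not an existing lance feature. The build script returns immediately unless that feature is enabled, so downstream users compile only the checked-in code. Cargo exposes each enabled feature to the build script as a `CARGO_FEATURE_<NAME>` environment variable. The env lookup is passed in as a closure so the decision can be tested.

```rust
use std::ffi::OsString;

// Hypothetical feature name. With a Cargo feature called `regenerate-protos`,
// Cargo sets CARGO_FEATURE_REGENERATE_PROTOS in the build script's environment.
const REGEN_FEATURE_VAR: &str = "CARGO_FEATURE_REGENERATE_PROTOS";

/// Returns true only when the crate is being built with the opt-in feature.
/// `lookup` stands in for `std::env::var_os` so the decision can be unit-tested.
fn should_regenerate<F>(lookup: F) -> bool
where
    F: Fn(&str) -> Option<OsString>,
{
    lookup(REGEN_FEATURE_VAR).is_some()
}

/// What the body of a build.rs `main` could look like under this scheme.
#[allow(dead_code)]
fn run_build_script() {
    if !should_regenerate(|key| std::env::var_os(key)) {
        // Default for downstream users: the generated Rust code is checked in,
        // so there is nothing to do and protoc is never invoked.
        return;
    }
    // Only maintainers who opt in get here. A real script would now run its
    // protobuf code generator (for example prost-build) so the checked-in files
    // can be refreshed; that call is omitted to keep this sketch std-only.
    println!("cargo:rerun-if-changed=protos");
}
```

With this shape, a plain `cargo build` of a dependent crate never touches protoc; only someone who explicitly enables the feature pays that cost.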
error: failed to run custom build command for lance-encoding v0.20.0 (https://github.com/lancedb/lance.git?tag=v0.20.0-beta.2#920b1918)
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_DEV_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.
Caused by:
process didn't exit successfully: /Users/xwg/dev/rust/lancedb/target/debug/build/lance-encoding-b8d4df13f280b9ab/build-script-build (exit status: 1)
--- stdout
cargo:rerun-if-changed=protos
--- stderr
Error: Custom { kind: Other, error: "protoc failed: google/protobuf/empty.proto: File not found.\nencodings.proto:8:1: Import "google/protobuf/empty.proto" was not found or had errors.\nencodings.proto:303:5: "google.protobuf.Empty" is not defined.\n" }
@emilk How can we temporarily work around this problem?
Note an additional wrinkle: Datafusion requires protoc as well, not just for Substrait but also for Arrow Flight and for Datafusion's own internal plan serialization format. It does not appear to be an optional dependency, and Datafusion lists protobuf-compiler as a required dependency.
So even if we check in the generated code, protoc will still be needed to build datafusion.
Sounds like we should open an issue on https://github.com/apache/datafusion too then
Yes, another alternative is modifying datafusion to put Flight and its internal plan representation behind feature flags. We don't use or need those features (and I don't expect we will anytime soon), so this might be a more palatable change.
(which, to be clear, would also require opening an issue)
Is there any progress on removing protoc?
https://github.com/lancedb/lance/pull/3363 removes the need for an external protoc but does not remove the need for protoc altogether. It adds a new `protoc` feature. If you enable this feature, you should be able to build on Linux and OSX even if no external protoc is available.
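As a rough, std-only illustration of how a build script might choose where protoc comes from once such a feature exists: the `ProtocSource` type and `choose_protoc_source` function are hypothetical, and the precedence shown (explicit `PROTOC` first, then a vendored copy when the `protoc` feature is on, then the system PATH) is an assumption for illustration. It is not a description of what #3363 actually implements.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Where a build script might obtain a `protoc` binary (hypothetical type).
#[derive(Debug, PartialEq)]
enum ProtocSource {
    /// Explicit path from the `PROTOC` environment variable.
    EnvVar(PathBuf),
    /// A copy built or bundled by the crate itself because the `protoc` feature is on.
    Vendored,
    /// Whatever `protoc` happens to be on the system PATH.
    SystemPath,
}

/// Picks a protoc source. The precedence order is an assumption for illustration.
/// `lookup` stands in for `std::env::var_os` to keep the logic testable.
fn choose_protoc_source<F>(lookup: F) -> ProtocSource
where
    F: Fn(&str) -> Option<OsString>,
{
    // 1. An explicit override always wins.
    if let Some(path) = lookup("PROTOC") {
        return ProtocSource::EnvVar(PathBuf::from(path));
    }
    // 2. Cargo sets CARGO_FEATURE_PROTOC when the crate's `protoc` feature is enabled.
    if lookup("CARGO_FEATURE_PROTOC").is_some() {
        return ProtocSource::Vendored;
    }
    // 3. Otherwise fall back to the system PATH, the case that produces
    //    the "Could not find `protoc`" error when nothing is installed.
    ProtocSource::SystemPath
}
```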
I will leave this issue open in case anyone wants to work on a complete removal of protoc from the build process. However, given that our dependencies also require protoc, the work should probably start with datafusion and substrait and then work its way up here.
This is still a BIG problem:
- https://github.com/rerun-io/rerun/issues/11581
We cannot depend on the lance crate because it depends on lance-encoding with the default features enabled, and the default features of lance-encoding require an up-to-date protoc binary on the system path.
For us at Rerun, this means `cargo install rerun-cli` is broken for many of our users. As a consequence, I'm now working on removing lance support from the default build and making it opt-in.
Solutions
Best solution: do NOT depend on protoc; instead, check in the generated code. No `build.rs` needed. This is what we do with our protobuf definitions at Rerun, and it works great.
Otherwise: make proto-compilation an OPTIONAL feature at the lance level (i.e. allow me to compile the lance crate without protoc).
And finally: when protoc is absolutely required, use https://crates.io/crates/protoc-prebuilt to ensure it is installed before calling it.
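The "make sure it is there before calling it" idea can also be sketched without extra crates. This is not the protoc-prebuilt API. `find_on_path` and `ensure_protoc` are hypothetical helpers that look for `protoc` via `PROTOC` or the PATH and fail with a readable message before code generation starts. The file-existence check is injected so the logic can be tested.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical helper: search each directory of a PATH-style value for `protoc`.
/// `exists` is injected so the search can be tested without touching the disk.
fn find_on_path<F>(path_var: &OsStr, exists: F) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    let exe = if cfg!(windows) { "protoc.exe" } else { "protoc" };
    std::env::split_paths(path_var)
        .map(|dir| dir.join(exe))
        .find(|candidate| exists(candidate.as_path()))
}

/// Hypothetical pre-flight check: resolve protoc from `PROTOC` or the PATH and
/// return a readable error instead of failing later inside code generation.
fn ensure_protoc<F>(protoc_var: Option<&OsStr>, path_var: &OsStr, exists: F) -> Result<PathBuf, String>
where
    F: Fn(&Path) -> bool,
{
    if let Some(explicit) = protoc_var {
        let explicit = PathBuf::from(explicit);
        return if exists(explicit.as_path()) {
            Ok(explicit)
        } else {
            Err(format!("PROTOC is set to {}, but no file exists there", explicit.display()))
        };
    }
    find_on_path(path_var, &exists).ok_or_else(|| {
        "could not find `protoc` on PATH; install protobuf-compiler or set PROTOC".to_string()
    })
}
```

In a real build script this could be called as `ensure_protoc(std::env::var_os("PROTOC").as_deref(), &std::env::var_os("PATH").unwrap_or_default(), |p| p.is_file())`. It only improves the error message, though; it does not remove the protoc requirement itself.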
I have opened https://github.com/substrait-io/substrait-rs/issues/411
I just want to note that it looks like datafusion and arrow have dropped the requirement for build-time protobuf compilation (see https://github.com/apache/arrow-rs/pull/3927 and https://github.com/apache/datafusion/pull/3950). That makes https://github.com/substrait-io/substrait-rs/issues/411 the last missing piece mentioned in this thread.