
Remove `protoc` requirement

Open emilk opened this issue 1 year ago • 7 comments

Trying to add lance as a dependency currently results in a build error:

error: failed to run custom build command for `lance-encoding v0.18.2`
Caused by:
  process didn't exit successfully: `lance-encoding-b1cb34df4c6a6aa9/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=protos

  --- stderr
  Error: Custom { kind: NotFound, error: "Could not find `protoc`. If `protoc` is installed, try setting the `PROTOC` environment variable to the path of the `protoc` binary. To install it on Debian, run `apt-get install protobuf-compiler`. It is also available at https://github.com/protocolbuffers/protobuf/releases  For more information: https://docs.rs/prost-build/#sourcing-protoc" }

You should instead check in the generated code, and include it in the crate. The build.rs should do nothing (or ideally, not exist).

emilk avatar Oct 31 '24 16:10 emilk

error: failed to run custom build command for lance-encoding v0.20.0 (https://github.com/lancedb/lance.git?tag=v0.20.0-beta.2#920b1918)
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_DEV_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.
Caused by:
  process didn't exit successfully: /Users/xwg/dev/rust/lancedb/target/debug/build/lance-encoding-b8d4df13f280b9ab/build-script-build (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=protos

  --- stderr
  Error: Custom { kind: Other, error: "protoc failed: google/protobuf/empty.proto: File not found.\nencodings.proto:8:1: Import "google/protobuf/empty.proto" was not found or had errors.\nencodings.proto:303:5: "google.protobuf.Empty" is not defined.\n" }

loloxwg avatar Nov 22 '24 09:11 loloxwg

@emilk Is there a temporary workaround for this problem?

loloxwg avatar Nov 25 '24 01:11 loloxwg

Note an additional wrinkle: DataFusion requires protoc as well, not just for Substrait but also for Arrow Flight and for DataFusion's own internal plan serialization format. It does not appear to be optional: DataFusion lists protobuf-compiler as a required dependency.

So even if we can check in the generated code, there will still be a need for protoc to build DataFusion.

westonpace avatar Dec 04 '24 22:12 westonpace

Sounds like we should open an issue on https://github.com/apache/datafusion too then

emilk avatar Dec 05 '24 11:12 emilk

> Sounds like we should open an issue on https://github.com/apache/datafusion too then

Yes, another alternative is modifying datafusion to mask flight and its internal representation behind feature flags. We don't use or need those features (and I don't expect we will anytime soon) and this might be a more palatable change.

(which, to be clear, would also require opening an issue)

westonpace avatar Dec 05 '24 17:12 westonpace

Is there any progress on removing protoc?

immno avatar Jan 07 '25 01:01 immno

https://github.com/lancedb/lance/pull/3363 removes the need for an external protoc but does not remove the need for protoc altogether. It adds a new protoc feature. If you enable this feature then you should be able to build in Linux and OSX environments even if there is no externally available protoc.

I will leave this issue open in case anyone wants to work on a complete removal of protoc from the build process. However, given that our dependencies also require protoc, the work should probably start with datafusion and substrait and then work its way up here.

westonpace avatar Jan 10 '25 14:01 westonpace

This is still a BIG problem:

  • https://github.com/rerun-io/rerun/issues/11581

We cannot depend on the lance crate, because it depends on lance-encoding with the default features enabled, and the default features of lance-encoding require an up-to-date protoc binary on the system path.

For us at Rerun, it means `cargo install rerun-cli` is broken for a lot of our users. As a consequence, I'm now working on removing lance support and making it opt-in.

Solutions

Best solution: do NOT depend on protoc, and instead check in the generated code. No need for build.rs. This is what we do with our protobuf definitions at Rerun, and it works great.

Otherwise: make proto-compilation an OPTIONAL feature at the lance level (i.e. allow me to compile the lance crate without protoc).

And finally: for when protoc is absolutely required, use https://crates.io/crates/protoc-prebuilt to ensure it is installed before calling it.
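
To make the second option concrete, here is a minimal sketch of the decision a build.rs could make before attempting any protobuf compilation: look for protoc, and if none is found, skip code generation and rely on checked-in generated code. The helper name `find_protoc` and its parameters are hypothetical; this is not how lance or prost-build is implemented, only an illustration using the standard library.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: returns the `protoc` binary to use, or `None` if the
/// build should fall back to checked-in generated code.
///
/// `protoc_override` stands in for the `PROTOC` environment variable and
/// `search_path` for `PATH`; both are passed in so the logic is easy to test.
fn find_protoc(protoc_override: Option<&str>, search_path: &str) -> Option<PathBuf> {
    // An explicit override wins, but only if it points at an existing file.
    if let Some(explicit) = protoc_override {
        let candidate = PathBuf::from(explicit);
        return candidate.is_file().then_some(candidate);
    }

    // Otherwise look for a `protoc` executable in each entry of the search path.
    let exe_name = if cfg!(windows) { "protoc.exe" } else { "protoc" };
    env::split_paths(search_path)
        .map(|dir| dir.join(exe_name))
        .find(|candidate| candidate.is_file())
}
```

In a build script, `main` could call `find_protoc(env::var("PROTOC").ok().as_deref(), &env::var("PATH").unwrap_or_default())` and only run code generation when it returns `Some`, so that a missing protoc no longer fails the build.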

emilk avatar Oct 20 '25 09:10 emilk

I have started https://github.com/substrait-io/substrait-rs/issues/411

Xuanwo avatar Oct 20 '25 11:10 Xuanwo

I just want to note that it looks like datafusion and arrow dropped the requirement on build-time protobuf compilation. See https://github.com/apache/arrow-rs/pull/3927, https://github.com/apache/datafusion/pull/3950, so https://github.com/substrait-io/substrait-rs/issues/411 is the last missing piece mentioned in this thread.

valkum avatar Oct 23 '25 15:10 valkum