bazel icon indicating copy to clipboard operation
bazel copied to clipboard

include grpc-xds library

Open tmhdgsn opened this issue 1 year ago • 75 comments

to enable support of the xds protocol over grpc for remote builds/caching

tmhdgsn avatar Oct 09 '24 10:10 tmhdgsn

@tmhdgsn, Could you please take a look at the failing checks?

satyanandak avatar Oct 11 '24 06:10 satyanandak

Hey @satyanandak took a look, lots of unknown host exceptions for github.com and storage.googleapis.com

Can it be re-run?

tmhdgsn avatar Oct 11 '24 12:10 tmhdgsn

@satyanandak @sgowroji sorry I can see the checks got re-rerun but the test logs still show host exceptions for github and storage.googleapis.com

Can someone take a look ? Thanks

tmhdgsn avatar Oct 18 '24 14:10 tmhdgsn

Why do you need this in Bazel itself?

Wyverald avatar Oct 22 '24 20:10 Wyverald

We would like this so that we can use xds to manage our traffic and auth policies for all of bazel’s communication. Without this we have to mange a bunch of proxying infrastructure across all developer hosts. This seemed a reasonable ask as it is grpc-native integration, supported by all the main language implementations.

JonathanPerry651 avatar Oct 23 '24 08:10 JonathanPerry651

@Wyverald - does this seem like a reasonable ask? Thanks!

JonathanPerry651 avatar Oct 28 '24 10:10 JonathanPerry651

Could you help me understand how this PR by itself allows you to, well, do anything differently? Are there future PRs planned? Or are you running a patched version of Bazel yourself? Or, just by including the xds library when building Bazel itself, are you now able to specify some xds-specific flags/protocols when using Bazel? (Sorry for the confusion -- I know very little about grpc/xds)

Wyverald avatar Oct 31 '24 21:10 Wyverald

just by including the xds library when building Bazel means we can give it an "xds:///" address for remote builds/caching and it understands how to resolve it

tmhdgsn avatar Oct 31 '24 22:10 tmhdgsn

I see, that answers the question. Please rebase the PR and address the check failures, then we can import.

Wyverald avatar Oct 31 '24 22:10 Wyverald

sure will rebase - can we get some help with the check failures? It looks like its failing with Unknown host exceptions for github.com and storage.googleapis.com

tmhdgsn avatar Oct 31 '24 22:10 tmhdgsn

ah. it's because we need to build the distribution archive that's used to bootstrap a Bazel build, and the bootstrapping process must not access the internet.

Looks like you need to add a line "grpc-java++grpc_java_repositories_extension+com_github_cncf_xds", to https://github.com/bazelbuild/bazel/blob/20f9a5e8258dce0d30882a3a2fdb4219596fb1ae/repositories.bzl#L64

Wyverald avatar Oct 31 '24 22:10 Wyverald

hmm. the failure on arm64 macOS looks legit. This adds ~100MB to a ~380MB archive, which is a rather big increase (~26%).

This test doesn't fail on other platforms because apparently arm64 macOS is the "bottleneck"; the install base is only ~200MB on Linux, and this PR adds ~50MB (similar ratio of increase). I wonder if the arm64 macOS binary is big because of Rosetta?

Digging through history, the install base size has been as high as 460MB before, so 480MB isn't unthinkable. But cc-ing @meteorcloudy to double-check. (Also, if anyone has ideas how to reduce this size increase, please share.)

Wyverald avatar Nov 05 '24 03:11 Wyverald

This PR will increase the Bazel binary size from 56M to 67M on macOS arm64 platform, which is quite a huge bump.

~~But I do notice we probably have duplicated protobuf dependencies in the Bazel binary already after https://github.com/bazelbuild/bazel/pull/23767, let me check how much we could save after deduplicating.~~

This is not true, rules_jvm_external somehow redirects @maven//:com_google_protobuf_protobuf_java to @@protobuf+//:protobuf_java.

meteorcloudy avatar Nov 05 '24 10:11 meteorcloudy

This is the diff of files included in //src/main/java/com/google/devtools/build/lib/bazel:BazelServer_deploy.jar https://gist.github.com/meteorcloudy/53bccf216c66e49427615a3e2b824644

/cc @meisterT @lberki How do you feel about bumping the Bazel binary size to support the xds protocol?

meteorcloudy avatar Nov 05 '24 10:11 meteorcloudy

/cc @comius @tjgq since this is remote caching/execution related.

meteorcloudy avatar Nov 05 '24 11:11 meteorcloudy

Many of the new files look unused. Could we add an integration test to verify that this new feature actually works and then rely on proguard to strip out unneeded deps similar to what we do for fastutil?

If that doesn't work, could we provide a way for users to supply their own jar files with grpc plugins for additional connection schemes?

fmeum avatar Nov 05 '24 11:11 fmeum

And if this feature only relies on the correct jars being available during runtime, can users just inject them using --host_jvm_args at runtime, similar to https://bazel.build/docs/user-manual#memory-tracking

meteorcloudy avatar Nov 05 '24 12:11 meteorcloudy

Do I understand correctly that this increase the size of the Bazel binary only by a single-digit number of megabytes, but the install base by ~50MB?

If so, where does the increase come from? I assume it's mostly Java classes, but then what does all that code do?

lberki avatar Nov 05 '24 12:11 lberki

Do I understand correctly that this increase the size of the Bazel binary only by a single-digit number of megabytes, but the install base by ~50MB?

Bazel binary increases is 11MB not exactly single digit, the install base is much bigger because it's decompressed from the Bazel binary.

meteorcloudy avatar Nov 05 '24 13:11 meteorcloudy

the install base is much bigger because it's decompressed from the Bazel binary.

I get that. What I'm asking is what does this 50MB of code do. It seems to be a lot for what it apparently does.

lberki avatar Nov 05 '24 13:11 lberki

Yeah, maybe we can use proguard to largely trim it down.

meteorcloudy avatar Nov 05 '24 15:11 meteorcloudy

Much of the code is generated protobufs defined by Envoy. For the binary uploaded to Maven Central we cherry-pick individual files and it is 10 MB. Testing right now with a deploy.jar and the deps I see here, I see an 18 MB increase by adding grpc-xds. I noticed 4 MB of that was old/unused: https://github.com/grpc/grpc-java/pull/11667 .

Although there's a few cases of extra files brought in with Bazel, I think most of the remaining 4 MB increase compared to Maven Central is not real, simply because it comes from transitive deps. Of those transitive deps, the only other potentially-unnecessary-given-the-current-architecture dependency I'm seeing powers google_default credentials; that's bringing in 3 MB of deps. We'd have to use reflection to avoid that.

ejona86 avatar Nov 05 '24 15:11 ejona86

This is not true, rules_jvm_external somehow redirects @maven//:com_google_protobuf_protobuf_java to @@protobuf+//:protobuf_java.

That would be because of grpc-java adds maven.overrides. Any mixing of java_proto_library with maven_install needs them.

ejona86 avatar Nov 05 '24 16:11 ejona86

Thank you all for looking at this ; re google_default_credentials, we would very much like that to also ‘just work’, so if we could leave that 3mb in that would be great. (Unless I’m misunderstanding the implication of ‘potentially-unnecessary-given-the-current-architecture’ of course!)

JonathanPerry651 avatar Nov 05 '24 17:11 JonathanPerry651

Yeah, to use google_default creds you'd need those deps. The potentially-unnecessary-given-the-current-architecture was just that we already have the plumbing to make that more optional. Some other things could theoretically be more optional, but would take a more concerted effort.

ejona86 avatar Nov 05 '24 18:11 ejona86

So what are our next steps here exactly please? Are we comfortable with the size bump? @ejona86 there was an ask for a test, which seems reasonable. What do you think would be a reasonable sanity check?

Thank you all again!

JonathanPerry651 avatar Nov 07 '24 06:11 JonathanPerry651

A 20% size bump is huge, so we should consider other options first, one of which might be what Eric suggests.

Alternatively, did you try out what Yun suggested above?

meisterT avatar Nov 07 '24 07:11 meisterT

I'm afraid I'm not very up-to-date with gRPC and therefore I don't know what the official one-line description "xDS is the protocol initially used by Envoy, that is evolving into a universal data plan API for service mesh." means.

Given the large size increase, the benefits must be commensurate. So I'd either need to be convinced that this is something very, very good, the size penalty would need to be reduced or we'd need to find a way to make this code optional.

Bazel@HEAD currently clocks in at 98MB compressed, 205MB uncompressed according to my rudimentary measurements (bazel build //src:bazel; cd $(bazel-bin/src/bazel info install_base); du -sh .).

@ejona86 how would these numbers change in the best version and what are those bytes spent on? If I understand correctly that it's +10MB/+50MB, I'd much rather not merge this unless the benefits are truly fantastic.

lberki avatar Nov 07 '24 07:11 lberki

Bazel@HEAD currently clocks in at 98MB compressed

I believe this is without -c opt, it's much smaller with this option. (56M for me on macOS arm64)

meteorcloudy avatar Nov 07 '24 12:11 meteorcloudy

Indeed, I didn't realize -c opt also compresses the binary more. 62MB for me on Linux instead of 98MB (the 205MB number is unchanged, though)

lberki avatar Nov 07 '24 13:11 lberki