grpc-java icon indicating copy to clipboard operation
grpc-java copied to clipboard

Tracking Issue for Load Balancing being Experimental

Open carl-mastrangelo opened this issue 8 years ago • 33 comments

Specific usages:

  • LoadBalancer
  • ManagedChannelBuilder.loadBalancerFactory
  • PickFirstLoadBalancerFactory

carl-mastrangelo avatar May 03 '16 19:05 carl-mastrangelo

Let's say I have Consul name resolving policy. For example, name "consul:rpc" is resoled ot [1.1.1.1, 2.2.2.2] but it's dynamic, new ips can arrive.

And each time new service appears, i.e 3.3.3.3 my Discovery service wants to add new connection to this host to grpc connection pool.

I guess it should be done with NameResolver, but it resolves name only once. Then I guess the LoadBalancer can do that, but it receives only the IPs provided by NameResolver. I need the way I can pass "consul:rpc" to LoadBalancer. Any simple way to do that?

Thanks!

odiszapc avatar Dec 06 '16 12:12 odiszapc

Have passed "hostname" (especially consul enpoint name) through affinity CallOption, but not sure is it a good practice.

odiszapc avatar Dec 06 '16 13:12 odiszapc

#2737 should be considered a blocker to stabilizing the LoadBalancer API

ejona86 avatar Feb 17 '17 17:02 ejona86

#2738 should also be considered a blocker.

ejona86 avatar Feb 17 '17 18:02 ejona86

Both blockers are closed now. Maybe current issue is less of experimental now?

neopaf avatar Apr 04 '18 10:04 neopaf

LoadBalancer is stabilizing nicely. We have done multiple fixes and have avoided breaking existing implementations, so I think it is at least close. I do expect some changes still in how load balancers are selected, so that could impact ManagedChannelBuilder.loadBalancerFactory at least partially. (We're currently using a hack there in AutoConfiguredLoadBalancerFactory.java). But LoadBalancer itself seems close.

#4302 (yes... I just created it) would need to be fixed. This is primarily a cross-language decision, but could have fall-out for proxy selection.

ejona86 avatar Apr 04 '18 16:04 ejona86

I would like to raise the possibility of Helper being named something more appropriate before the API is stable.

carl-mastrangelo avatar Apr 04 '18 21:04 carl-mastrangelo

Is this still gated by #4302?

schmohlio avatar Feb 25 '19 16:02 schmohlio

@schmohlio, yes. I will note that there has been a major change to the API (with respect to thread-safety) since I made my comment, but things are stabilizing nicely. Especially since C is adopting a similar approach. We've also recently cleaned up parts of the NameResolver API; I sort of expect the two APIs will become stable at the same time.

ejona86 avatar Feb 25 '19 17:02 ejona86

So we updated the grpc-core dependency to 1.20.0 and changed the code to this:

        ManagedChannelBuilder.forTarget(endpoint)
            .defaultLoadBalancingPolicy("round-robin")
            .usePlaintext()
            .maxInboundMessageSize(GRPC_MAX_INBOUND_MESSAGE_SIZE)
            .build();

Before it was:

    ManagedChannel channel =
        ManagedChannelBuilder.forTarget(endpoint)
            .loadBalancerFactory(
                LoadBalancerRegistry.getDefaultRegistry().getProvider("round-robin"))
            .usePlaintext()
            .maxInboundMessageSize(GRPC_MAX_INBOUND_MESSAGE_SIZE)
            .build();

However, now I get the following error:

java.lang.IllegalStateException: Could not find policy 'round-robin'. Make sure its implementation is either registered to LoadBalancerRegistry or included in META-INF/services/io.grpc.LoadBalancerProvider from your jar files.

Any guidance on how to proceed?

dietervdw-spotify avatar May 17 '19 07:05 dietervdw-spotify

@dietervdw-spotify, use round_robin (underscore), not round-robin (dash).

ejona86 avatar May 20 '19 17:05 ejona86

@ejona86 Haha that's an easy fix :) . Where is this coming from btw, any documentation on this?

dietervdw-spotify avatar May 20 '19 17:05 dietervdw-spotify

@dietervdw-spotify, right now it's mainly things from the service config, since that's the origin of the global registry. There needs to be some place where we list these. I think the main problem is that gRFCs are poor documentation (you have to read them all to understand the current state of the world). It's also in the code...

ejona86 avatar May 20 '19 18:05 ejona86

Why not provide at least an enumeration or some other type of documentation in code? Need to look at the documentation every time someone is going to use this is kind of a waste of time.

enesunal avatar Nov 05 '19 15:11 enesunal

When building a ManagedChannel to talk to some grpc server, is there a way I can set the details of the load balancing policy, health check config, etc., when the server is not publishing a service config? If I understand right, it seems that it is expected that the NameResolver will provide the config when it resolves names to addresses.

I suppose I could use a custom implementation of a NameResolver which populates attributes of my choosing in the ResolutionResult instances that it returns, but it'd be nice to not have to override built-in classes like DnsNameResolver.

In other words I am wondering how I can provide a service config in code when constructing a client for a service (if the infrastructure to provide a service config in DNS/xDS etc is not there), since it is no longer possible to explicitly configure a LoadBalancer instance since https://github.com/grpc/grpc-java/pull/5480.

mattnworb avatar May 12 '20 22:05 mattnworb

@mattnworb, you can use ManagedChannelBuilder.defaultLoadBalancingPolicy() and defaultServiceConfig(). You can set the default lb policy in service config as well, but that is a bigger pain just for that one setting and would be overridden if the NameResolver returned a service config (unless disableServiceConfigLookUp()), even if the NameResolver's config lacked a LB policy.

The only thing that died in #5480 is an easy API to pass a pre-configured instance of an LB policy.

ejona86 avatar May 12 '20 22:05 ejona86

@ejona86 thanks, defaultServiceConfig() is exactly what I needed! 👍

Any thoughts on providing a typed API to avoid having to configure it with JSON-as-Java-objects and to avoid breaking in case the Map key constants ever change?

ManagedChannelBuilder.forTarget(address)
    ...
    .defaultServiceConfig(Map.of(
      "healthCheckConfig", Map.of("serviceName", ""),
      "loadBalancingConfig", List.of(...)
    )); 

mattnworb avatar May 13 '20 01:05 mattnworb

Parts of the config are defined by things out of grpc-api (like LB config), so we can't actually have a fully typed config. The key constants are part of the JSON "API" for service config, so we have to support old names anyway. And many users of the API we'd expect to have a JSON string they parse and dump into the API. So having a typed API would greatly increase the API surface for little gain (mainly misspellings), so it hasn't lasted long when we've considered it.

ejona86 avatar May 13 '20 19:05 ejona86

The only thing that died in #5480 is an easy API to pass a pre-configured instance of an LB policy.

@ejona86 Okay, so what if a pre-configured (custom) LB is exactly what I want to pass along?

mikemccarty-vertex avatar May 27 '20 04:05 mikemccarty-vertex

@mikemccarty-vertex, then we talk about what problem you are trying to solve. There may be other solutions for your problem.

ejona86 avatar May 27 '20 14:05 ejona86

@ejona86 I recently inherited this code so I'm not super confident about being able to answer those questions in detail at this point. I can say it relates to server affinity based on metadata associated with the RPC message being sent. Thanks for your help...

mikemccarty-vertex avatar May 27 '20 14:05 mikemccarty-vertex

@mikemccarty-vertex, hmmm... That makes it a bit hard to help, then. Most configuration can be passed via service config (e.g., via defaultServiceConfig()), as it is data that can be encoded in JSON. The type of data you are talking about sounds like it could be encoded without too much trouble. My earlier comment was about passing instances, as in specific Java objects, to a LB policy. That is more of a problem in certain dependency-injected environments.

If that configuration is not specific to a specific channel, you can create the LB ahead-of-time and register it for multiple channels to use. This is possible even if the data is per-host, as long as all the configuration is available up-front. If it is not available up-front, then making a NameResolver would be a good approach, as long as you can dynamically retrieve the information.

If you really have per-Channel configuration, which would mean you want to contact the same host via two different Channels with different configuration, probably via an injected object, then you could register a separate load balancer instance ('my-lb-RANDOMNUM') for each channel and remove it after the channel is shut down. That's obviously ugly, but is functional. I'm not aware of anyone needing to do that, yet. But maybe you need to since the LB implementation is more of a black box to you.

(Equivalently to the register-multiple-times approach, you can share a global map between channel-creation and LB and add configuration to the map with some key. Then you can pass the key in service config.)

ejona86 avatar May 27 '20 14:05 ejona86

@ejona86 Thanks for the info. AFAICT, I'm looking at your per-Channel configuration scenario so this is all good info. All that said, I got started down this road because I was hoping to upgrade to a newer version of gRPC but it's looking like this is not a simple migration. I think I will table this for now until I have better reasons for tearing up this bit of turf. Thanks again!

mikemccarty-vertex avatar May 27 '20 15:05 mikemccarty-vertex

I am upgrading my application from 1.17.1 to latest (1.35.0). Found that the loadBalancerFactory method has been removed and nameResolverFactory has been deprecated. Do we have similar methods to achieve what has been working in 1.17.1 as per the code below.

this.channel = NettyChannelBuilder.forAddress(props.getHost(), props.getPort())
            .nameResolverFactory(new DnsNameResolverProvider())
            .loadBalancerFactory(new PickFirstLoadBalancerProvider())
            .intercept(new BasicAuthClientHeaderInterceptor(props)).usePlaintext().build();

Thanks!

ramkverma avatar Jan 27 '21 19:01 ramkverma

@ramkverma, DnsNameResolverProvider and PickFirstLoadBalancerProvider are the defaults. Just delete those lines. You can use .loadBalancerFactory("pick_first") and .forTarget("dns:///" + props.getHost() + ":" + props.getPort()) (although that is invalid for IPv6 addresses) to force using the two providers, but again, they are the default.

ejona86 avatar Jan 27 '21 20:01 ejona86

I want to take a note here on the issue we faced and solved.

It would be great if someone can confirm the solution we applied is the right solution because I don't think we enabled Cloud Logging Load Balancer explicitly.

We were following the Google Cloud Logging documentation at here: https://cloud.google.com/logging/docs/reference/libraries

The library worked when logsflush()'d but did not work when logs piped and failed with below errors:

java.lang.IllegalStateException: Could not find policy 'pick_first'. Make sure its implementation is either registered to LoadBalancerRegistry or included in META-INF/services/io.grpc.LoadBalancerProvider from your jar files.
        at io.grpc.internal.AutoConfiguredLoadBalancerFactory$AutoConfiguredLoadBalancer.<init>(AutoConfiguredLoadBalancerFactory.java:92)

And

Jun 28, 2021 4:07:57 PM com.google.common.util.concurrent.AbstractFuture executeListener
SEVERE: RuntimeException while executing runnable CallbackListener{com.google.api.core.ApiFutures$1@6c800b10} with executor MoreExecutors.directExecutor()
java.lang.RuntimeException: com.google.cloud.logging.LoggingException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline exceeded after 49.989643874s.
        at com.google.cloud.logging.LoggingImpl$8.onFailure(LoggingImpl.java:751)

The first pom.xml

        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud-logging</artifactId>
            <version>2.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.5</version>
        </dependency>

Another pom.xml

        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud-logging</artifactId>
            <version>2.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.30</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.30</version>
        </dependency>

This Stackoverflow post made Cloud Logging work. https://stackoverflow.com/questions/55484043/how-to-fix-could-not-find-policy-pick-first-with-google-tts-java-client/67930524#67930524

atas avatar Jun 29 '21 17:06 atas

@AtaS, I would not recommend that approach. https://stackoverflow.com/a/63474092/4690866 is a strictly better approach and there are ways to do it in Maven. But that is off topic for this issue. Please create a new issue and mention me when making it.

ejona86 avatar Jul 08 '21 18:07 ejona86

@ejona86 Is there any documentation to follow? We did this as part of Google Cloud Logging implementation into a Maven based project and the official GCloud Logging docs we followed does not mention anything like this. I feel like we should not come up with ways to do this better and should follow official GCloud way. Confusing situation.

atas avatar Aug 06 '21 10:08 atas

@AtaS, create a separate issue. You are creating a confusing situation on this issue.

ejona86 avatar Aug 06 '21 15:08 ejona86

@ejona86 or others could you provide a status about the experimental state of using load balancing with the Java client? Is it dangerous to use it? Could it lead to the loose of message for example?

jbarotin avatar Jun 03 '22 12:06 jbarotin