grpc-java

Channel panics if it receives a CDS update with RING_HASH LB policy

Open dapengzhang0 opened this issue 4 years ago • 0 comments

Since v1.37.0, the client channel panics if it receives a CDS update with the RING_HASH LB policy.

SEVERE: [Channel<1>: (xds:///wallet.grpcwallet.io)] Uncaught exception in the SynchronizationContext. Panic!
java.lang.NullPointerException: provider
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:910)
        at io.grpc.internal.ServiceConfigUtil$PolicySelection.<init>(ServiceConfigUtil.java:422)
        at io.grpc.xds.CdsLoadBalancer2$CdsLbState.handleClusterDiscovered(CdsLoadBalancer2.java:195)
        at io.grpc.xds.CdsLoadBalancer2$CdsLbState.access$1900(CdsLoadBalancer2.java:120)
        at io.grpc.xds.CdsLoadBalancer2$CdsLbState$ClusterState$1ClusterDiscovered.run(CdsLoadBalancer2.java:316)
        at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
        at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
        at io.grpc.xds.CdsLoadBalancer2$CdsLbState$ClusterState.onChanged(CdsLoadBalancer2.java:320)
        at io.grpc.xds.ClientXdsClient$ResourceSubscriber.notifyWatcher(ClientXdsClient.java:2223)
        at io.grpc.xds.ClientXdsClient$ResourceSubscriber.onData(ClientXdsClient.java:2179)
        at io.grpc.xds.ClientXdsClient.handleResourcesAccepted(ClientXdsClient.java:2043)
        at io.grpc.xds.ClientXdsClient.handleCdsResponse(ClientXdsClient.java:1417)
        at io.grpc.xds.AbstractXdsClient$AbstractAdsStream.handleRpcResponse(AbstractXdsClient.java:500)
        at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1$1.run(AbstractXdsClient.java:663)
        at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
        at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
        at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1.onNext(AbstractXdsClient.java:655)
        at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1.onNext(AbstractXdsClient.java:652)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:465)
        at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
        at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
        at io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:447)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:656)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:641)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
  • The GRPC_XDS_EXPERIMENTAL_ENABLE_RING_HASH env flag is broken. If the flag is unset or false, the LbRegistry returns a null LbProvider, which causes the channel to panic with an NPE.

  • The correct behavior is still to NACK the CDS update if RING_HASH is not supported, but the NACK logic has been missing since v1.37.0.
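
The failure mode in the first bullet can be modeled in a few lines. This is only a sketch: `Provider`, `Registry`, and `PolicySelection` are hypothetical stand-ins for the grpc-java classes, the policy name `ring_hash` is illustrative, and `Objects.requireNonNull` stands in for the Guava `Preconditions.checkNotNull` call seen in the stack trace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class NullProviderSketch {
    // Hypothetical stand-in for a load balancer provider (not the grpc-java type).
    static final class Provider {
        final String policyName;
        Provider(String policyName) { this.policyName = policyName; }
    }

    // Hypothetical stand-in for the registry: returns null for unregistered policies.
    static final class Registry {
        private final Map<String, Provider> providers = new HashMap<>();
        void register(Provider p) { providers.put(p.policyName, p); }
        Provider getProvider(String policyName) { return providers.get(policyName); }
    }

    // Models the null check that fails in the PolicySelection constructor.
    static final class PolicySelection {
        final Provider provider;
        PolicySelection(Provider provider) {
            this.provider = Objects.requireNonNull(provider, "provider");
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(new Provider("round_robin"));
        // Assumption: with the env flag unset, no ring hash provider is registered.
        try {
            new PolicySelection(registry.getProvider("ring_hash"));
        } catch (NullPointerException e) {
            // The message matches the first line of the reported exception.
            System.out.println("NullPointerException: " + e.getMessage());
        }
    }
}
```

In the real channel the exception is not caught at this point; it escapes into the SynchronizationContext, which is what triggers the panic.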

The panic is fixed by #8438 in master, backported to v1.40.x, v1.39.x, and v1.38.x, and by #8440 in v1.37.x. The fix falls back to round_robin if RING_HASH is not supported. However, we should still correct the behavior by NACKing the response.
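
The difference between the two behaviors can be sketched as follows. This is a simplified model under stated assumptions, not the code from #8438 or #8440: the class, the method names, and the error text are all hypothetical, and the policy names are plain strings rather than grpc-java types.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;

public class UnsupportedPolicySketch {
    // Fallback behavior: silently substitute round_robin for an unsupported policy.
    static String resolveWithFallback(Set<String> supported, String requested) {
        return supported.contains(requested) ? requested : "round_robin";
    }

    // NACK-style behavior: report an error so the update can be rejected
    // before it reaches the load balancer. Empty means the update is acceptable.
    static Optional<String> validate(Set<String> supported, String requested) {
        if (supported.contains(requested)) {
            return Optional.empty();
        }
        return Optional.of("unsupported lb policy: " + requested);
    }

    public static void main(String[] args) {
        // Assumption: only round_robin is available when ring hash is disabled.
        Set<String> supported = Collections.singleton("round_robin");
        System.out.println(resolveWithFallback(supported, "ring_hash"));
        System.out.println(validate(supported, "ring_hash").isPresent());
    }
}
```

With the fallback, the client keeps working but uses a policy the control plane did not ask for. With validation, the control plane learns that the client rejected the resource.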

dapengzhang0 · Aug 25 '21 23:08