grpc-java
Channel panics if it receives a CDS update with RING_HASH LB policy
Since v1.37.0, the client channel panics if it receives a CDS update with the RING_HASH LB policy.
SEVERE: [Channel<1>: (xds:///wallet.grpcwallet.io)] Uncaught exception in the SynchronizationContext. Panic!
java.lang.NullPointerException: provider
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:910)
at io.grpc.internal.ServiceConfigUtil$PolicySelection.<init>(ServiceConfigUtil.java:422)
at io.grpc.xds.CdsLoadBalancer2$CdsLbState.handleClusterDiscovered(CdsLoadBalancer2.java:195)
at io.grpc.xds.CdsLoadBalancer2$CdsLbState.access$1900(CdsLoadBalancer2.java:120)
at io.grpc.xds.CdsLoadBalancer2$CdsLbState$ClusterState$1ClusterDiscovered.run(CdsLoadBalancer2.java:316)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.xds.CdsLoadBalancer2$CdsLbState$ClusterState.onChanged(CdsLoadBalancer2.java:320)
at io.grpc.xds.ClientXdsClient$ResourceSubscriber.notifyWatcher(ClientXdsClient.java:2223)
at io.grpc.xds.ClientXdsClient$ResourceSubscriber.onData(ClientXdsClient.java:2179)
at io.grpc.xds.ClientXdsClient.handleResourcesAccepted(ClientXdsClient.java:2043)
at io.grpc.xds.ClientXdsClient.handleCdsResponse(ClientXdsClient.java:1417)
at io.grpc.xds.AbstractXdsClient$AbstractAdsStream.handleRpcResponse(AbstractXdsClient.java:500)
at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1$1.run(AbstractXdsClient.java:663)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1.onNext(AbstractXdsClient.java:655)
at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1.onNext(AbstractXdsClient.java:652)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:465)
at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
at io.grpc.internal.DelayedClientCall$DelayedListener.onMessage(DelayedClientCall.java:447)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:656)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:641)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
- The GRPC_XDS_EXPERIMENTAL_ENABLE_RING_HASH env flag was broken. If the flag is unset or false, the LbRegistry returns a null LbProvider, which causes the channel to panic with an NPE.
- The right behavior is to NACK the CDS update if RING_HASH is not supported, but the NACK logic has been missing since v1.37.0.
The panic is fixed by #8438 in master, backported to v1.40.x, v1.39.x, and v1.38.x, and by #8440 in v1.37.x. The fix falls back to round_robin if RING_HASH is not supported. However, we should still restore the correct behavior of NACKing the response.
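To make the two behaviors discussed above concrete, here is a minimal, self-contained sketch. It does not use the real grpc-java classes: `Registry`, `NackException`, `selectOrNack`, and `selectOrFallback` are hypothetical names introduced only for illustration, and providers are modeled as plain strings. The only property it assumes from the report is that the registry lookup returns null when RING_HASH is not enabled.

```java
import java.util.HashMap;
import java.util.Map;

public class RingHashPolicySketch {

  /** Hypothetical stand-in for the LB provider registry; NOT the grpc-java class. */
  static final class Registry {
    private final Map<String, String> providers = new HashMap<>();

    void register(String policyName) {
      providers.put(policyName, policyName + "-provider");
    }

    /** Returns null when no provider is registered for the policy. */
    String getProvider(String policyName) {
      return providers.get(policyName);
    }
  }

  /** Hypothetical exception meaning "reject (NACK) this CDS update". */
  static final class NackException extends Exception {
    NackException(String message) {
      super(message);
    }
  }

  /** Desired behavior: reject the update instead of passing a null provider along. */
  static String selectOrNack(Registry registry, String policy) throws NackException {
    String provider = registry.getProvider(policy);
    if (provider == null) {
      throw new NackException("Unsupported LB policy: " + policy);
    }
    return provider;
  }

  /** Interim behavior: fall back to round_robin when the policy is unsupported. */
  static String selectOrFallback(Registry registry, String policy) {
    String provider = registry.getProvider(policy);
    return provider != null ? provider : registry.getProvider("round_robin");
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.register("round_robin"); // ring_hash is deliberately not registered

    System.out.println("fallback: " + selectOrFallback(registry, "ring_hash"));
    try {
      selectOrNack(registry, "ring_hash");
    } catch (NackException e) {
      System.out.println("NACK: " + e.getMessage());
    }
  }
}
```

Either variant checks the lookup result before it is used, so the null never reaches a `checkNotNull` call the way it does in the trace above. The fallback keeps the channel working, while the NACK variant tells the control plane that the resource was rejected.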