lettuce icon indicating copy to clipboard operation
lettuce copied to clipboard

Periodic cluster topology refresh blocks commands execution.

Open ashok-shukla opened this issue 1 year ago • 5 comments

Bug Report

Current Behavior

Periodic cluster topology refresh blocks commands execution.

From JFR,

  1. topology refresh took 1.5 sec Start Time Duration End Time Event Thread sources 7/5/24, 5:49:06.714 PM 1.500 s 7/5/24, 5:49:08.215 PM lettuce-nioEventLoop-5-2 xxxx,xxxx

  2. Thread is parked to execute get command during the topology refresh period

Start Time	Duration	End Time	Event Thread	Address of Object Parked	Class Parked On	Park Timeout	Park Until
7/5/24, 5:49:06.734 PM	1.380 s	7/5/24, 5:49:08.115 PM	io-executor-thread-9	0xE9789188	java.util.concurrent.CompletableFuture$Signaller	60.000 s	12/2/55, 4:47:04.192 PM

get command stack:
  Stack Trace	Count	Percentage
  void jdk.internal.misc.Unsafe.park(boolean, long):-1	1	100 %
  void java.util.concurrent.locks.LockSupport.parkNanos(Object, long):269	1	100 %
  boolean java.util.concurrent.CompletableFuture$Signaller.block():1866	1	100 %
  void java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool$ManagedBlocker):3780	1	100 %
  void java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool$ManagedBlocker):3725	1	100 %
  Object java.util.concurrent.CompletableFuture.timedGet(long):1939	1	100 %
  Object java.util.concurrent.CompletableFuture.get(long, TimeUnit):2095	1	100 %
  boolean io.lettuce.core.protocol.AsyncCommand.await(long, TimeUnit):83	1	100 %
  Object io.lettuce.core.internal.Futures.awaitOrCancel(RedisFuture, long, TimeUnit):244	1	100 %
  Object io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(Object, Method, Object[]):131	1	100 %
  Object io.lettuce.core.internal.AbstractInvocationHandler.invoke(Object, Method, Object[]):80	1	100 %
  Object jdk.proxy2.$Proxy119.get(Object):-1	1	100 %
  byte[] oracle.oic.v3.mtms.infra.cache.impl.AbstractCacheImpl.lambda$get$8(byte[], String):280	1	100 %
  .....
  .....

Expected behavior/code

As for reference guide, from Lettuce 6.0, Cluster topology refresh is now non-blocking. But from JFR, it shows as blocking.

Environment

  • Lettuce version(s): 6.2.6.RELEASE
  • Redis version: 6.2.6

Possible Solution

Additional context

We are executing commands using synchronous api. Our redis cluster consists of 9 nodes: 3 master and each master has 2 replicas.

ashok-shukla avatar Jul 08 '24 14:07 ashok-shukla

Hey @ashok-shukla ,

what lead you to conclude that the parked thread is handling topology refresh? Is there some minimum reproducible example of the problem we can take a look at?

tishun avatar Jul 09 '24 11:07 tishun

(Asynchronous topology refresh is handled in #1107)

tishun avatar Jul 09 '24 11:07 tishun

parked thread is not handling topology refresh. but waiting for topology refresh.

io-executor-thread-9: is parked from 7/5/24, 5:49:06.734 PM to 7/5/24, 5:49:08.115 PM. lettuce-nioEventLoop-5-2: doing topology refresh from 7/5/24, 5:49:06.714 PM to 7/5/24, 5:49:08.215 PM.

I do not have a minimum reproducible example now. if possible, I will try to create one.

ashok-shukla avatar Jul 09 '24 14:07 ashok-shukla

reproducer: cache.zip

  1. download cache.zip and unzip
  2. update REDIS_HOST and REDIS_PORT in CheckTopologyRefresh.java as per your redis cluster env. Our redis cluster nodes running in kubernetes and we just point to redis cluster kubernetes service as redis host.
  3. Update ssl options as per you setup.
  4. build the project: mvn clean install
  5. run the jar with following command: java -Xms1024m -Xmx1024m -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:GCLockerRetryAllocationCount=100 -XX:+FlightRecorder -XX:StartFlightRecording=duration=300s,filename=lettuce5min.jfr -jar cache-1.0-SNAPSHOT-jar-with-dependencies.jar 5 | tee test5min.log
  6. compare topology refresh time in lettuce5min.jfr with time in test5min.log. Check response time during that period.

ashok-shukla avatar Jul 10 '24 22:07 ashok-shukla

Awesome! Much appreciated!

tishun avatar Jul 11 '24 08:07 tishun