PartitionSelectorException in PooledClusterConnectionProvider when using enablePeriodicRefresh

Open tadashiya opened this issue 1 year ago • 1 comments

Bug Report

Related to #2146: I encountered a PartitionSelectorException. When I use enablePeriodicRefresh(), in some cases masterCache = EMPTY and PooledClusterConnectionProvider fails to get a master node. There is a time gap between invalidateCache() and this.masterCache = masterCache in Partitions.java. https://github.com/lettuce-io/lettuce-core/blob/a60560e693b3f73acc3bc6947fa78735dc0dd547/src/main/java/io/lettuce/core/cluster/models/partitions/Partitions.java#L163-L167

Current Behavior

In some cases, a PartitionSelectorException occurs when I use enablePeriodicRefresh().

Stack trace
Exception in thread "main" io.lettuce.core.cluster.PartitionSelectorException: Cannot determine a partition for slot 4241.
	at io.lettuce.core.cluster.PooledClusterConnectionProvider.getWriteConnection(PooledClusterConnectionProvider.java:164)
	at io.lettuce.core.cluster.PooledClusterConnectionProvider.getConnectionAsync(PooledClusterConnectionProvider.java:149)
	at io.lettuce.core.cluster.ClusterDistributionChannelWriter.doWrite(ClusterDistributionChannelWriter.java:170)
	at io.lettuce.core.cluster.ClusterDistributionChannelWriter.write(ClusterDistributionChannelWriter.java:103)
	at io.lettuce.core.RedisChannelHandler.dispatch(RedisChannelHandler.java:218)
	at io.lettuce.core.cluster.StatefulRedisClusterConnectionImpl.dispatch(StatefulRedisClusterConnectionImpl.java:216)
	at io.lettuce.core.AbstractRedisAsyncCommands.dispatch(AbstractRedisAsyncCommands.java:676)
	at io.lettuce.core.AbstractRedisAsyncCommands.setex(AbstractRedisAsyncCommands.java:1700)
	at jdk.internal.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:122)
	at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)

Input Code

@SpringBootApplication
class DemoApplication

fun main(args: Array<String>) {
    val context = runApplication<DemoApplication>(*args)

    @Suppress("UNCHECKED_CAST")
    val syncCommands =
        context.getBean(RedisClusterCommands::class.java) as RedisClusterCommands<String, String>
    for (i in 1..10000) {
        syncCommands.setex("key$i", 10, "value$i")
        if (i % 100 == 0) {
            println(i)
        }
    }
}

@Configuration
@EnableConfigurationProperties(RedisClientProperties::class)
class RedisClientConfiguration(
    private val props: RedisClientProperties
) {

    @Bean
    fun getCommands(): RedisClusterCommands<String, String> {
        return build().connect().sync()
    }

    private fun build(): RedisClusterClient {
        val redisURIs =
            props.nodes.map {
                val (host, port) = it.split(":")
                RedisURI.builder()
                    .withHost(host)
                    .withPassword(props.password as CharSequence)
                    .withAuthentication(props.userName, props.password as CharSequence)
                    .withPort(port.toInt())
                    .build()
            }
        val redisClusterClient = RedisClusterClient.create(redisURIs)
        val topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofSeconds(2)) // Make it happen easily.
            .build()
        val clientOptions = ClusterClientOptions
            .builder()
            .topologyRefreshOptions(topologyRefreshOptions)
            .build()
        redisClusterClient.setOptions(clientOptions)
        return redisClusterClient
    }
}

Expected behavior/code

masterCache shouldn't be empty at any point while Partitions#updateCache() runs.

Environment

  • Lettuce version(s): 6.2.0.RELEASE
  • Redis version: 6.2.7, 4.0.10

Possible Solution

Move invalidateCache() into the if clause that follows it.
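
To illustrate the suggestion, here is a minimal sketch of the gap and of one way to close it. It is not Lettuce's code: SlotCacheSketch, masterForSlot, refreshWithGap, refreshWithoutGap and missesDuringRefresh are hypothetical names, and the cache is reduced to a single volatile array that maps every slot to one master. The assumption is that readers look up a slot without taking the lock that the refresh holds.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a slot-to-master cache; NOT Lettuce's Partitions class.
final class SlotCacheSketch {

    static final int SLOT_COUNT = 16384; // number of hash slots in Redis Cluster
    private static final String[] EMPTY = new String[0];

    private volatile String[] masterCache = EMPTY;

    // Returns the master for the slot, or null while the cache is empty.
    String masterForSlot(int slot) {
        String[] cache = masterCache; // read the field once so the lookup uses one snapshot
        return slot < cache.length ? cache[slot] : null;
    }

    // Racy variant: invalidate first, rebuild afterwards.
    void refreshWithGap(String master) {
        masterCache = EMPTY;         // readers see an empty cache from here ...
        masterCache = build(master); // ... until the rebuilt array is published
    }

    // Gap-free variant: invalidate only when there is nothing to cache,
    // otherwise build the new array on the side and publish it with a single write.
    void refreshWithoutGap(String master) {
        if (master == null) {
            masterCache = EMPTY;
            return;
        }
        masterCache = build(master);
    }

    private static String[] build(String master) {
        String[] rebuilt = new String[SLOT_COUNT];
        Arrays.fill(rebuilt, master);
        return rebuilt;
    }

    // Runs the refresh repeatedly while a second thread keeps reading slot 4241.
    // Returns how many reads found no master.
    static long missesDuringRefresh(SlotCacheSketch cache, Runnable refresh, int rounds) {
        AtomicBoolean running = new AtomicBoolean(true);
        AtomicLong misses = new AtomicLong();
        Thread reader = new Thread(() -> {
            while (running.get()) {
                if (cache.masterForSlot(4241) == null) {
                    misses.incrementAndGet();
                }
            }
        });
        reader.start();
        for (int i = 0; i < rounds; i++) {
            refresh.run();
        }
        running.set(false);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return misses.get();
    }
}
```

With refreshWithGap, missesDuringRefresh may report a nonzero count depending on thread timing, which corresponds to the sporadic exception above. With refreshWithoutGap and a cache that was populated before the reader starts, a reader only ever sees one fully built array or the next one.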

tadashiya avatar Aug 09 '22 08:08 tadashiya

I had the same issue in my environment.

Please check when you have time. @mp911de 🙏 🙏

be-hase avatar Sep 07 '22 01:09 be-hase

Thanks for the report. It is indeed a subtle change, but you're right, there is a gap between the invalidation of the topology cache and the cache rebuild. It makes sense to close that gap.

mp911de avatar Oct 07 '22 08:10 mp911de

Thank you for your work.

tadashiya avatar Oct 07 '22 09:10 tadashiya

Reclassifying this as regression because in 6.1.x, we had the exact same behavior as the fix you provided.

mp911de avatar Oct 07 '22 13:10 mp911de