bazel-buildfarm
JedisDataException when using Buildfarm with GCP Memorystore
It seems that at the moment Buildfarm only supports Redis clusters, which makes it incompatible with non-clustered Redis setups such as GCP Memorystore (Redis read replicas with a failover mechanism).
The root cause seems to be in the RedisClient.
I've also wondered about this. On one hand, our integration test uses the official redis image from dockerhub and it builds with bazel end-to-end. That seems like a non-cluster setup.
For example:
docker run -d --name buildfarm-redis --network host redis:5.0.9 --bind localhost
95b3dd07d73a5148879e20c525d2e3f46868494baccff9df1ba484e2fbde49d8
redis-cli
127.0.0.1:6379> cluster info
ERR This instance has cluster support disabled
On the other hand, our Jedis code seems cluster-specific, so I'm wondering why this integration test works in the first place and how it would differ from, say, GCP Memorystore.
Oh, great catch. The exception that I got with Memorystore was that CLUSTER is an unknown command:
[INFO ] build.buildfarm.server.BuildFarmServer <init> - buildfarm-server-004cd401-1264-4806-9155-65ccab5006aa initialized
Exception in thread "main" redis.clients.jedis.exceptions.JedisDataException: ERR unknown command `CLUSTER`, with args beginning with: `slots`,
at redis.clients.jedis.Protocol.processError(Protocol.java:132)
at redis.clients.jedis.Protocol.process(Protocol.java:166)
at redis.clients.jedis.Protocol.read(Protocol.java:220)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:389)
at redis.clients.jedis.Connection.getUnflushedObjectMultiBulkReply(Connection.java:351)
at redis.clients.jedis.Connection.getObjectMultiBulkReply(Connection.java:356)
at redis.clients.jedis.Jedis.clusterSlots(Jedis.java:3471)
at redis.clients.jedis.JedisClusterInfoCache.failsafeClusterSlots(JedisClusterInfoCache.java:99)
at redis.clients.jedis.JedisClusterInfoCache.discoverClusterNodesAndSlots(JedisClusterInfoCache.java:113)
at redis.clients.jedis.JedisClusterConnectionHandler.initializeSlotsCache(JedisClusterConnectionHandler.java:72)
at redis.clients.jedis.JedisClusterConnectionHandler.<init>(JedisClusterConnectionHandler.java:34)
at redis.clients.jedis.JedisClusterConnectionHandler.<init>(JedisClusterConnectionHandler.java:25)
at redis.clients.jedis.JedisClusterConnectionHandler.<init>(JedisClusterConnectionHandler.java:20)
at redis.clients.jedis.JedisSlotBasedConnectionHandler.<init>(JedisSlotBasedConnectionHandler.java:27)
at redis.clients.jedis.BinaryJedisCluster.<init>(BinaryJedisCluster.java:60)
at redis.clients.jedis.JedisCluster.<init>(JedisCluster.java:114)
at redis.clients.jedis.JedisCluster.<init>(JedisCluster.java:60)
at build.buildfarm.instance.shard.JedisClusterFactory.lambda$createJedisClusterFactory$0(JedisClusterFactory.java:151)
at build.buildfarm.instance.shard.RedisShardBackplane.start(RedisShardBackplane.java:541)
at build.buildfarm.instance.shard.ShardInstance.start(ShardInstance.java:509)
at build.buildfarm.server.BuildFarmServer.start(BuildFarmServer.java:167)
at build.buildfarm.server.BuildFarmServer.serverMain(BuildFarmServer.java:255)
at build.buildfarm.server.BuildFarmServer.main(BuildFarmServer.java:270)
*** shutting down gRPC server since JVM is shutting down
*** server shut down
I did a quick check with telnet as well:
telnet 10.93.153.117 6379
Trying 10.93.153.117...
Connected to 10.93.153.117.
Escape character is '^]'.
cluster slots
-ERR unknown command `cluster`, with args beginning with: `slots`,
The Docker image does in fact behave differently:
telnet localhost 6379
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
cluster slots
-ERR This instance has cluster support disabled
Although, looking at the code for Jedis 3.2.0, I can't see why they would behave differently, since both replies should result in a JedisDataException in the Jedis Protocol class.
So I figured out why it works in the integration test. Buildfarm is not using the official Jedis client but a fork. The specific version of that fork that Buildfarm uses has a patch that checks whether the error message from the Redis node is "ERR This instance has cluster support disabled" and, if so, runs in single-node mode instead of cluster mode.
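To make the described behavior concrete, here is a minimal, dependency-free sketch of that kind of fallback. It is not the fork's actual code: `DataException` is a hypothetical stand-in for `JedisDataException`, `connect` and `Mode` are made-up names, and the exact string comparison is an assumption about how the patch matches the message.

```java
public class ClusterFallbackSketch {
    /** Hypothetical stand-in for JedisDataException so the sketch needs no Jedis dependency. */
    static class DataException extends RuntimeException {
        DataException(String message) {
            super(message);
        }
    }

    enum Mode { CLUSTER, SINGLE_NODE }

    /** The message a stock Redis server returns when cluster support is disabled. */
    static final String CLUSTER_DISABLED = "ERR This instance has cluster support disabled";

    /**
     * Runs the cluster discovery step (CLUSTER SLOTS in the real client).
     * Falls back to single-node mode only for the one known error message;
     * any other data error is rethrown, which is what happens with Memorystore.
     */
    static Mode connect(Runnable clusterSlots) {
        try {
            clusterSlots.run();
            return Mode.CLUSTER;
        } catch (DataException e) {
            if (CLUSTER_DISABLED.equals(e.getMessage())) {
                return Mode.SINGLE_NODE;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // Simulates the reply of the redis:5.0.9 Docker image.
        Mode docker = connect(() -> {
            throw new DataException(CLUSTER_DISABLED);
        });
        System.out.println("docker redis: " + docker);

        // Simulates the Memorystore reply from the stack trace above.
        try {
            connect(() -> {
                throw new DataException(
                    "ERR unknown command `CLUSTER`, with args beginning with: `slots`, ");
            });
        } catch (DataException e) {
            System.out.println("memorystore: startup fails with: " + e.getMessage());
        }
    }
}
```

Under these assumptions, the Docker image takes the single-node path, while the Memorystore reply propagates as an exception.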
werkt fork of jedis, official jedis
BTW, it's also documented in deps.bzl. There seems to be quite a bit of drift in this specific fork compared to mainline Jedis.
I added cluster emulation behavior specifically to support non-clustered Redis. Have you tried it? Is there an error message?
I looked into the code. It checks for "This instance has cluster support disabled", while Memorystore replies with "unknown command cluster". Hence the logic to cope with non-clustered Redis is not working with Memorystore. Are you referring to a configuration property that I missed?
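One possible direction, sketched here only as an illustration, would be a check that treats both replies seen in this thread as "not a cluster". The class and method names are hypothetical, and whether matching on "unknown command" is safe for every Redis-compatible service is an open assumption.

```java
import java.util.Locale;

public class NonClusterDetection {
    /**
     * Returns true if the error message (as reported by the client, without
     * the leading '-' of the wire protocol) suggests the server is not a cluster.
     * Covers the two replies quoted in this thread; other servers may word
     * their errors differently.
     */
    static boolean indicatesNonCluster(String errorMessage) {
        if (errorMessage == null) {
            return false;
        }
        String msg = errorMessage.toLowerCase(Locale.ROOT);
        return msg.contains("cluster support disabled")
            || msg.startsWith("err unknown command `cluster`");
    }

    public static void main(String[] args) {
        String[] samples = {
            "ERR This instance has cluster support disabled",
            "ERR unknown command `CLUSTER`, with args beginning with: `slots`, ",
            "ERR wrong number of arguments for 'get' command",
        };
        for (String sample : samples) {
            System.out.println(indicatesNonCluster(sample) + " <- " + sample);
        }
    }
}
```

The comparison is case-insensitive because Memorystore echoes the command name as it was sent: `CLUSTER` in the Jedis stack trace, `cluster` in the telnet session.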