Time it takes for a node to rejoin the cluster
Hey there, I am running through some tests with your test application, with four nodes in the cluster. When I terminate a single node and bring it back online, the time it takes for that node to rejoin the cluster and drop off the dead list is highly variable. Sometimes it takes 30 seconds, sometimes 5 minutes, and sometimes it never happens.
Maybe I have misconfigured something? If I query the recently terminated node, it shows all nodes as live, but the other 3 nodes have trouble recognizing that the previously terminated node is live again.
I have tried this on the main branch. Any advice would be great. Very cool system.
OK, so I think this has something to do with is_ready_predicate. I noticed that if I add a key/value to the node that is not becoming active in the cluster, the other nodes then see it as ready again.
Hi @densone,
We removed the readiness logic from Chitchat on main, so unless I missed some code when performing that refactor, the is_ready_predicate should no longer exist.
Could you help me reproduce your issue? Are you running four nodes in the same process in unit tests, in 4 shells via the command line, or in your application?
Feel free to ping me on Discord.