
Dead Node

Open challengeteamttdh opened this issue 9 years ago • 18 comments

Hi,

I am using this library to integrate gossip into my application, and I found an issue: when I shut down a node and start it again, the other nodes do not recognize it. Example: I have two nodes, 1 and 2. First I start both nodes. Then I stop node 2 and start it again, but node 2 does not know that node 1 is UP, and node 1 does not know that node 2 is UP. Please help me resolve this issue. Thanks for your support.

challengeteamttdh avatar May 02 '16 11:05 challengeteamttdh

Questions

  1. Are you sure your system clocks are in sync?
  2. How long was the node down?
  3. Can you set the logging on both servers to debug and record any relevant output?
  4. What is your configuration? What are the initial contact points?

We have a unit test that does this with 5 nodes, so it would be interesting to see whether the same logic works with two nodes.

edwardcapriolo avatar May 02 '16 13:05 edwardcapriolo

Currently I am applying gossip in a Spring Boot application. Each node is an instance of Spring Boot. Please give me some advice on applying gossip to a Spring Boot application.

This is my configuration for Application 1, on port 8081:

    [{
      "cluster":"", "id":"", "port":8081,
      "gossip_interval":1000, "cleanup_interval":10000,
      "members":[
        {"cluster":"", "id":"", "host":"192.168.1.90", "port":8084},
        {"cluster":"", "id":"", "host":"192.168.1.90", "port":8083},
        {"cluster":"", "id":"", "host":"192.168.1.90", "port":8082}
      ]
    }]

This is my configuration for Application 2, on port 8082:

    [{
      "cluster":"", "id":"", "port":8082,
      "gossip_interval":1000, "cleanup_interval":10000,
      "members":[
        {"cluster":"", "id":"", "host":"192.168.1.90", "port":8084},
        {"cluster":"", "id":"", "host":"192.168.1.90", "port":8083},
        {"cluster":"", "id":"", "host":"192.168.1.90", "port":8081}
      ]
    }]

Let me know if I'm wrong.

challengeteamttdh avatar May 02 '16 13:05 challengeteamttdh

Each node needs an id. In your case you can generate a string or a UUID that will persist between restarts.
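
One way to get an id that survives restarts is to generate a UUID once and store it in a file. This is a minimal sketch, assuming the process can write to the chosen path; `PersistentNodeId` and the file location are hypothetical and not part of the gossip library:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper (not a gossip library class): keeps a node id stable across restarts. */
final class PersistentNodeId {

  private PersistentNodeId() {}

  /** Returns the id stored in idFile, or generates a UUID and stores it on first use. */
  static String loadOrCreate(Path idFile) {
    try {
      if (Files.exists(idFile)) {
        String existing =
            new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
        if (!existing.isEmpty()) {
          return existing; // reuse the id written by an earlier run
        }
      }
      String id = UUID.randomUUID().toString();
      Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
      return id;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The returned string could then be used as the "id" value for the local node.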

edwardcapriolo avatar May 02 '16 14:05 edwardcapriolo

My application is using <groupId>io.teknek</groupId> <artifactId>gossip</artifactId>, version 0.0.3.

Maybe it does not generate an id when this constructor is used:

    public GossipService(StartupSettings startupSettings)
        throws InterruptedException, UnknownHostException {
      this(InetAddress.getLocalHost().getHostAddress(), startupSettings.getPort(), "",
          startupSettings.getGossipMembers(), startupSettings.getGossipSettings(), null);
    }

Is that right? Version 0.0.3 is different from the latest code on GitHub. Do you have an updated version on Maven?

challengeteamttdh avatar May 02 '16 14:05 challengeteamttdh

Yes. This looks like a bug in that version. The id was not required in the original versions, but now it is. Can you please try the trunk version? I will release the current trunk later today.

edwardcapriolo avatar May 02 '16 15:05 edwardcapriolo

I reviewed the code. I think that in the StartupSettings class, this line

    RemoteGossipMember member = new RemoteGossipMember(memberJSON.getString("cluster"),
        memberJSON.getString("host"), memberJSON.getInt("port"), "");

also needs to generate an id for the RemoteGossipMember. I have switched to the latest code, but it still fails. Please help me resolve this issue. I hope that you can make a release today. Thanks for your support on this issue.

challengeteamttdh avatar May 02 '16 15:05 challengeteamttdh

    RemoteGossipMember member = new RemoteGossipMember(memberJSON.getString("cluster"),
        memberJSON.getString("host"), memberJSON.getInt("port"), "");

This code is OK. We do not know the remote id until we connect to that host.

Can you give a stripped-down example of your Spring Boot application?

edwardcapriolo avatar May 02 '16 17:05 edwardcapriolo

When I use the latest code, an exception occurs. I don't know why.

    Exception in thread "pool-6-thread-1" java.lang.NullPointerException
        at com.google.code.gossip.manager.PassiveGossipThread.run(PassiveGossipThread.java:102)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

This is my configuration for gossip:

    [{
      "cluster":"1", "id":"1", "port":8081,
      "gossip_interval":1000, "cleanup_interval":10000,
      "members":[
        {"cluster":"1", "id":"4", "host":"192.168.1.90", "port":8084, "heartbeat":0},
        {"cluster":"1", "id":"3", "host":"192.168.1.90", "port":8083, "heartbeat":0},
        {"cluster":"1", "id":"2", "host":"192.168.1.90", "port":8082, "heartbeat":0}
      ]
    }]

challengeteamttdh avatar May 03 '16 07:05 challengeteamttdh

This is my sample code. Please help me review it:

https://github.com/challengeteamttdh/springbootgossip

My application has a scheduled task that runs every 20 s. It prints a number based on the number of live nodes and this node's position among them. When a node goes DOWN or comes UP, the other nodes need to know and update the rule for printing numbers. Thank you very much for your support.

challengeteamttdh avatar May 03 '16 08:05 challengeteamttdh

    if (memberJSONObject.length() == 5
                  && cluster.equals(memberJSONObject.get(GossipMember.JSON_CLUSTER))) {

This is a new piece of code. I will look at this.

edwardcapriolo avatar May 03 '16 14:05 edwardcapriolo

I found the bug you mentioned. The startup settings code was not setting the cluster name. I am looking at the unit test there because it is suspect. Sorry for the problems. Really cool app; I want to take a deeper look at it. Please try the latest trunk again. Sorry for the issues; the cluster name is a new bit and I do not use the StartupSettings code path!
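
The check quoted above compares each received member's cluster name with the local one. Here is a minimal sketch of that kind of filter, assuming entries from a different cluster are simply ignored; `ClusterFilter` and `Member` are hypothetical stand-ins, not the library's classes:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of filtering received members by cluster name. */
final class ClusterFilter {

  /** Minimal stand-in for a member entry; fields mirror the JSON keys in gossip.conf. */
  static final class Member {
    final String cluster;
    final String id;
    final String host;
    final int port;

    Member(String cluster, String id, String host, int port) {
      this.cluster = cluster;
      this.id = id;
      this.host = host;
      this.port = port;
    }
  }

  private ClusterFilter() {}

  /** Keeps only the members whose cluster name equals the local cluster name. */
  static List<Member> sameCluster(String localCluster, List<Member> received) {
    List<Member> accepted = new ArrayList<>();
    for (Member m : received) {
      if (localCluster.equals(m.cluster)) {
        accepted.add(m);
      }
    }
    return accepted;
  }
}
```

With a filter like this, a node configured with "cluster":"1" would ignore members that announce "cluster":"2".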

edwardcapriolo avatar May 03 '16 15:05 edwardcapriolo

I updated the code and the exception no longer occurs. But when I start 2 instances of Spring Boot on ports 8081 and 8082, with the following gossip.conf files:

  • 8081

        [{
          "cluster":"1", "id":"1", "port":8081,
          "gossip_interval":1000, "cleanup_interval":10000,
          "members":[
            {"cluster":"1", "id":"4", "host":"192.168.1.90", "port":8084, "heartbeat":0},
            {"cluster":"1", "id":"3", "host":"192.168.1.90", "port":8083, "heartbeat":0},
            {"cluster":"1", "id":"2", "host":"192.168.1.90", "port":8082, "heartbeat":0}
          ]
        }]

  • 8082

        [{
          "cluster":"2", "id":"2", "port":8082,
          "gossip_interval":1000, "cleanup_interval":10000,
          "members":[
            {"cluster":"2", "id":"4", "host":"192.168.1.90", "port":8084, "heartbeat":0},
            {"cluster":"2", "id":"3", "host":"192.168.1.90", "port":8083, "heartbeat":0},
            {"cluster":"2", "id":"1", "host":"192.168.1.90", "port":8081, "heartbeat":0}
          ]
        }]

The nodes still do not know that the other member is UP. First, I set port 8081 in application.properties, use the gossip.conf for 8081, and start Spring Boot. Second, I set port 8082 in application.properties, use the gossip.conf for 8081, and start Spring Boot. However, the nodes do not know whether each other is UP or DOWN. Please take a look at this. I would really love to integrate your gossip code into my application. Please spend some time helping me resolve this issue.

challengeteamttdh avatar May 03 '16 15:05 challengeteamttdh

Great. Keep in mind that getMemberList does not include yourself, so in a two-node cluster the cluster is me + getMemberList(), and getMemberList() has 1 entry.

edwardcapriolo avatar May 03 '16 16:05 edwardcapriolo

What do you mean? Am I implementing it incorrectly? What do I need to do to fix this?

challengeteamttdh avatar May 03 '16 16:05 challengeteamttdh

The only thing I am saying is that the member list does not include the local member. The local member is assumed.
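
For an application that needs the number of live nodes and its own position among them, the local node has to be added back in. A minimal sketch, assuming node ids are plain strings and that `remoteIds` holds the ids taken from getMemberList(); `ClusterView` is a hypothetical helper, not a library class:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that adds the local node back to the remote member list. */
final class ClusterView {

  private ClusterView() {}

  /** Total live nodes: the remote members plus the local node, which is never listed. */
  static int liveNodeCount(List<String> remoteIds) {
    return remoteIds.size() + 1;
  }

  /** Zero-based position of the local node when all live ids are sorted. */
  static int positionOf(String localId, List<String> remoteIds) {
    List<String> all = new ArrayList<>(remoteIds);
    all.add(localId);
    Collections.sort(all);
    return all.indexOf(localId);
  }
}
```

Because every node sorts the same set of ids, each node should arrive at the same ordering once the member lists agree.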

edwardcapriolo avatar May 03 '16 16:05 edwardcapriolo

Do you have any ideas for my application? I don't know how to apply gossip to it. How can one Spring Boot instance learn about the other instances?

challengeteamttdh avatar May 04 '16 15:05 challengeteamttdh

How do you start two copies of the application?

    mvn spring-boot:run -Drun.jvmArguments='-Dserver.port=8081'
    mvn spring-boot:run -Drun.jvmArguments='-Dserver.port=8082'

When I do this they take the same config.
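
One possible way to avoid sharing a config is to derive the file name from the port passed on the command line. This is only a sketch: the `gossip-<port>.conf` naming scheme and the `GossipConfigLocator` class are hypothetical, and the sample application may load its config differently:

```java
/** Hypothetical helper: picks a per-instance gossip config file from the server port. */
final class GossipConfigLocator {

  private GossipConfigLocator() {}

  /** Returns "gossip-<port>.conf" when a port is given, else the shared "gossip.conf". */
  static String configFileFor(String serverPort) {
    if (serverPort == null || serverPort.isEmpty()) {
      return "gossip.conf";
    }
    return "gossip-" + serverPort + ".conf";
  }

  /** Reads the same -Dserver.port system property used in the mvn commands above. */
  static String configFileForCurrentJvm() {
    return configFileFor(System.getProperty("server.port"));
  }
}
```

Started with -Dserver.port=8081, the instance would then look for gossip-8081.conf instead of the shared file.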

edwardcapriolo avatar May 05 '16 00:05 edwardcapriolo

You need to change the port in gossip.conf to match the port of the Spring Boot instance. This is gossip.conf for port 8081:

    [{
      "cluster":"1", "id":"1", "port":8081,
      "gossip_interval":1000, "cleanup_interval":10000,
      "members":[
        {"cluster":"1", "id":"4", "host":"192.168.1.90", "port":8084, "heartbeat":0},
        {"cluster":"1", "id":"3", "host":"192.168.1.90", "port":8083, "heartbeat":0},
        {"cluster":"1", "id":"2", "host":"192.168.1.90", "port":8082, "heartbeat":0}
      ]
    }]

Then run:

    mvn spring-boot:run -Drun.jvmArguments='-Dserver.port=8081'

This is gossip.conf for port 8082:

    [{
      "cluster":"2", "id":"2", "port":8082,
      "gossip_interval":1000, "cleanup_interval":10000,
      "members":[
        {"cluster":"2", "id":"4", "host":"192.168.1.90", "port":8084, "heartbeat":0},
        {"cluster":"2", "id":"3", "host":"192.168.1.90", "port":8083, "heartbeat":0},
        {"cluster":"2", "id":"1", "host":"192.168.1.90", "port":8081, "heartbeat":0}
      ]
    }]

Then run:

    mvn spring-boot:run -Drun.jvmArguments='-Dserver.port=8082'

Is it necessary to run with the same gossip.conf?

challengeteamttdh avatar May 05 '16 02:05 challengeteamttdh