trafodion
TRAFODION-2940 In HA env, when one node loses network and then recovers, trafci can't be used
When the network is lost for a long time and then recovers, the zookeeper session expires. At that point, check whether the current dcsmaster is still the leader; if it is not, unbind this node's floating ip, reset the dcsmaster to its init status, and then rerun the dcs master.
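As a rough illustration of the decision described above, here is a minimal shell sketch. The function name `on_session_expired` and its two arguments are hypothetical and are not taken from the DCS code; it only prints the actions it would take. It also assumes that the reset and rerun happen only when this node is no longer the leader.

```shell
# Hypothetical sketch: prints the recovery actions instead of performing them.
# $1 = this node's hostname, $2 = hostname of the current leader in zookeeper.
on_session_expired() {
  ose_my_host="$1"
  ose_leader_host="$2"
  if [ "$ose_my_host" = "$ose_leader_host" ]; then
    # Still the leader: the floating ip stays where it is.
    echo "still-leader: keep floating ip"
  else
    # Lost leadership during the outage: give up the floating ip and start over.
    echo "unbind floating ip on $ose_my_host"
    echo "reset dcsmaster to init status"
    echo "rerun dcs master"
  fi
}
```

For example, `on_session_expired node1 node2` would list the three recovery actions for node1, because node2 took over as leader while node1 was cut off.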
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2399/
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/2399/
I hope some DCS experts can review this. @mashengchen please invite the appropriate DCS experts. I cannot follow these changes well.
@arvind-narain @kevinxu021 can you help take a look?
Can you please describe the original issue in more detail in the JIRA? The scenario with 2 floating IPs is unclear to me. Let's hold off on the merge until we understand the original issue, which will help us understand the changes.
- configure an HA env
- sqstart
- use iptables to take down the master node's network (iptables -I INPUT -s hostname -j DROP)
- sleep for 300 seconds
- the backup-master takes over the master role
- recover the network (iptables -D INPUT -s hostname -j DROP)
- pdsh $MY_NODES ifconfig|grep 23400 returns 2 results: one is the dcsmaster that went down and the other is the backup-master

The old dcsmaster is actually still in its while loop when the network recovers, so there are 2 working dcsmasters.
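The steps above can be put together as a small script. This is only a sketch: `DRY_RUN`, `PEER_HOST`, `OUTAGE_SECS`, `run` and `reproduce` are names introduced here for illustration, `hostname` remains the placeholder from the steps, and the recovery step is assumed to delete the DROP rule with `iptables -D`. By default it only prints the commands, since the real ones need root and an HA cluster.

```shell
# Sketch of the reproduction steps; all variable and function names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands
PEER_HOST="${PEER_HOST:-hostname}"  # placeholder host from the steps above
OUTAGE_SECS="${OUTAGE_SECS:-300}"   # long enough for the zookeeper session to expire

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  # Cut the master node off by dropping incoming packets from the peer.
  run iptables -I INPUT -s "$PEER_HOST" -j DROP
  # Wait while the backup-master takes over the master role.
  run sleep "$OUTAGE_SECS"
  # Recover the network by deleting the same rule (assumed form of the recovery step).
  run iptables -D INPUT -s "$PEER_HOST" -j DROP
  # Look for the floating ip; two matches mean two working dcsmasters.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ pdsh \$MY_NODES ifconfig | grep 23400"
  else
    # MY_NODES is left unquoted on purpose so it can expand to several pdsh options.
    pdsh $MY_NODES ifconfig | grep 23400
  fi
}
```

Calling `reproduce` with the defaults lists the four commands without touching the firewall. Setting `DRY_RUN=0` on the master node would actually run them.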
@hegdean, are you happy with this change now? Should I merge it?
@svarnau yes, I have run the test, and it solves the duplicate IP problem
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2475/
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/2475/
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2691/
Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/2691/
@arvind can u please review this
I think the wrong Arvind was tagged :)
@arvind-narain can u please review
Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/2887/
Test Failed. https://jenkins.esgyn.com/job/Check-PR-master/2887/