
[bug & fix] vnode status stays "unsynced" after the tarbitrator process shuts down, even once it starts up again

Open t00350320 opened this issue 4 years ago • 1 comments

TDengine version: 2.2.1.3
Key configurations:
replica               2
numOfMnodes               3

Number of dnodes: 3.
An arbitrator is used.


We suppose it occurs when:
1. the tarbitrator process is shut down for some reason [we don't know why it was shut down, but we can reproduce this issue with this method. I guess there may be other causes for the slave dnode's 'unsynced' status, such as a special case in detection and election],
2. one vgroup's master dnode is shut down or goes offline,
3. we start up tarbitrator again,
4. hours later, the slave dnode still has not transitioned from unsynced to master.

The status changes as follows (the labels match the steps above):
1: taos> show vgroups;
    vgId | tables | status | onlines | v1_dnode | v1_status | v2_dnode | v2_status | compacting |
    =============================================================================================
       7 |     53 | ready  |       2 |        3 | slave     |        2 | master    |          0 |

2: taos> show vgroups;
    vgId | tables | status | onlines | v1_dnode | v1_status | v2_dnode | v2_status | compacting |
    =============================================================================================
       7 |     53 | ready  |       1 |        3 | unsynced  |        2 | master    |          0 |

3,4: taos> show vgroups;
    vgId | tables | status | onlines | v1_dnode | v1_status | v2_dnode | v2_status | compacting |
    =============================================================================================
       7 |     53 | ready  |       0 |        3 | unsynced  |        2 | offline   |          0 |

// If some node is unsynced (only the slave dnode needs to transition) and there is no master node (the master may be offline), we suppose the TAOS_SYNC_ROLE_UNSYNCED dnode should have a chance to find the index.

So we add two flags to detect the abnormal state above:
unsync_count: if > 0, we suppose there is an unsynced dnode.
master_count: if = 0, we suppose the master dnode is shut down.
When both conditions hold, we suppose the unsynced dnode (the slave dnode) should have a chance to be elected as master.
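
The check described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual patch: the `SketchRole` enum and the function `sketch_unsynced_may_be_elected` are hypothetical names made up for this example, and only the counters `unsync_count` and `master_count` come from the description.

```c
#include <stddef.h>

/* Hypothetical role values for this sketch only. The real sync code uses
 * its own role constants (the description names TAOS_SYNC_ROLE_UNSYNCED). */
typedef enum {
  SKETCH_ROLE_OFFLINE,
  SKETCH_ROLE_UNSYNCED,
  SKETCH_ROLE_SLAVE,
  SKETCH_ROLE_MASTER
} SketchRole;

/* Returns 1 when at least one replica is unsynced and no replica is master,
 * i.e. the state in which the unsynced dnode should get a chance to be
 * elected. Returns 0 otherwise. */
int sketch_unsynced_may_be_elected(const SketchRole *roles, size_t replica) {
  int unsync_count = 0; /* > 0: there is an unsynced dnode */
  int master_count = 0; /* == 0: the master dnode is assumed to be down */

  for (size_t i = 0; i < replica; ++i) {
    if (roles[i] == SKETCH_ROLE_UNSYNCED) ++unsync_count;
    if (roles[i] == SKETCH_ROLE_MASTER) ++master_count;
  }
  return unsync_count > 0 && master_count == 0;
}
```

Applied to the `show vgroups` rows above, only the last state (v1 unsynced, v2 offline) satisfies the condition. The state with v1 unsynced and v2 still master does not, so an existing master is never displaced by this check.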

Test and verify:
with the modified code, the unsynced status transitions to master;
with the modified code, the master dnode keeps its 'master' status when the slave dnode is shut down (existing behavior, unaffected).

t00350320 avatar Dec 17 '21 07:12 t00350320

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

CLAassistant avatar Jul 29 '22 05:07 CLAassistant