ng-monitoring
Closing ng-monitoring does not actively remove the topology item
I used Ctrl+C to gracefully stop ng-monitoring and then started a new instance:
[GIN] 2022/02/08 - 14:43:48 | 404 |         792ns | 192.168.126.218 | GET      "/"
^C[2022/02/08 14:46:06.297 +08:00] [INFO] [main.go:108] ["received signal"] [sig=interrupt]
[2022/02/08 14:46:06.297 +08:00] [INFO] [http.go:79] ["shutting down http server"]
[2022/02/08 14:46:06.297 +08:00] [INFO] [http.go:81] ["http server is down"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [default_subscriber.go:48] ["stopping scrapers"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [default_subscriber.go:51] ["stop scrapers successfully"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [database.go:20] ["Stopping timeseries database"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [database.go:22] ["Stop timeseries database successfully"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [database.go:24] ["Stopping document database"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [document.go:51] ["badger stop running value log gc loop"]
[2022/02/08 14:46:06.384 +08:00] [INFO] [database.go:26] ["Stop document database successfully"]
However, the topology is not cleaned up in time, so TiDB Dashboard keeps connecting to the ng-monitoring server that no longer exists:
$ etcdctl get /topology --prefix
/topology/ng-monitoring/192.168.126.218:12020/info   <-- new
{"git_hash":"1afcaa990af5c65b222e0ab59171867248645f4a","ip":"192.168.126.218","listening_port":12020,"start_timestamp":1644302767}
/topology/ng-monitoring/192.168.126.218:12020/ttl   <-- new
1644302767649395000
/topology/ng-monitoring/192.168.3.105:12020/info   <-- old, gracefully exited
{"git_hash":"1afcaa990af5c65b222e0ab59171867248645f4a","ip":"192.168.3.105","listening_port":12020,"start_timestamp":1644225105}
/topology/ng-monitoring/192.168.3.105:12020/ttl   <-- old, gracefully exited
1644302745923627000
/topology/tidb/127.0.0.1:4000/info
{"version":"v5.3.0","git_hash":"4a1b2e9fe5b5afb1068c56de47adb07098d768d6","ip":"127.0.0.1","status_port":10080,"deploy_path":"/Users/breezewish/.tiup/components/tidb/v5.3.0","start_timestamp":1644224846,"labels":{}}
/topology/tidb/127.0.0.1:4000/ttl
1644302786725919000
This causes problems when a user scales in and then scales out (that is, switches) the ngm node.
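One possible remedy is for the process to delete its own topology keys as part of the graceful shutdown sequence. The following is only a minimal sketch of that idea, written in Python for brevity rather than in ng-monitoring's own code. The `KV` interface, the `InMemoryKV` stand-in, and the `unregister` helper are hypothetical names introduced for illustration; only the key layout is taken from the `etcdctl` output above.

```python
from typing import Dict, List, Protocol


class KV(Protocol):
    """Hypothetical minimal key-value interface; a real etcd client would sit behind it."""

    def delete(self, key: str) -> None: ...


class InMemoryKV:
    """Stand-in for etcd so the sketch runs without a cluster."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op, so unregistering twice is harmless.
        self.data.pop(key, None)


def topology_keys(address: str) -> List[str]:
    # Key layout as seen in the etcdctl output: <prefix>/<ip:port>/{info,ttl}
    prefix = f"/topology/ng-monitoring/{address}"
    return [f"{prefix}/info", f"{prefix}/ttl"]


def unregister(kv: KV, address: str) -> None:
    """Delete this instance's own topology keys; intended to run during graceful shutdown."""
    for key in topology_keys(address):
        kv.delete(key)


# Placeholder values; the real values are the JSON and timestamps shown above.
store = InMemoryKV({
    "/topology/ng-monitoring/192.168.126.218:12020/info": "new-info",
    "/topology/ng-monitoring/192.168.126.218:12020/ttl": "new-ttl",
    "/topology/ng-monitoring/192.168.3.105:12020/info": "old-info",
    "/topology/ng-monitoring/192.168.3.105:12020/ttl": "old-ttl",
})

# The old instance unregisters itself before exiting.
unregister(store, "192.168.3.105:12020")
print(sorted(store.data))
```

With this in place, only the keys of the new instance would remain after the old one exits, so a consumer such as TiDB Dashboard would no longer see the stale address.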
The etcd key `/topology/ng-monitoring/192.168.3.105:12020/ttl` will be deleted after ng-monitoring has been down for a while.
But the key `/topology/ng-monitoring/192.168.3.105:12020/info` currently won't be deleted.
This behavior is consistent with TiDB.
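Given that behavior, a consumer of the topology could treat an instance as alive only when both its `info` key and a recent `ttl` key exist. The sketch below illustrates that check and is not taken from TiDB Dashboard. It assumes the `ttl` values are Unix timestamps in nanoseconds (which matches the values in the `etcdctl` output), and the `alive_instances` helper and the 45-second threshold are hypothetical.

```python
from typing import Dict, List

NANOS_PER_SECOND = 1_000_000_000


def alive_instances(
    entries: Dict[str, str], prefix: str, now_ns: int, max_age_s: int
) -> List[str]:
    """Return addresses that have an info key and a ttl key no older than max_age_s."""
    alive = []
    for key, value in entries.items():
        if not (key.startswith(prefix) and key.endswith("/ttl")):
            continue
        address = key[len(prefix):-len("/ttl")]
        if f"{prefix}{address}/info" not in entries:
            continue
        # Assumption: the ttl value is a Unix timestamp in nanoseconds.
        if now_ns - int(value) <= max_age_s * NANOS_PER_SECOND:
            alive.append(address)
    return sorted(alive)


# State after the old instance's ttl key has been deleted but its info key remains.
entries = {
    "/topology/ng-monitoring/192.168.126.218:12020/info": "new-info",
    "/topology/ng-monitoring/192.168.126.218:12020/ttl": "1644302767649395000",
    "/topology/ng-monitoring/192.168.3.105:12020/info": "old-info",
}

alive = alive_instances(
    entries,
    prefix="/topology/ng-monitoring/",
    now_ns=1644302786725919000,  # a moment shortly after the new ttl was written
    max_age_s=45,                # hypothetical freshness threshold
)
print(alive)
```

A check like this only helps once the `ttl` key has actually been deleted or gone stale. Immediately after shutdown the old `ttl` key is still present and recent, which is why active removal on exit is still needed.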
Then it seems that TiDB also has this problem, and it needs to be fixed there too. Fortunately, unlike with ngm, it does not cause business logic problems for TiDB.
Here is the issue I have created for TiDB: https://github.com/pingcap/tidb/issues/32210
Related tiup/tidb-operator issues:
- https://github.com/pingcap/tiup/issues/1752
- https://github.com/pingcap/tidb-operator/issues/4402