Closing ng-monitoring does not actively remove the topology item

I used Ctrl+C to gracefully quit ng-monitoring and then started a new instance:

[GIN] 2022/02/08 - 14:43:48 | 404 |         792ns | 192.168.126.218 | GET      "/"
^C[2022/02/08 14:46:06.297 +08:00] [INFO] [main.go:108] ["received signal"] [sig=interrupt]
[2022/02/08 14:46:06.297 +08:00] [INFO] [http.go:79] ["shutting down http server"]
[2022/02/08 14:46:06.297 +08:00] [INFO] [http.go:81] ["http server is down"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [default_subscriber.go:48] ["stopping scrapers"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [default_subscriber.go:51] ["stop scrapers successfully"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [database.go:20] ["Stopping timeseries database"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [database.go:22] ["Stop timeseries database successfully"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [database.go:24] ["Stopping document database"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [document.go:51] ["badger stop running value log gc loop"]
[2022/02/08 14:46:06.384 +08:00] [INFO] [database.go:26] ["Stop document database successfully"]

However, the topology is not cleaned up in time, so TiDB Dashboard keeps connecting to the ng-monitoring server that no longer exists:

$ etcdctl get /topology --prefix
/topology/ng-monitoring/192.168.126.218:12020/info   <-- new
{"git_hash":"1afcaa990af5c65b222e0ab59171867248645f4a","ip":"192.168.126.218","listening_port":12020,"start_timestamp":1644302767}
/topology/ng-monitoring/192.168.126.218:12020/ttl   <-- new
1644302767649395000
/topology/ng-monitoring/192.168.3.105:12020/info   <-- old, gracefully exited
{"git_hash":"1afcaa990af5c65b222e0ab59171867248645f4a","ip":"192.168.3.105","listening_port":12020,"start_timestamp":1644225105}
/topology/ng-monitoring/192.168.3.105:12020/ttl   <-- old, gracefully exited
1644302745923627000
/topology/tidb/127.0.0.1:4000/info
{"version":"v5.3.0","git_hash":"4a1b2e9fe5b5afb1068c56de47adb07098d768d6","ip":"127.0.0.1","status_port":10080,"deploy_path":"/Users/breezewish/.tiup/components/tidb/v5.3.0","start_timestamp":1644224846,"labels":{}}
/topology/tidb/127.0.0.1:4000/ttl
1644302786725919000

This will cause problems when a user scales in and then scales out (that is, switches) the ngm node.
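
The behavior requested here, removing the instance's own topology entries as part of a graceful shutdown, could be sketched roughly as follows. This is only an illustration and not ng-monitoring's actual code: a plain `dict` stands in for etcd, and `topology_keys` and `remove_topology` are hypothetical helper names. Only the key layout is taken from the `etcdctl` output above.

```python
# Key prefix as seen in the etcdctl listing above.
TOPOLOGY_PREFIX = "/topology/ng-monitoring"


def topology_keys(addr):
    """Return the two keys registered for one instance (hypothetical helper)."""
    return [
        f"{TOPOLOGY_PREFIX}/{addr}/info",
        f"{TOPOLOGY_PREFIX}/{addr}/ttl",
    ]


def remove_topology(store, addr):
    """Delete both topology keys for `addr` from `store`.

    `store` is a plain dict standing in for etcd (an assumption made so the
    sketch is self-contained). Returns the number of keys actually removed,
    so calling it twice is harmless.
    """
    removed = 0
    for key in topology_keys(addr):
        if store.pop(key, None) is not None:
            removed += 1
    return removed
```

In a real service this would run in the shutdown path, after the "received signal" step shown in the log, so that both `/info` and `/ttl` disappear before the process exits.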

Feb 08 '22 06:02 breezewish

The etcd key `/topology/ng-monitoring/192.168.3.105:12020/ttl` will be deleted once ng-monitoring has been down for a while.

But the key `/topology/ng-monitoring/192.168.3.105:12020/info` currently won't be deleted.

This behavior is consistent with TiDB.
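
Given the behavior described in this comment, a consumer of the topology could treat an instance as alive only if its `/ttl` key still exists and is recent. The following is a minimal sketch under stated assumptions: the `/ttl` value is taken to be a Unix timestamp in nanoseconds (it looks like one in the listing, but that is not confirmed here), `snapshot` is a plain dict of key to value, and `live_instances` and `max_age_ns` are hypothetical names.

```python
PREFIX = "/topology/ng-monitoring/"


def live_instances(snapshot, now_ns, max_age_ns):
    """Return addresses whose /info key exists and whose /ttl is fresh.

    Assumes each /ttl value is a Unix timestamp in nanoseconds; this is an
    assumption based on how the values look, not a documented format.
    """
    # Collect the last-seen timestamp for every address that has a /ttl key.
    ttl_by_addr = {}
    for key, value in snapshot.items():
        if key.startswith(PREFIX) and key.endswith("/ttl"):
            addr = key[len(PREFIX):-len("/ttl")]
            ttl_by_addr[addr] = int(value)

    # Keep only addresses with an /info key and a sufficiently recent /ttl.
    live = []
    for key in snapshot:
        if key.startswith(PREFIX) and key.endswith("/info"):
            addr = key[len(PREFIX):-len("/info")]
            ttl = ttl_by_addr.get(addr)
            if ttl is not None and now_ns - ttl <= max_age_ns:
                live.append(addr)
    return sorted(live)
```

With this kind of filter, a leftover `/info` key whose `/ttl` is missing or too old would be ignored instead of being treated as a reachable server.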

Feb 09 '22 10:02 crazycs520

> The etcd key `/topology/ng-monitoring/192.168.3.105:12020/ttl` will be deleted once ng-monitoring has been down for a while.
>
> But the key `/topology/ng-monitoring/192.168.3.105:12020/info` currently won't be deleted.
>
> This behavior is consistent with TiDB.

Then it seems that TiDB also has this problem, which needs to be fixed. Fortunately, unlike with ngm, it does not cause business logic problems there.

Feb 09 '22 11:02 breezewish

Here is the issue I have created for TiDB: https://github.com/pingcap/tidb/issues/32210

Feb 09 '22 11:02 breezewish

Related tiup/tidb-operator issues:

https://github.com/pingcap/tiup/issues/1752
https://github.com/pingcap/tidb-operator/issues/4402

Feb 16 '22 10:02 crazycs520
