
Closing ng-monitoring does not actively remove the topology item

Open breezewish opened this issue 3 years ago • 4 comments

I used Ctrl+C to gracefully quit ng-monitoring and then started a new instance:

[GIN] 2022/02/08 - 14:43:48 | 404 |         792ns | 192.168.126.218 | GET      "/"
^C[2022/02/08 14:46:06.297 +08:00] [INFO] [main.go:108] ["received signal"] [sig=interrupt]
[2022/02/08 14:46:06.297 +08:00] [INFO] [http.go:79] ["shutting down http server"]
[2022/02/08 14:46:06.297 +08:00] [INFO] [http.go:81] ["http server is down"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [default_subscriber.go:48] ["stopping scrapers"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [default_subscriber.go:51] ["stop scrapers successfully"]
[2022/02/08 14:46:06.298 +08:00] [INFO] [database.go:20] ["Stopping timeseries database"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [database.go:22] ["Stop timeseries database successfully"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [database.go:24] ["Stopping document database"]
[2022/02/08 14:46:06.313 +08:00] [INFO] [document.go:51] ["badger stop running value log gc loop"]
[2022/02/08 14:46:06.384 +08:00] [INFO] [database.go:26] ["Stop document database successfully"]

However, the topology is not cleaned up in time, so TiDB Dashboard keeps connecting to the ng-monitoring server that no longer exists:

$ etcdctl get /topology --prefix
/topology/ng-monitoring/192.168.126.218:12020/info   <-- new
{"git_hash":"1afcaa990af5c65b222e0ab59171867248645f4a","ip":"192.168.126.218","listening_port":12020,"start_timestamp":1644302767}
/topology/ng-monitoring/192.168.126.218:12020/ttl   <-- new
1644302767649395000
/topology/ng-monitoring/192.168.3.105:12020/info   <-- old, gracefully exited
{"git_hash":"1afcaa990af5c65b222e0ab59171867248645f4a","ip":"192.168.3.105","listening_port":12020,"start_timestamp":1644225105}
/topology/ng-monitoring/192.168.3.105:12020/ttl   <-- old, gracefully exited
1644302745923627000
/topology/tidb/127.0.0.1:4000/info
{"version":"v5.3.0","git_hash":"4a1b2e9fe5b5afb1068c56de47adb07098d768d6","ip":"127.0.0.1","status_port":10080,"deploy_path":"/Users/breezewish/.tiup/components/tidb/v5.3.0","start_timestamp":1644224846,"labels":{}}
/topology/tidb/127.0.0.1:4000/ttl
1644302786725919000

This will cause problems when a user scales in and then scales out (i.e. switches) the ngm node.

breezewish • Feb 08 '22 06:02

The etcd key `/topology/ng-monitoring/192.168.3.105:12020/ttl` will be deleted a while after ng-monitoring goes down.

But the key `/topology/ng-monitoring/192.168.3.105:12020/info` currently won't be deleted.

This behavior is consistent with TiDB.

crazycs520 • Feb 09 '22 10:02

> The etcd key `/topology/ng-monitoring/192.168.3.105:12020/ttl` will be deleted a while after ng-monitoring goes down.
>
> But the key `/topology/ng-monitoring/192.168.3.105:12020/info` currently won't be deleted.
>
> This behavior is consistent with TiDB.

Then it seems that TiDB also has this problem and that it needs to be fixed. Fortunately, unlike in ngm, it does not cause business logic problems in TiDB.

breezewish • Feb 09 '22 11:02

Here is the issue I have created for TiDB: https://github.com/pingcap/tidb/issues/32210

breezewish • Feb 09 '22 11:02

Related tiup/tidb-operator issues:

  • https://github.com/pingcap/tiup/issues/1752
  • https://github.com/pingcap/tidb-operator/issues/4402

crazycs520 • Feb 16 '22 10:02