tfchain
Devnet light public node issues after being offline for a while
From Thursday (01/12) evening until Friday morning, the public devnet light node was offline for about 16 hours due to DC networking issues. The process kept running, but the node was unreachable for anyone or anything.
Since it came back online there are lots of open TCP connections to the node.
root@tfchain-dev-pub-light01:~# lsof -n -p 70767 | wc -l
18934
We saw this before on mainnet with the gridproxy issues, but this is different: these connections come from many different IPs. Lee extracted a list of the IPs with their number of open connections:
74 2600:1700:c1e0:1150::2c
176 2600:1700:c1e0:1150::49
561 2600:1700:c1e0:1150:7c7a:cfff:fe76:d51f
559 2600:1700:c1e0:1150:d448:2dff:fe8e:2eb8
207 2a02:1802:5e:16:27f5:1282:21b5:a356
39 2a02:1802:5e:16:7cea:e670:64e6:acac
211 2a02:1802:5e:16:b618:3cc8:73f0:a901
202 2a02:1802:5e:16:b91:530f:7225:ad91
204 2a10:b600:0:9:23d1:b915:c8db:8cd1
215 2a10:b600:0:9:2db1:6689:c235:53ac
273 2a10:b600:0:9:34fd:b37e:92d8:1b8d
209 2a10:b600:0:9:4981:33c:ef40:d05a
211 2a10:b600:0:9:5f1f:2754:3cd0:d098
208 2a10:b600:0:9:77ca:b3c0:459e:cb5e
209 2a10:b600:0:9:8f18:d624:13c6:d404
208 2a10:b600:0:9:b08e:cf24:2af1:198f
210 2a10:b600:0:9:c6e7:862b:891c:7d15
210 2a10:b600:0:9:d407:accb:cad9:9add
210 2a10:b600:0:9:db91:142f:a8ba:5a1d
208 2a10:b600:0:9:ddc5:f3de:e0c2:a865
207 2a10:b600:0:9:f404:31b8:30a7:8416
208 2a10:b600:0:9:f7b1:6368:f63b:6971
1 2a10:b600:0:be77:5213:ad41:aba2:8d38
1 2a10:b600:0:be77:6550:4bf9:3ffe:fdac
762 2a10:b600:1:0:1459:4bff:fe15:966a
758 2a10:b600:1:0:149b:c2ff:fe41:d0b9
748 2a10:b600:1:0:40f9:b5ff:fe38:6188
760 2a10:b600:1:0:415:f1ff:fe0f:c9b1
738 2a10:b600:1:0:4ce9:edff:fe24:39c1
755 2a10:b600:1:0:60c9:b1ff:fe70:7e32
755 2a10:b600:1:0:8aa:abff:fe74:6ff5
751 2a10:b600:1:0:a832:1ff:fe2d:93c
757 2a10:b600:1:0:b40b:faff:fedb:7a7c
761 2a10:b600:1:0:b868:4ff:fe50:ccf5
757 2a10:b600:1:0:bc70:e3ff:fe9a:18b7
758 2a10:b600:1:0:ccc1:81ff:fef4:c9ff
755 2a10:b600:1:0:d0e4:afff:feb0:7599
760 2a10:b600:1:0:d462:73ff:fef0:e6dd
679 2a10:b600:1:0:d8fd:3fff:fee5:f485
761 2a10:b600:1:0:f823:1ff:fe0c:66f2
766 2a10:b600:1:0:f82b:68ff:fea7:f306
762 2a10:b600:1:0:f833:d4ff:feb4:cf7d
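The exact command used to produce the list above is not recorded here. As a rough sketch, a similar per-IP count can be derived from `lsof` output by taking the remote side of each `local->remote` pair and dropping the port. The helper name `count_by_remote_ip` is made up for this example, and the parsing assumes `lsof` was run with `-n -P` so that addresses and ports stay numeric.

```shell
# Hypothetical helper: reads lsof output on stdin and prints
# "<count> <remote ip>" for every TCP connection it finds.
count_by_remote_ip() {
  awk '
    /->/ {
      for (i = 1; i <= NF; i++) if (index($i, "->")) {
        # Keep only the remote side of "local->remote".
        remote = substr($i, index($i, "->") + 2)
        sub(/:[0-9]+$/, "", remote)   # drop the numeric port
        gsub(/\[/, "", remote)        # drop IPv6 brackets
        gsub(/\]/, "", remote)
        count[remote]++
      }
    }
    END { for (ip in count) print count[ip], ip }
  ' | LC_ALL=C sort -k2
}

# Possible usage on the node (PID taken from the lsof call earlier in this issue):
#   lsof -n -P -a -p 70767 -iTCP | count_by_remote_ip
```

Note that `lsof -p` without `-i` also lists regular files, libraries and other descriptors, so the 18934 figure is an upper bound on the number of connections rather than an exact count.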
At that time there were of course lots of connectivity errors towards tfchain.dev.grid.tf: https://mon.grid.tf/explore?orgId=1&left=%5B%221669914000000%22,%221669915859000%22,%22Loki%22,%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnetwork%3D%5C%22development%5C%22%7D%22%7D%5D
Once the node was online again the errors stopped, but the number of connections rose dramatically: https://mon.grid.tf/explore?orgId=1&left=%5B%221669971600000%22,%221669975200000%22,%22Loki%22,%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnetwork%3D%5C%22development%5C%22%7D%22%7D%5D
Since the light node only keeps the last 1000 blocks, that might be the reason for the current situation. Is ZOS maybe trying to fetch blocks that are already gone from the light node?