IP allocation was seeded by different peers in AWS Autoscaling

Open atze234 opened this issue 3 years ago • 0 comments

Hi,

We're getting IP allocation errors in our cluster. We shut down all the nodes in our AWS ASG every day to save money. After a while we see log messages like the following, and weave stops working:

IP allocation was seeded by different peers (received: [1e:96:3e:2b:11:b5(ip-10-20-45-98.eu-central-1.compute.internal)], ours: [de:4a:ad:fe:82:e2])

The 10.20.45.98 node existed in the same cluster some days earlier, and AWS has simply reassigned that IP.

It seems that the peer table isn't cleaned up correctly, for example:

/home/weave # ./weave --local status ipam
b2:c6:eb:6f:f4:7a(ip-10-20-44-99.eu-central-1.compute.internal)   393216 IPs (18.8% of total) (2 active)
22:67:0f:db:d8:a2(ip-10-20-44-238.eu-central-1.compute.internal)   262144 IPs (12.5% of total) - unreachable!
02:b9:65:7b:c2:93(ip-10-20-45-103.eu-central-1.compute.internal)   393216 IPs (18.8% of total) 
aa:86:23:5f:c4:87(ip-10-20-44-151.eu-central-1.compute.internal)   131072 IPs (06.2% of total) 
9e:50:fe:0f:69:3a(ip-10-20-45-7.eu-central-1.compute.internal)   393216 IPs (18.8% of total) 
32:ff:37:ae:dd:d7(ip-10-20-45-204.eu-central-1.compute.internal)   262144 IPs (12.5% of total) - unreachable!
62:59:36:75:93:d5(ip-10-20-45-100.eu-central-1.compute.internal)   131072 IPs (06.2% of total) - unreachable!
0a:57:36:10:c6:b5(ip-10-20-44-70.eu-central-1.compute.internal)   131072 IPs (06.2% of total) - unreachable!

The unreachable peers are nodes that AWS terminated in the ASG. When a new node comes up with one of those IPs, we get the IP allocation error.

Could there be a way to ignore these IP allocation errors in this case?
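
To make the stale entries easier to spot, here is a minimal sketch that pulls the peer names of all entries flagged `unreachable!` out of the `status ipam` output. The function name `list_unreachable_peers` is made up for illustration, and the sketch assumes the output format shown above (peer name, then the hostname in parentheses, with no space in between). It only lists the peers and changes nothing.

```shell
# Hypothetical helper: reads `weave status ipam` output on stdin and prints
# the peer name (the MAC-like ID before the parenthesis) of every line
# flagged "unreachable!". Assumes the line format shown in this issue.
list_unreachable_peers() {
    awk '/unreachable!/ {
        peer = $1
        sub(/\(.*/, "", peer)   # drop "(hostname)" that follows the peer name
        print peer
    }'
}

# Usage, assuming it is run where the weave script is available:
#   ./weave --local status ipam | list_unreachable_peers
```

The resulting list could be reviewed as candidates for `weave rmpeer`, though whether that is safe or appropriate for nodes terminated by the ASG is an assumption on my part and not something I have verified.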

$ weave version
2.8.1
$ kubectl version
v1.20.6

atze234, May 20 '21 19:05