metallb
metallb copied to clipboard
FRR mode not restarting BGP when a node becomes unready
- The bug itself, as detailed as you can.
This issue was first discussed on slack thread. I have a kubernetes cluster with 2 worker nodes and 1 master node. Metallb is installed from FRR yaml but with speaker image tag changed from "main" to "v0.12.1" since with "main" tag containers where unable to start and where reportying "crashLoopBack error". The cluster is working with GRE tunneling and a router running with FRR Routing daemonset.
Strict ARP is enabled from init script:
# see what changes would be made, returns nonzero returncode if different
kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl diff -f - -n kube-system
# actually apply the changes, returns nonzero returncode on errors only
kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl apply -f - -n kube-system
The configmap for metallb I'm using is the next:
apiVersion: v1
kind: ConfigMap
metadata:
namespace: metallb-system
name: config
data:
config: |
peers:
- peer-address: 10.0.0.1
peer-asn: 64600
my-asn: 64600
bfd-profile: full1
bfd-profiles:
- name: full1
receive-interval: 179
transmit-interval: 180
echo-interval: 62
echo-mode: false
passive-mode: false
minimum-ttl: 254
address-pools:
- name: default
protocol: bgp
addresses:
- 128.28.28.1-128.28.28.20
And the configuration for the FRR Router this:
ip nht resolve-via-default
bfd
profile full1
receive-interval 179
transmit-interval 150
echo receive-interval 62
echo transmit-interval 62
minimum-ttl 254
peer 10.0.0.2
profile full1
no shutdown
peer 10.0.0.3
profile full1
no shutdown
peer 10.0.0.4
profile full1
no shutdown
router bgp 64600
bgp router-id 10.0.0.1
neighbor 10.0.0.2 remote-as 64600
neighbor 10.0.0.2 bfd
neighbor 10.0.0.3 remote-as 64600
neighbor 10.0.0.3 bfd
neighbor 10.0.0.4 remote-as 64600
neighbor 10.0.0.4 bfd
On vtysh "show bfd peers" shows the next output when all nodes are up:
BFD Peers:
peer 10.0.0.2 vrf default
ID: 992899208
Remote ID: 227744130
Active mode
Status: up
Uptime: 10 minute(s), 8 second(s)
Diagnostics: ok
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
peer 10.0.0.3 vrf default
ID: 1356740786
Remote ID: 4063445766
Active mode
Status: up
Uptime: 10 minute(s), 8 second(s)
Diagnostics: ok
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
BFD Peers:
peer 10.0.0.2 vrf default
ID: 992899208
Remote ID: 227744130
Active mode
Status: up
Uptime: 12 minute(s), 9 second(s)
Diagnostics: ok
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
peer 10.0.0.3 vrf default
ID: 1356740786
Remote ID: 0
Active mode
Status: down
Downtime: 45 second(s)
Diagnostics: control detection time expired
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
peer 10.0.0.4 vrf default
ID: 1921253385
Remote ID: 185864328
Active mode
Status: up
Uptime: 11 minute(s), 2 second(s)
Diagnostics: ok
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms Transmission interval: 180ms
Echo receive interval: 62ms
peer 10.0.0.4 vrf default
ID: 1921253385
Remote ID: 185864328
Active mode
Status: up
Uptime: 9 minute(s), 1 second(s)
Diagnostics: ok
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
On other different executions after stopping and and restarting two worker nodes I geet the next output after executing "show bfd peers":
BFD Peers:
peer 10.0.0.4 vrf default
ID: 2422357222
Remote ID: 0
Active mode
Status: down
Downtime: 11 minute(s), 35 second(s)
Diagnostics: neighbor signaled session down
Remote diagnostics: neighbor signaled session down
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
peer 10.0.0.3 vrf default
ID: 144310512
Remote ID: 0
Active mode
Status: down
Downtime: 24 minute(s), 31 second(s)
Diagnostics: control detection time expired
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
peer 10.0.0.2 vrf default
ID: 2275962565
Remote ID: 545881773
Active mode
Status: up
Uptime: 38 minute(s), 35 second(s)
Diagnostics: ok
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
When one worker node is down it shows:
BFD Peers:
peer 10.0.0.2 vrf default
ID: 992899208
Remote ID: 227744130
Active mode
Status: up
Uptime: 12 minute(s), 9 second(s)
Diagnostics: ok
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
peer 10.0.0.3 vrf default
ID: 1356740786
Remote ID: 0
Active mode
Status: down
Downtime: 45 second(s)
Diagnostics: control detection time expired
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
peer 10.0.0.4 vrf default
ID: 1921253385
Remote ID: 185864328
Active mode
Status: up
Uptime: 11 minute(s), 2 second(s)
Diagnostics: ok
Remote diagnostics: ok
Peer Type: configured
Local timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 150ms
Echo receive interval: 62ms
Echo transmission interval: disabled
Remote timers:
Detect-multiplier: 3
Receive interval: 179ms
Transmission interval: 180ms
Echo receive interval: 62ms
-
Version of MetalLB v0.12.1
-
Version of Kubernetes 1.23.5
-
Name and version of network addon (e.g. Calico, Weave...) Flannel from master.
-
Whether you've configured kube-proxy for iptables or ipvs mode Im not sure about this, I have enabled ipvs mode with an starting script.
-
FRR container logs:
sh: 60: unknown operand
Started watchfrr
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
sh: 60: unknown operand
- Speaker container logs on master node (init-worker1 stop-worker1 restart):
{"branch":"HEAD","caller":"level.go:63","commit":"v0.12.1","goversion":"gc / go1.16.14 / amd64","level":"info","msg":"MetalLB speaker starting version 0.12.1 (commit v0.12.1, branch HEAD)","ts":"2022-03-29T14:31:45.735502377Z","version":"0.12.1"}
{"caller":"level.go:63","event":"createARPResponder","interface":"eth0","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:31:45.739125696Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"eth0","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T14:31:45.739374708Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"eth1","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:31:45.739511326Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"eth1","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T14:31:45.739572763Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"docker0","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:31:45.739665697Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"flannel.1","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:31:45.739880698Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"flannel.1","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T14:31:45.739926128Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"cni0","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:31:45.740009082Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"cni0","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T14:31:45.740067481Z"}
W0329 14:31:45.750872 472 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
{"caller":"level.go:63","level":"info","msg":"using endpoint slices","op":"New","ts":"2022-03-29T14:31:45.751333984Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.101","node event":"NodeJoin","node name":"master","ts":"2022-03-29T14:31:45.757788511Z"}
W0329 14:31:45.760706 472 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
W0329 14:31:45.762096 472 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
{"caller":"level.go:63","configmap":"metallb-system/config","event":"peerAdded","level":"info","msg":"peer configured, starting BGP session","peer":"10.0.0.1","ts":"2022-03-29T14:31:45.859637573Z"}
{"caller":"level.go:63","configmap":"metallb-system/config","event":"configLoaded","level":"info","msg":"config (re)loaded","ts":"2022-03-29T14:31:45.860039772Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:31:45.860528856Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:31:45.860831129Z"}
{"caller":"level.go:63","event":"nodeLabelsChanged","level":"info","msg":"Node labels changed, resyncing BGP peers","ts":"2022-03-29T14:31:45.861262547Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T14:31:45.861529763Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: connect: connection refused\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:31:45.864111906Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:31:45.866951486Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:31:45.874298383Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:32:15.736851423Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.102","node event":"NodeJoin","node name":"worker1","ts":"2022-03-29T14:32:28.875622265Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:32:28.888601207Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:32:28.888717863Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:32:45.736917729Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T14:36:10.253097537Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.103","node event":"NodeJoin","node name":"worker2","ts":"2022-03-29T14:37:03.323619433Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:37:03.329544545Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:37:03.329687498Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:37:15.736120195Z"}
W0329 14:40:00.798082 472 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect worker1 has failed, no acks received","ts":"2022-03-29T14:40:26.738056369Z"}
{"caller":"state.go:1157","component":"Memberlist","level":"info","msg":"memberlist: Marking worker1 as failed, suspect timeout reached (0 peer confirmations)","ts":"2022-03-29T14:40:28.194747965Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.102","node event":"NodeLeave","node name":"worker1","ts":"2022-03-29T14:40:28.198093195Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:40:28.210388794Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:40:28.210430025Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect worker1 has failed, no acks received","ts":"2022-03-29T14:40:28.739353217Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:40:45.73891404Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: i/o timeout\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:40:55.759543497Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:41:03.453413246Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:41:03.45346262Z"}
{"caller":"level.go:63","event":"serviceWithdrawn","ip":null,"ips":["128.28.28.1"],"level":"info","msg":"withdrawing service announcement","pool":"default","protocol":"bgp","reason":"noEndpoints","service":"default/my-nginx","ts":"2022-03-29T14:41:03.475157054Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:41:15.744450669Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T14:41:17.314561541Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: i/o timeout\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:41:27.315977877Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:41:48.831928524Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:42:48.833649438Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: connect: connection refused\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:43:45.759981517Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.102","node event":"NodeJoin","node name":"worker1","ts":"2022-03-29T14:43:47.96156176Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:43:50.946105448Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:43:50.946318092Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:43:53.01925933Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:43:53.01932921Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:44:15.738539496Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T14:46:23.953060622Z"}
- Speaker container logs on worker1 node (init-worker1 stop-worker1 restart):
{"branch":"HEAD","caller":"level.go:63","commit":"v0.12.1","goversion":"gc / go1.16.14 / amd64","level":"info","msg":"MetalLB speaker starting version 0.12.1 (commit v0.12.1, branch HEAD)","ts":"2022-03-29T13:20:34.596405447Z","version":"0.12.1"}
{"caller":"level.go:63","event":"createARPResponder","interface":"eth0","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T13:20:34.600675127Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"eth0","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T13:20:34.601438142Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"eth1","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T13:20:34.601549706Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"eth1","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T13:20:34.601598283Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"docker0","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T13:20:34.601897331Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"flannel.1","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T13:20:34.602010286Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"flannel.1","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T13:20:34.602065564Z"}
{"caller":"level.go:63","level":"info","msg":"using endpoint slices","op":"New","ts":"2022-03-29T13:20:34.609040429Z"}
W0329 13:20:34.608553 105 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.103","node event":"NodeJoin","node name":"worker2","ts":"2022-03-29T13:20:34.61722257Z"}
W0329 13:20:34.618984 105 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
W0329 13:20:34.620488 105 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
{"caller":"level.go:63","configmap":"metallb-system/config","event":"peerAdded","level":"info","msg":"peer configured, starting BGP session","peer":"10.0.0.1","ts":"2022-03-29T13:20:34.719002559Z"}
{"caller":"level.go:63","configmap":"metallb-system/config","event":"configLoaded","level":"info","msg":"config (re)loaded","ts":"2022-03-29T13:20:34.719233665Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:20:34.720175561Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:20:34.720529325Z"}
{"caller":"level.go:63","event":"nodeLabelsChanged","level":"info","msg":"Node labels changed, resyncing BGP peers","ts":"2022-03-29T13:20:34.721263293Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T13:20:34.72144267Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:20:34.725600945Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:20:34.73142526Z"}
{"caller":"level.go:63","error":null,"expected":2,"joined":1,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T13:20:34.735097399Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeJoin","node name":"master","ts":"2022-03-29T13:20:34.731018203Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.102","node event":"NodeJoin","node name":"worker1","ts":"2022-03-29T13:20:34.735818535Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:20:34.741857531Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:20:34.741916737Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:20:36.605708088Z"}
{"caller":"state.go:1157","component":"Memberlist","level":"info","msg":"memberlist: Marking master as failed, suspect timeout reached (0 peer confirmations)","ts":"2022-03-29T13:20:38.803347968Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeLeave","node name":"master","ts":"2022-03-29T13:20:38.803741044Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:20:38.816895808Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:20:38.817063023Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:20:39.599802998Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T13:21:00.725377901Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 10.0.2.15: dial tcp 10.0.2.15:7946: connect: connection refused\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T13:21:00.72614516Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T13:21:04.597578139Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeJoin","node name":"master","ts":"2022-03-29T13:21:27.421856345Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:21:27.428026957Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:21:27.4283128Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:21:30.597815699Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:21:32.598932933Z"}
{"caller":"state.go:1157","component":"Memberlist","level":"info","msg":"memberlist: Marking master as failed, suspect timeout reached (0 peer confirmations)","ts":"2022-03-29T13:21:33.987539957Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeLeave","node name":"master","ts":"2022-03-29T13:21:33.987753164Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:21:33.993390336Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:21:33.993592124Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T13:21:34.612609019Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 10.0.2.15: dial tcp 10.0.2.15:7946: connect: connection refused\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T13:21:34.618049281Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T13:22:04.59718083Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeJoin","node name":"master","ts":"2022-03-29T13:22:27.586091689Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:22:27.599594307Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:22:27.599653775Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:22:30.606196865Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:22:32.602574221Z"}
{"caller":"state.go:1157","component":"Memberlist","level":"info","msg":"memberlist: Marking master as failed, suspect timeout reached (0 peer confirmations)","ts":"2022-03-29T13:22:33.793660746Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeLeave","node name":"master","ts":"2022-03-29T13:22:33.794127557Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:22:33.800362605Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:22:33.800504599Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T13:22:34.596772149Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:22:34.597910104Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 10.0.2.15: dial tcp 10.0.2.15:7946: connect: connection refused\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T13:22:34.61745698Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T13:23:04.617354957Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeJoin","node name":"master","ts":"2022-03-29T13:23:27.448485002Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:23:27.454276192Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:23:27.454422536Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:23:29.601589751Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:23:31.598725432Z"}
{"caller":"state.go:1157","component":"Memberlist","level":"info","msg":"memberlist: Marking master as failed, suspect timeout reached (0 peer confirmations)","ts":"2022-03-29T13:23:33.603494745Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeLeave","node name":"master","ts":"2022-03-29T13:23:33.603643524Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:23:33.60914393Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:23:33.609205337Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:23:34.600079331Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T13:23:34.600339132Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 10.0.2.15: dial tcp 10.0.2.15:7946: connect: connection refused\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T13:23:34.621340181Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T13:24:04.706957407Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeJoin","node name":"master","ts":"2022-03-29T13:24:27.587475148Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:24:27.593425106Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:24:27.593778602Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:24:29.597864274Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:24:31.599141645Z"}
{"caller":"state.go:1157","component":"Memberlist","level":"info","msg":"memberlist: Marking master as failed, suspect timeout reached (0 peer confirmations)","ts":"2022-03-29T13:24:32.832911834Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"10.0.2.15","node event":"NodeLeave","node name":"master","ts":"2022-03-29T13:24:32.833037803Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:24:32.839758576Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T13:24:32.839835981Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect master has failed, no acks received","ts":"2022-03-29T13:24:33.600374831Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T13:24:34.605123235Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 10.0.2.15: dial tcp 10.0.2.15:7946: connect: connection refused\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T13:24:34.618008831Z"}
- Speaker container logs on worker2 node (init-worker1 stop-worker1 restart):
{"branch":"HEAD","caller":"level.go:63","commit":"v0.12.1","goversion":"gc / go1.16.14 / amd64","level":"info","msg":"MetalLB speaker starting version 0.12.1 (commit v0.12.1, branch HEAD)","ts":"2022-03-29T14:37:02.044244041Z","version":"0.12.1"}
{"caller":"level.go:63","event":"createARPResponder","interface":"eth0","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:37:02.044903118Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"eth0","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T14:37:02.047000684Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"eth1","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:37:02.047618893Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"eth1","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T14:37:02.047692978Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"docker0","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:37:02.04779035Z"}
{"caller":"level.go:63","event":"createARPResponder","interface":"flannel.1","level":"info","msg":"created ARP responder for interface","ts":"2022-03-29T14:37:02.048047578Z"}
{"caller":"level.go:63","event":"createNDPResponder","interface":"flannel.1","level":"info","msg":"created NDP responder for interface","ts":"2022-03-29T14:37:02.048104355Z"}
{"caller":"level.go:63","level":"info","msg":"using endpoint slices","op":"New","ts":"2022-03-29T14:37:02.066539811Z"}
W0329 14:37:02.066068 102 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.103","node event":"NodeJoin","node name":"worker2","ts":"2022-03-29T14:37:02.073821375Z"}
W0329 14:37:02.075384 102 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
W0329 14:37:02.077491 102 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
{"caller":"level.go:63","configmap":"metallb-system/config","event":"peerAdded","level":"info","msg":"peer configured, starting BGP session","peer":"10.0.0.1","ts":"2022-03-29T14:37:02.175896131Z"}
{"caller":"level.go:63","configmap":"metallb-system/config","event":"configLoaded","level":"info","msg":"config (re)loaded","ts":"2022-03-29T14:37:02.176105939Z"}
{"caller":"level.go:63","event":"nodeLabelsChanged","level":"info","msg":"Node labels changed, resyncing BGP peers","ts":"2022-03-29T14:37:02.176224696Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T14:37:02.17626329Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.102","node event":"NodeJoin","node name":"worker1","ts":"2022-03-29T14:37:02.18486104Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.101","node event":"NodeJoin","node name":"master","ts":"2022-03-29T14:37:02.185410703Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:37:02.185542432Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:37:02.186116794Z"}
{"caller":"level.go:63","level":"info","msg":"memberlist join succesfully","number of other nodes":2,"op":"Member detection","ts":"2022-03-29T14:37:02.192170283Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:37:02.195986505Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:37:02.196703177Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T14:37:31.234814731Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:37:32.045073767Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect worker1 has failed, no acks received","ts":"2022-03-29T14:40:23.047887043Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect worker1 has failed, no acks received","ts":"2022-03-29T14:40:25.051992337Z"}
{"caller":"state.go:1157","component":"Memberlist","level":"info","msg":"memberlist: Marking worker1 as failed, suspect timeout reached (0 peer confirmations)","ts":"2022-03-29T14:40:27.048374201Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.102","node event":"NodeLeave","node name":"worker1","ts":"2022-03-29T14:40:27.048697111Z"}
{"caller":"state.go:459","component":"Memberlist","level":"info","msg":"memberlist: Suspect worker1 has failed, no acks received","ts":"2022-03-29T14:40:27.048879516Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:40:27.054425204Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:40:27.054550778Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:40:32.045019585Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:41:02.292242879Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:41:02.292346013Z"}
{"caller":"level.go:63","event":"serviceWithdrawn","ip":null,"ips":["128.28.28.1"],"level":"info","msg":"withdrawing service announcement","pool":"default","protocol":"bgp","reason":"noEndpoints","service":"default/my-nginx","ts":"2022-03-29T14:41:02.331142165Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: i/o timeout\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:41:12.075017958Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:41:32.044753429Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:42:05.140521946Z"}
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T14:42:38.900022216Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:42:41.969281029Z"}
{"caller":"level.go:63","error":"1 error occurred:\n\t* Failed to join 172.42.42.102: dial tcp 172.42.42.102:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-03-29T14:43:05.136842044Z"}
{"caller":"level.go:63","level":"info","msg":"node event - forcing sync","node addr":"172.42.42.102","node event":"NodeJoin","node name":"worker1","ts":"2022-03-29T14:43:46.819066189Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:43:49.798107151Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:43:49.798331253Z"}
{"caller":"level.go:63","event":"updatedAdvertisements","ips":["128.28.28.1"],"level":"info","msg":"making advertisements using BGP","numAds":1,"pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:43:51.873261069Z"}
{"caller":"level.go:63","event":"serviceAnnounced","ips":["128.28.28.1"],"level":"info","msg":"service has IP, announcing","pool":"default","protocol":"bgp","service":"default/my-nginx","ts":"2022-03-29T14:43:51.873464734Z"}
{"caller":"level.go:63","level":"info","op":"reload-validate","success":"reloaded config","ts":"2022-03-29T14:44:02.045142609Z"}
W0329 14:46:55.088407 102 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
{"caller":"level.go:63","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-03-29T14:47:46.757943443Z"}
-Environment replication: The issue can be replicated with this repository (branch FRR_BFD) on vagrant. To check if the cluster is running simply enter 172.42.42.100 on web browser.
Is there any progress?
Hi @xhejtman , we are giving this priority, however we are struggling in reproducing it. If this is happening often on your cluster, would you mind setting the log level to debug and attaching the logs of the frr container and of the speaker here?
metallb.zip debug logs.
I rebooted node 4. nginx has been restarted on node 1. (loadbalancer ip 10.16.48.119). I forced nginx to be moved back to node 4.
nodes 1, 3, 4 stopped sending bgp updates to peer, only node 2 continued to send updates.
I double checked this with the following:
- node reboot itself is not a problem.
- if node is rebooted and back up, I can see all nodes are sending bgp updates
- problem is with moving loadbalancer service from a node to another after the node is back online
- once I force service move, some nodes stop sending bgp updates.
- this could be seen in logs. ip 10.16.48.119 has been moved from node 4 to node 1 due to node 4 reboot. I moved it back to node 4 and updates stopped.
Thanks! Just one question: what you mean by "force service move" ? Normally the lb service is advertised from all the nodes.
kubectl cordon node 1
kubectl delete pod -n kube-system rke2-ingress-nginx-controller-6b45cb6ddc-cdw26
. this forces start on another node than 1.
oh, and maybe related thing. ingress loadbalancer service uses externaltrafficpolicy: Local
I was about to ask that :-)
yes, that is the way I noticed that it stopped working, as the service is actually not announced on all nodes ;)
would you mind jumping on the metallb slack on the kubernetes workspace so we don't pollute the issue?
Hello! Recently i faced the same issue with FRR mode. We've got configuration with Canal CNI, Service with ExternalTrafficPolicy set to Local. When i delete pod or restart whole deployment routes are disapearing from neighbor and are not redistributed.
Did you work out any solution?
Hi @michalg91 , would you mind clarifying? Are you restarting the deployment OR the speakers?
And, would you mind sharing your configuration and the details of the service?
Hi, @fedepaol I am restarting the deployment under the service (in this case nginx-ingress). And until i restart speakers the paths are not redistributed.
I also tried to change frr version but no mather which one i chose the problem still occurs.
kind: BFDProfile
metadata:
name: bfd-stage
namespace: metallb-system
spec:
detectMultiplier: 10
echoMode: true
minimumTtl: 250
passiveMode: false
receiveInterval: 700
transmitInterval: 700
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
name: bgp-bird-am4
namespace: metallb-system
spec:
myASN: 65531
peerASN: 65200
peerAddress: 172.26.6.105
bfdProfile: bfd-stage
routerID: 10.50.64.0
holdTime: "10s"
ebgpMultiHop: true
keepaliveTime: "3s"
nodeSelectors:
- matchLabels:
topology.kubernetes.io/zone: am4
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
name: bgp-bird-am6
namespace: metallb-system
spec:
myASN: 65531
peerASN: 65201
peerAddress: 172.26.134.105
bfdProfile: bfd-stage
routerID: 10.50.64.0
holdTime: "10s"
keepaliveTime: "3s"
ebgpMultiHop: true
nodeSelectors:
- matchLabels:
topology.kubernetes.io/zone: am6
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: bgp-one-services
namespace: metallb-system
spec:
addresses:
- 10.50.64.0/28
autoAssign: true
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
name: bgp-bird-advertisement
namespace: metallb-system
spec:
ipAddressPools:
- bgp-one-services
FRR Config in speaker:
!
Building configuration...
Current configuration:
!
frr version 8.3.1_git
frr defaults traditional
hostname am6-one-worklaod-dev-15-worker4
log file /etc/frr/frr.log informational
log timestamp precision 3
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 65531
bgp router-id 10.50.64.0
no bgp ebgp-requires-policy
no bgp suppress-duplicates
no bgp hard-administrative-reset
no bgp default ipv4-unicast
no bgp graceful-restart notification
no bgp network import-check
neighbor 172.26.134.105 remote-as 65201
neighbor 172.26.134.105 bfd
neighbor 172.26.134.105 bfd profile bfd-stage
neighbor 172.26.134.105 ebgp-multihop 255
neighbor 172.26.134.105 timers 3 10
!
address-family ipv4 unicast
network 10.50.64.0/32
network 10.50.64.1/32
neighbor 172.26.134.105 activate
neighbor 172.26.134.105 route-map 172.26.134.105-in in
neighbor 172.26.134.105 route-map 172.26.134.105-out out
exit-address-family
exit
!
ip prefix-list 172.26.134.105-pl-ipv4 seq 5 permit 10.50.64.1/32
ip prefix-list 172.26.134.105-pl-ipv4 seq 10 permit 10.50.64.0/32
ip prefix-list 172.26.134.105-pl-ipv4 seq 15 deny any
!
ipv6 prefix-list 172.26.134.105-pl-ipv4 seq 5 deny any
!
route-map 172.26.134.105-in deny 20
exit
!
route-map 172.26.134.105-out permit 1
match ip address prefix-list 172.26.134.105-pl-ipv4
exit
!
route-map 172.26.134.105-out permit 2
match ipv6 address prefix-list 172.26.134.105-pl-ipv4
exit
!
ip nht resolve-via-default
!
ipv6 nht resolve-via-default
!
bfd
profile bfd-stage
detect-multiplier 10
transmit-interval 700
receive-interval 700
echo-mode
minimum-ttl 250
exit
!
exit
!
end
service:
apiVersion: v1
kind: Service
metadata:
annotations:
field.cattle.io/publicEndpoints: '[{"addresses":["10.50.64.0"],"port":80,"protocol":"TCP","serviceName":"nginx-ingress:ingress-internal-ingress-nginx-controller-loadbalancer","allNodes":false},{"addresses":["10.50.64.0"],"port":443,"protocol":"TCP","serviceName":"nginx-ingress:ingress-internal-ingress-nginx-controller-loadbalancer","allNodes":false}]'
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":null,"name":"ingress-internal-ingress-nginx-controller-loadbalancer","namespace":"nginx-ingress"},"spec":{"ports":[{"appProtocol":"http","name":"http","port":80,"protocol":"TCP","targetPort":"http"},{"appProtocol":"https","name":"https","port":443,"protocol":"TCP","targetPort":"https"}],"selector":{"app.kubernetes.io/component":"controller","app.kubernetes.io/instance":"ingress-internal","app.kubernetes.io/name":"ingress-nginx"},"type":"LoadBalancer"}}
creationTimestamp: "2022-10-18T08:07:51Z"
name: ingress-internal-ingress-nginx-controller-loadbalancer
namespace: nginx-ingress
resourceVersion: "18640886"
uid: c4d61fae-7d83-4ba3-9a41-c060e6e138ab
spec:
allocateLoadBalancerNodePorts: true
clusterIP: 10.43.225.32
clusterIPs:
- 10.43.225.32
externalTrafficPolicy: Local
healthCheckNodePort: 31875
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- appProtocol: http
name: http
nodePort: 32245
port: 80
protocol: TCP
targetPort: http
- appProtocol: https
name: https
nodePort: 32436
port: 443
protocol: TCP
targetPort: https
selector:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-internal
app.kubernetes.io/name: ingress-nginx
sessionAffinity: None
type: LoadBalancer
status:
loadBalancer:
ingress:
- ip: 10.50.64.0
Couple of more questions:
- can you try with the native mode?
- can you try bumping the version of the frr image to latest?
I can confirm that in native mode it is working as expected, but usage of bfd is important to us. Yes, i tried changed frr image to latest.
I was trying to narrow down, I am not suggesting that reverting is a valid workaround :-) So, with latest FRR you still have the issue?
And, does this happen always?
Yes, this happens always. With latest FRR and with 7.5.1
Try to build frr image yourself from git. At least this head 403f312d5657d2d62775dc57c0a73bfb85118ea6 worker for me but I need to use custom frr image.
or if you want, you can use our image: cerit.io/frr/frr:ubi-8-403f312d56
@xhejtman @fedepaol thanks for your help, i think i got it running (only done few tests). I have build frr from same head as @xhejtman, also i have set avoidBuggyIPs in addresspool configuration ~~and i think it did the trick~~.
Edit: I forgot to change ExternalTrafficPolicy to Local, it still doesn't work.
This issue has been automatically marked as stale because it has been open 30 days with no activity. This issue will be closed in 10 days unless you do one of the following:
- respond to this issue
- have one of these labels applied:
bug
,good first issue
,help wanted
,hold
,enhancement
,documentation
,question
I think this is workarounded by https://github.com/metallb/metallb/pull/1732 , where we verify frr is healthy via a probe. Closing this