k8s-bigip-ctlr
CIS NodePoller stops working and halts the ARP process if the VTEP MAC cannot be obtained for one node in the cluster
Setup Details
CIS Version: 2.7.0
Build: f5networks/k8s-bigip-ctlr:latest
BIG-IP Version: BIG-IP 15.1.4 Build 0.0.47 Final
AS3 Version: none
Agent Mode: CCCL
Orchestration: K8S
Orchestration Version: kubernetes v1.21.5
Pool Mode: Cluster
Additional Setup details:
Platform: CentOS Linux release 8.4.2105
Kernel: 4.18.0-305.19.1.el8_4.x86_64
CNI Plugin: flannel
Description
Because one node in the cluster cannot provide its VTEP MAC, the CIS NodePoller stops working and halts the ARP update process.
Steps To Reproduce
- To reproduce the issue, we simulate a node losing its VtepMAC: edit the node with kubectl edit node cluster1-w1, remove the annotation
flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"5a:de:e9:80:38:7e"}'
(which flannel inserts automatically), and save.
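The same removal can also be done non-interactively with kubectl annotate, where the trailing dash deletes the annotation (a sketch assuming direct kubectl access to the cluster):
kubectl annotate node cluster1-w1 flannel.alpha.coreos.com/backend-data-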
- Scale a deployment watched by CIS, then wait long enough to see whether the BIG-IP VE refreshes its configuration.
- View the CIS log. The healthy worker node's pod CIDR is 10.42.0.0/24; the abnormal worker node's is 10.42.1.0/24.
2022/01/07 01:25:00 [INFO] [INIT] Starting: Container Ingress Services - Version: 2.7.0, BuildInfo: azure-1697-0dd06d23f0761fd29b1f614a52ed4b3695653cdd
2022/01/07 01:25:01 [INFO] ConfigWriter started: 0xc000369020
2022/01/07 01:25:01 [INFO] Started config driver sub-process at pid: 17
2022/01/07 01:25:01 [INFO] [INIT] Creating Agent for cccl
2022/01/07 01:25:01 [INFO] [CCCL] Initializing CCCL Agent
2022/01/07 01:25:01 [INFO] [CCCL] Removing Partition p1_AS3
2022/01/07 01:25:02 [INFO] [CORE] NodePoller (0xc0002645a0) registering new listener: 0x17a6700
2022/01/07 01:25:02 [INFO] [CORE] NodePoller (0xc0002645a0) registering new listener: 0x1757a40
2022/01/07 01:25:02 [INFO] [CORE] NodePoller started: (0xc0002645a0)
2022/01/07 01:25:02 [INFO] [CORE] Not watching Ingress resources.
2022/01/07 01:25:02 [INFO] [CORE] Watching ConfigMap resources.
2022/01/07 01:25:02 [INFO] [CORE] Handling ConfigMap resource events.
2022/01/07 01:25:02 [INFO] [CORE] Not handling Ingress resource events.
2022/01/07 01:25:02 [INFO] [CORE] Registered BigIP Metrics
2022/01/07 01:25:03 [INFO] [2022-01-07 01:25:03,585 __main__ INFO] entering inotify loop to watch /tmp/k8s-bigip-ctlr.config334657854/config.json
2022/01/07 01:25:05 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:25:06 [INFO] [2022-01-07 01:25:06,589 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.1.4
2022/01/07 01:25:06 [INFO] [2022-01-07 01:25:06,731 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.1.5
2022/01/07 01:25:07 [INFO] [2022-01-07 01:25:07,664 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.1.5
2022/01/07 01:25:07 [INFO] [2022-01-07 01:25:07,737 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.1.4
2022/01/07 01:29:06 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:29:06 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.4's node.
2022/01/07 01:29:06 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:29:06 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.5's node.
2022/01/07 01:29:06 [INFO] [2022-01-07 01:29:06,480 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:29:07 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:29:07 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.5's node.
2022/01/07 01:29:07 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:29:07 [INFO] [2022-01-07 01:29:07,723 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:29:07 [INFO] [2022-01-07 01:29:07,952 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,342 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.1.4%0
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,420 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.1.5%0
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,702 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.247
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,767 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.246
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,826 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.1.4
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,894 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.1.5
2022/01/07 01:29:57 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:29:57 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.6's node.
2022/01/07 01:29:57 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:29:57 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.6's node.
2022/01/07 01:29:57 [INFO] [2022-01-07 01:29:57,478 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:29:58 [INFO] [2022-01-07 01:29:58,489 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:29:59 [INFO] [2022-01-07 01:29:59,025 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.246%0
2022/01/07 01:30:02 [INFO] [2022-01-07 01:30:02,954 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan
2022/01/07 01:30:06 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:30:06 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.7's node.
2022/01/07 01:30:06 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:30:06 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.7's node.
2022/01/07 01:30:06 [INFO] [2022-01-07 01:30:06,774 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:30:07 [INFO] [2022-01-07 01:30:07,722 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:30:08 [INFO] [2022-01-07 01:30:08,130 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.247%0
2022/01/07 01:31:36 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:31:37 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.7's node.
2022/01/07 01:31:37 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:31:37 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.6's node.
2022/01/07 01:31:37 [INFO] [2022-01-07 01:31:37,345 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:31:38 [INFO] [2022-01-07 01:31:38,490 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:31:39 [INFO] [2022-01-07 01:31:39,007 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.1.7%0
2022/01/07 01:31:50 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:31:50 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.6's node.
2022/01/07 01:31:50 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:31:50 [INFO] [2022-01-07 01:31:50,416 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:31:51 [INFO] [2022-01-07 01:31:51,401 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:31:51 [INFO] [2022-01-07 01:31:51,765 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.1.6%0
2022/01/07 01:31:52 [INFO] [2022-01-07 01:31:52,003 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.249
2022/01/07 01:31:52 [INFO] [2022-01-07 01:31:52,053 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.248
2022/01/07 01:31:52 [INFO] [2022-01-07 01:31:52,105 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.0.247
2022/01/07 01:31:52 [INFO] [2022-01-07 01:31:52,167 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.0.246
2022/01/07 01:32:05 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:32:05 [INFO] [2022-01-07 01:32:05,723 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:32:07 [INFO] [2022-01-07 01:32:07,175 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.250
2022/01/07 01:36:53 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:36:54 [INFO] [2022-01-07 01:36:54,201 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:36:55 [INFO] [2022-01-07 01:36:55,496 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.251
2022/01/07 01:37:05 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:37:05 [INFO] [2022-01-07 01:37:05,406 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:37:06 [INFO] [2022-01-07 01:37:06,698 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.252
2022/01/07 01:37:16 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:37:16 [INFO] [2022-01-07 01:37:16,493 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:37:17 [INFO] [2022-01-07 01:37:17,999 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.253
2022/01/07 01:37:27 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:37:28 [INFO] [2022-01-07 01:37:28,080 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:37:29 [INFO] [2022-01-07 01:37:29,410 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.254
2022/01/07 01:38:35 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:38:35 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.8's node.
2022/01/07 01:38:36 [INFO] [2022-01-07 01:38:36,166 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:38:38 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:38:38 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.8's node.
2022/01/07 01:38:39 [INFO] [2022-01-07 01:38:39,086 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:38:42 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:38:42 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:38:42 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:38:42 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:38:43 [INFO] [2022-01-07 01:38:43,260 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:38:44 [INFO] [2022-01-07 01:38:44,198 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1
2022/01/07 01:38:44 [INFO] [2022-01-07 01:38:44,605 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.248%0
2022/01/07 01:40:24 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:24 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:24 [INFO] [2022-01-07 01:40:24,964 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:40:25 [INFO] [2022-01-07 01:40:25,728 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.254%0
2022/01/07 01:40:35 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:35 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:35 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:35 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:35 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:36 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:36 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:36 [INFO] [2022-01-07 01:40:36,280 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:40:36 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:38 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:38 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:38 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:38 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:38 [INFO] [2022-01-07 01:40:38,897 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:40:39 [INFO] [2022-01-07 01:40:39,492 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.252%0
2022/01/07 01:40:39 [INFO] [2022-01-07 01:40:39,577 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.253%0
2022/01/07 01:40:40 [INFO] [2022-01-07 01:40:40,201 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:40:40 [INFO] [2022-01-07 01:40:40,580 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.251%0
2022/01/07 01:40:44 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:45 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:45 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:45 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:45 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:45 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:45 [INFO] [2022-01-07 01:40:45,531 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:40:46 [INFO] [2022-01-07 01:40:46,813 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
2022/01/07 01:40:47 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs
2022/01/07 01:40:47 [INFO] [2022-01-07 01:40:47,417 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.249%0
2022/01/07 01:40:47 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node.
2022/01/07 01:40:47 [INFO] [2022-01-07 01:40:47,539 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.250%0
2022/01/07 01:40:47 [INFO] [2022-01-07 01:40:47,976 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea
- View the pod IPs
[root@cluster1-m1 1]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coffee-87b9987b4-lzch9 1/1 Running 0 3m58s 10.42.1.10 cluster1-w1 <none> <none>
coffee-87b9987b4-nmsk2 1/1 Running 0 4m6s 10.42.1.8 cluster1-w1 <none> <none>
coffee-87b9987b4-q778v 1/1 Running 0 4m2s 10.42.1.9 cluster1-w1 <none> <none>
tea-67977d68b-4qbcz 1/1 Running 0 117s 10.42.0.6 cluster1-m1 <none> <none>
tea-67977d68b-664f9 1/1 Running 0 116s 10.42.0.7 cluster1-m1 <none> <none>
tea-67977d68b-8xtwl 1/1 Running 0 2m8s 10.42.0.5 cluster1-m1 <none> <none>
tea-67977d68b-j2lb8 1/1 Running 0 114s 10.42.0.8 cluster1-m1 <none> <none>
tea-67977d68b-rxwkl 1/1 Running 0 2m8s 10.42.0.3 cluster1-m1 <none> <none>
tea-67977d68b-tsc6k 1/1 Running 0 2m8s 10.42.0.2 cluster1-m1 <none> <none>
- View the VE ARP list: the new pod IPs were not added, even for the pods running on the normal, healthy worker node.
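For reference, the ARP records CIS creates (the /Common/k8s-* entries in the log above) can be inspected on the BIG-IP with tmsh, assuming shell access to the VE:
tmsh list net arp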
Expected Result
CIS outputs the error log, but the NodePoller and ARP process keep working.
Actual Result
CIS outputs the error log, and the NodePoller and ARP process stop working.
Diagnostic Information
The CIS deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: k8s-bigip-ctlr1
  name: cc-k8s-to-bigip1
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-bigip-ctlr1
  template:
    metadata:
      labels:
        app: k8s-bigip-ctlr1
      name: k8s-bigip-ctlr1
    spec:
      containers:
      - args:
        - --bigip-username=$(BIGIP_USERNAME)
        - --bigip-password=$(BIGIP_PASSWORD)
        - --manage-ingress=false
        - --bigip-partition=partition1
        - --bigip-url=https://10.1.20.252
        - --pool-member-type=cluster
        - --flannel-name=/Common/flannel_vxlan
        - --insecure=true
        - --agent=cccl
        command:
        - /app/bin/k8s-bigip-ctlr
        env:
        - name: BIGIP_USERNAME
          valueFrom:
            secretKeyRef:
              key: username
              name: bigip-login1
              optional: false
        - name: BIGIP_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: bigip-login1
              optional: false
        image: f5networks/k8s-bigip-ctlr:2.7.0
        imagePullPolicy: Always
        name: k8s-bigip-ctlr1
      serviceAccount: bigip-ctlr
      serviceAccountName: bigip-ctlr
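Assuming the bigip-login1 secret and the bigip-ctlr service account already exist, the manifest can be applied as usual (cis-deployment.yaml is a hypothetical name for the file above):
kubectl apply -f cis-deployment.yaml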
Observations (if any)
It may be related to the flannel bug "Flannel Annotations flannel.alpha.coreos.com issue" (#1122).
@kkfinkkfin - why do you want to manually remove the flannel annotation?
To reproduce the issue and simulate the node losing its VtepMAC, we edit the node YAML to remove the annotation flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"5a:de:e9:80:38:7e"}'.
Based on the flannel bug, it's an infra issue. Per the CIS documentation, a valid flannel configuration (and other infra) is required for CIS to function properly.
IMO, this issue is invalid.
@kkfinkkfin please share steps to reproduce this issue without manual intervention. This helps in understanding the use case better.
@trinaths Maybe we can make a small enhancement: let the pods that sit on nodes with correct flannel annotations still be updated into F5.
@myf5 not sure what exactly you want to achieve with this.
This would make CIS more robust: even if one node has wrong annotations, the other normal nodes can still work with CIS, so the customer's business can continue.
@myf5 perfect! Please share steps to reproduce this issue without manual intervention. This helps in understanding the use case better.
I cannot understand why you keep asking us to reproduce the problem without manual intervention. As @kkfinkkfin said, the root cause of the flannel issue cannot always be reproduced, but once the flannel issue happens, the result is a wrong annotation on some nodes. Since the result of the flannel issue is consistent, why can we not mock that result? Is it so important that flannel itself trigger the issue? Why can CIS not update the ARP entries of the other pods running on the normal nodes?
@myf5 - Manual intervention to reproduce a problem doesn't help to properly fix an issue; it doesn't cover the real cause. When we reproduce an issue without manual intervention, it helps to clearly understand:
- How often is this issue happening?
- Is this issue related to the user's infra?
- How clearly and robustly can CIS act on the issue?
- Will the fix enhance or compromise UX?
- and many more...
Robustness of CIS is a vast space, and we move in with priority and level of UX. Issues like this, reproduced with manual intervention, don't give the complete picture of the issue, only a part of it. A fix in CIS based on issues caused by manual intervention doesn't scale well and may compromise UX.
The clearer the issue (without manual intervention), the more robust the fix.
CIS does not need to care about how often flannel triggers the issue. CIS should care about the result of the issue (the annotation errors) and take appropriate action so that CIS keeps working at a production level. There are many problems whose root cause cannot be found, but that does not mean we should just wait and take no action to remedy them.
From my point of view: more and more customers are using CIS in production environments; some of my customers have put it in a financial trading system. CIS should take quality and performance more seriously. More confidence from customers means more opportunities to expand.
@myf5
I guess a quick hack/fix is to remove the log.Errorf and return, and use log.Infof("[VxLAN] %v", err) instead:
https://github.com/F5Networks/k8s-bigip-ctlr/blob/master/pkg/vxlan/vxlanMgr.go#L229-L233
var mac string
mac, err = getVtepMac(pod, kubePods, kubeNodes)
if nil != err {
log.Errorf("[VxLAN] %v", err)
return
}
@myf5
If we only log an informational message, the result will be a pod ARP entry with an empty MAC for the node missing the flannel annotation. I am not sure whether BIG-IP mcpd would accept a pod ARP entry with an empty MAC; and even if mcpd accepted it, traffic to that pod would not work. Maybe, when a node is missing the flannel annotation, we could use a fake MAC: when the customer notices that traffic to the pod with the fake MAC does not work, they can check the CIS log, where an informational message about the node missing the flannel annotation would give them a clue to fix the flannel issue.
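A minimal, self-contained sketch of that fake-MAC idea (the sentinel value, helper, and data below are all hypothetical, not actual CIS code):

package main

import (
	"errors"
	"fmt"
)

// placeholderMAC is a hypothetical, obviously-bogus sentinel. Traffic to a
// pod published with it will fail, which is the intended signal to the user.
const placeholderMAC = "de:ad:be:ef:00:00"

// getVtepMac stands in for the real helper: it fails when the pod's node
// lacks the flannel backend-data annotation.
func getVtepMac(podIP string, nodeAnnotated bool) (string, error) {
	if !nodeAnnotated {
		return "", errors.New("could not get VtepMac for " + podIP + "'s node")
	}
	return "5a:de:e9:80:38:7e", nil
}

func main() {
	pods := []struct {
		ip        string
		annotated bool
	}{
		{"10.42.0.6", true},   // pod on a healthy node
		{"10.42.1.10", false}, // pod on the node missing the annotation
	}
	for _, p := range pods {
		mac, err := getVtepMac(p.ip, p.annotated)
		if err != nil {
			// Publish a sentinel instead of aborting, and leave a clue in the log.
			fmt.Printf("[VxLAN] %v; using placeholder MAC %s\n", err, placeholderMAC)
			mac = placeholderMAC
		}
		fmt.Printf("ARP entry /Common/k8s-%s -> %s\n", p.ip, mac)
	}
}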
Maybe just ignore the pods on nodes with the wrong annotations, and have CIS log a notice telling customers that some nodes have wrong annotations and the pods on them were ignored, so please fix it. That way the customer can clearly see the reason.
Yes, I like this idea, and it is even simpler to implement; the following should do it. It is up to the CIS team to fix it, though; I am just giving a suggestion here:
mac, err = getVtepMac(pod, kubePods, kubeNodes)
if nil != err {
    log.Infof("[VxLAN] %v", err)
    continue
}
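Expanded into a self-contained sketch (hypothetical stand-in types and data, not the actual vxlanMgr.go code), the skip-on-error behavior would look roughly like this:

package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the real CIS types; for illustration only.
type node struct {
	vtepMAC string // empty when the flannel backend-data annotation is missing
}

type pod struct {
	ip       string
	nodeName string
}

// getVtepMac mirrors the real helper's contract: return the node's VTEP MAC,
// or an error when the annotation is absent.
func getVtepMac(p pod, nodes map[string]node) (string, error) {
	n, ok := nodes[p.nodeName]
	if !ok || n.vtepMAC == "" {
		return "", errors.New("could not get VtepMac for " + p.ip + "'s node")
	}
	return n.vtepMAC, nil
}

func main() {
	nodes := map[string]node{
		"cluster1-m1": {vtepMAC: "5a:de:e9:80:38:7e"},
		"cluster1-w1": {vtepMAC: ""}, // annotation removed
	}
	pods := []pod{
		{ip: "10.42.0.6", nodeName: "cluster1-m1"},
		{ip: "10.42.1.10", nodeName: "cluster1-w1"},
	}

	// continue (rather than return) skips only the affected pod, so pods on
	// healthy nodes still get their ARP entries.
	for _, p := range pods {
		mac, err := getVtepMac(p, nodes)
		if err != nil {
			fmt.Printf("[VxLAN] %v (skipped)\n", err)
			continue
		}
		fmt.Printf("would create ARP entry /Common/k8s-%s -> %s\n", p.ip, mac)
	}
}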
@myf5 Were you able to validate this workaround? Is this suggestion appropriate? Please share your findings.
Created CONTCNTR-3139 for internal tracking.
Closing as no further discussion.