Openshift SDN integration fails in VxLan manager

Open bukovjanmic opened this issue 2 years ago • 8 comments

Setup Details

CIS Version: 2.8.1
Build: f5networks/k8s-bigip-ctlr:latest
BIGIP Version: 15.1.2
AS3 Version: Big IP 3.32.0-4
Agent Mode: AS3
Orchestration: OSCP
Orchestration Version: 4.9.24
Pool Mode: Cluster
Additional Setup details: <Platform/CNI Plugins/ cluster nodes/ etc>


When creating VirtualServers or TransportServers, VxLan integration does not seem to work.

In operator logs, we see error messages:

2022/04/25 15:30:30 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for's node.

Looking at the source code, it seems the VxLAN manager looks for the flannel.alpha.coreos.com/public-ip annotation on the node where the pod is running.

However, no such annotation is present on our nodes, as we run OpenShift SDN in networkpolicy mode.
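
To illustrate the lookup described above, here is a minimal sketch (not the controller's actual code) of reading flannel VTEP details from a node's annotations. The public-ip key is the one named in this issue; the backend-data key and its "VtepMAC" JSON field are an assumption about what flannel's VXLAN backend writes, and both sample nodes are hypothetical.

```python
import json

PUBLIC_IP_KEY = "flannel.alpha.coreos.com/public-ip"  # named in this issue
# Assumption: flannel's VXLAN backend stores the VTEP MAC as JSON under this key.
BACKEND_DATA_KEY = "flannel.alpha.coreos.com/backend-data"


def flannel_vtep_info(annotations):
    """Return (public_ip, vtep_mac) from node annotations; either may be None."""
    public_ip = annotations.get(PUBLIC_IP_KEY)
    vtep_mac = None
    raw = annotations.get(BACKEND_DATA_KEY)
    if raw:
        try:
            vtep_mac = json.loads(raw).get("VtepMAC")
        except (ValueError, AttributeError):
            # Malformed JSON or a non-object payload: treat as missing.
            vtep_mac = None
    return public_ip, vtep_mac


# Hypothetical flannel node: both annotations are present.
flannel_node = {
    PUBLIC_IP_KEY: "192.0.2.10",
    BACKEND_DATA_KEY: '{"VtepMAC": "aa:bb:cc:dd:ee:ff"}',
}
# Hypothetical OpenShift SDN node: no flannel annotations at all.
sdn_node = {"example.com/role": "worker"}

print(flannel_vtep_info(flannel_node))
print(flannel_vtep_info(sdn_node))
```

On a node without flannel annotations the lookup returns nothing, which is consistent with the "could not get VtepMac" message if the controller takes a flannel-style code path.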

Our setup:

apiVersion: cis.f5.com/v1
kind: F5BigIpCtlr
    operator-sdk/primary-resource: kube-system/f5-server-f5-bigip-ctlr
    operator-sdk/primary-resource-type: Deployment.apps
  name: f5-server
  namespace: openshift-operators
    - helm.sdk.operatorframework.io/uninstall-release
    manage_routes: false
    agent: as3
    custom-resource-mode: true
    log_level: info
    openshift-sdn-name: /occ01/occ01-tunnel
    bigip_partition: occ01
    ipam: true
    default-route-domain: 14
    disable-teems: true
    log_as3_response: true
    insecure: true
    pool-member-type: cluster
  bigip_login_secret: bigip
    pullPolicy: Always
    repo: k8s-bigip-ctlr
    user: f5networks
  namespace: kube-system
    create: true
  resources: {}
    create: true
  version: latest

In NodePort mode, the integration seems to work.

Documentation does not specify that Flannel is a requirement.

Could you advise us what the problem is?


Michal Bukovjan

bukovjanmic avatar Apr 25 '22 15:04 bukovjanmic

@bukovjanmic If I recall correctly, the controller does not require a static pod MAC in an OpenShift environment. The error log you pointed out only applies to a k8s/flannel environment; in OpenShift, the pod MAC is resolved dynamically through ARP request/response.

vincentmli avatar Apr 26 '22 16:04 vincentmli

Thanks @vincentmli you are correct.

@bukovjanmic can you check a few things? First, check the VXLAN tunnel within BIG-IP. Make sure you see FDB entries.

If not, please review the CIS VXLAN troubleshooting article: https://support.f5.com/csp/article/K43473164

Here are some good user-guides and demos

OpenShift 4.7 and F5 Container Ingress Services (CIS) User-Guide for Standalone BIG-IP https://github.com/mdditt2000/k8s-bigip-ctlr/blob/main/user_guides/openshift-4-7/standalone/README.md

OpenShift 4.7 and F5 Container Ingress Services (CIS) User-Guide for BIG-IP Cluster https://github.com/mdditt2000/k8s-bigip-ctlr/blob/main/user_guides/openshift-4-7/cluster/README.md

Let me know if you have any questions

mdditt2000 avatar Apr 26 '22 16:04 mdditt2000

I agree, and this is what we find strange. We triple-checked the guides above (we use the BIG-IP Cluster version), and everything seems to be correct. We see the VxLAN FDB entries on the BIG-IP:

(cfg-sync In Sync)(Active)(/occ01)(tmos)# show net fdb tunnel occ01-tunnel

Tunnel        Mac Address        Member                    Dynamic
occ01-tunnel  0a:0a:ac:1f:08:72  endpoint:  no
occ01-tunnel  0a:0a:ac:1f:08:76  endpoint:  no
occ01-tunnel  0a:0a:ac:1f:08:14  endpoint:   no
occ01-tunnel  0a:0a:ac:1f:08:15  endpoint:   no
occ01-tunnel  0a:0a:ac:1f:08:16  endpoint:   no
occ01-tunnel  0a:0a:ac:1f:08:17  endpoint:   no
occ01-tunnel  0a:0a:ac:1f:08:18  endpoint:   no
occ01-tunnel  0a:0a:ac:1f:08:19  endpoint:   no
occ01-tunnel  0a:0a:ac:1f:08:1a  endpoint:   no
occ01-tunnel  0a:0a:ac:1f:08:1b  endpoint:   no
occ01-tunnel  0a:0a:ac:1f:08:1c  endpoint:   no

but we still get the above error in the f5-server-f5-bigip-ctlr pod on the OpenShift side, and TransportServers/VirtualServers are not set up in cluster mode.
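
For anyone reading the FDB listing above, here is a small sketch that parses the `show net fdb tunnel` output and decodes each MAC. It assumes the last four octets of each MAC encode an IPv4 address (presumably the node's), which is what the 0a:0a prefix followed by ac:1f:08:xx suggests; that encoding is an inference from the output, not something confirmed in this thread. The function names are hypothetical.

```python
import ipaddress


def parse_fdb(text, tunnel):
    """Return the MAC addresses listed for `tunnel` in the pasted tmsh output."""
    macs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with the tunnel name followed by a MAC address.
        if len(fields) >= 2 and fields[0] == tunnel and fields[1].count(":") == 5:
            macs.append(fields[1].lower())
    return macs


def mac_to_ipv4(mac):
    """Decode the last four octets of a MAC as an IPv4 address (assumed encoding)."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    return str(ipaddress.IPv4Address(octets[2:]))


sample = """\
Tunnel        Mac Address        Member                    Dynamic
occ01-tunnel  0a:0a:ac:1f:08:72  endpoint:  no
occ01-tunnel  0a:0a:ac:1f:08:14  endpoint:   no
"""

for mac in parse_fdb(sample, "occ01-tunnel"):
    print(mac, mac_to_ipv4(mac))
```

Comparing the decoded addresses with the output of `oc get nodes -o wide` is one way to check that every node has a static FDB entry.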

What we will try:

  • re-setup this with the occ01 profile defined in Common partition
  • upgrade Big-IP to version 16

Could route domain 14 also be a problem?

bukovjanmic avatar May 02 '22 13:05 bukovjanmic

@bukovjanmic Yes, it is worth trying. As a side note, I have been working on Cilium CNI VXLAN integration with BIG-IP for the past year; it should work much better and with much less configuration overhead. The Cilium VXLAN integration feature is expected in release v1.12.0, around June this year. OpenShift supports the Cilium CNI too.

vincentmli avatar May 04 '22 14:05 vincentmli

@bukovjanmic @mdditt2000 I actually got the same error while testing Kubernetes Cilium CNI VXLAN integration with BIG-IP and CIS 2.8.1. In Cilium, a static pod ARP entry is not required. CIS logs the error, but it does not appear to affect network connectivity in my case. Does the error affect your network connectivity?

vincentmli avatar May 04 '22 21:05 vincentmli

After I changed my lab CIS argument from --flannel_vxlan to --openshift-sdn-name, the error disappeared. @bukovjanmic, could you check whether you are using the --openshift-sdn-name argument?
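
A quick way to run that check against a deployment's argument list is sketched below. The flag names are taken from this thread as written (--flannel_vxlan, --openshift-sdn-name); the extra flannel-name spelling is an assumption about how the flannel option may appear in other CIS setups, and the helper itself is hypothetical.

```python
# Flag spellings: the first two appear in this thread; "flannel-name" is an assumption.
FLANNEL_FLAGS = {"flannel_vxlan", "flannel-name"}
OPENSHIFT_FLAGS = {"openshift-sdn-name"}


def vxlan_mode(args):
    """Classify CIS arguments of the form --name=value by overlay flag."""
    names = {arg.split("=", 1)[0].lstrip("-") for arg in args}
    has_flannel = bool(names & FLANNEL_FLAGS)
    has_openshift = bool(names & OPENSHIFT_FLAGS)
    if has_flannel and has_openshift:
        return "both"
    if has_flannel:
        return "flannel"
    if has_openshift:
        return "openshift-sdn"
    return "none"


# Arguments mirroring part of the setup posted at the top of this issue.
issue_args = [
    "--openshift-sdn-name=/occ01/occ01-tunnel",
    "--pool-member-type=cluster",
    "--agent=as3",
]
print(vxlan_mode(issue_args))
```

If the result is "flannel" or "both" on an OpenShift SDN cluster, the flannel-style VTEP lookup is the likely source of the error message.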

vincentmli avatar May 04 '22 21:05 vincentmli

Yes, I am using openshift-sdn-name arg in the operator CR and no flannel at all.

bukovjanmic avatar May 09 '22 06:05 bukovjanmic

@bukovjanmic can you provide the CIS log?

vincentmli avatar May 16 '22 15:05 vincentmli

No response from the user. We recommend using OCP with the OVN-Kubernetes (OVNK8S) CNI together with CIS.

trinaths avatar Feb 06 '23 15:02 trinaths