
Sharing the problems and the solution for adapting the node pool serviceTopology traffic topology feature to cilium-cni

rayne-Li opened this issue 11 months ago · 1 comment

What would you like to be added:

Share a traffic closed-loop (in-pool traffic) solution that is compatible with cilium-cni

Why is this needed:

Cilium's NetworkPolicy and traffic-control capabilities are something flannel does not have

others /kind feature

Feature

annotation Key             annotation Value         Description
openyurt.io/topologyKeys   kubernetes.io/hostname   Traffic is routed to the same node
openyurt.io/topologyKeys   openyurt.io/nodepool     Traffic is routed to the same node pool

Reference docs: https://openyurt.io/zh/docs/user-manuals/network/service-topology

https://kubeedge.io/blog/enable-cilium/#kubeedge-edgecore-setup

OpenYurt version: 1.5.0

OS: Debian 12

Kubernetes version: 1.31

Preparation

  • Required: Kubernetes version > 1.18; from 1.21 onward the EndpointSlice feature gate has been removed (graduated), so no special handling is needed
  • Configure kube-proxy to use the in-cluster configuration so that it connects through yurt-hub (edit the ConfigMap as below; a hedged restart example follows the snippet)
$ kubectl edit cm -n kube-system kube-proxy
apiVersion: v1
data:
  config.conf: |-
    clientConnection:
      #kubeconfig: /var/lib/kube-proxy/kubeconfig.conf # 2. comment this line.
      qps: 0
    clusterCIDR: 10.244.0.0/16
    configSyncPeriod: 0s
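After editing the ConfigMap, the kube-proxy pods still have to pick up the change; a hedged example of restarting them (assuming the DaemonSet is named kube-proxy in kube-system):
$ kubectl -n kube-system rollout restart daemonset kube-proxy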
  • Required: confirm that yurt-hub is running properly (a hedged check is shown below)
  • Required: the yurthub component relies on yurt-manager to approve its CSRs
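A minimal sketch for both checks, assuming yurt-hub exposes its health endpoint on the default port 10267 on the node:
# on the node: yurt-hub health endpoint (default port is an assumption)
curl -s http://127.0.0.1:10267/v1/healthz
# CSRs created by yurthub should end up Approved,Issued once yurt-manager handles them
kubectl get csr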
  • Required: create the node pools
$ cat << EOF | kubectl apply -f -
apiVersion: apps.openyurt.io/v1alpha1
kind: NodePool
metadata:
  name: fujian
spec:
  type: Cloud

---

apiVersion: apps.openyurt.io/v1alpha1
kind: NodePool
metadata:
  name: wuhan
spec:
  type: Edge

---

apiVersion: apps.openyurt.io/v1alpha1
kind: NodePool
metadata:
  name: wuqing
spec:
  type: Edge
EOF
  • Required: add the nodes to the node pools (by labeling them, as in the hedged example below)
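A hedged example of joining a node to a pool; recent OpenYurt releases use the label apps.openyurt.io/nodepool for pool membership (older releases used apps.openyurt.io/desired-nodepool), and the node name here is a placeholder:
$ kubectl label node <node-name> apps.openyurt.io/nodepool=wuhan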
# kubectl get nodepool
NAME     TYPE    READYNODES   NOTREADYNODES   AGE
fujian   Cloud   2            0               7d21h
wuhan    Edge    2            0               7d21h
wuqing   Edge    2            0               7d21h

# kubectl get nb
NAME            NUM-NODES   AGE
fujian-7rxsj8   2           7d21h
wuhan1          2           7d17h
wuqing-cb6rvn   2           7d21h

Create the test workloads

  • svc
$ cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  annotations:
    openyurt.io/topologyKeys: openyurt.io/nodepool
  labels:
    app: busy-box
  name: busy-box-svc
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: busy-box
  type: ClusterIP
EOF
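A hedged check that the Service carries the topology annotation and that EndpointSlices exist for it (kubernetes.io/service-name is the standard label on managed slices):
$ kubectl get svc busy-box-svc -o yaml | grep topologyKeys
$ kubectl get endpointslices -l kubernetes.io/service-name=busy-box-svc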
  • YurtAppSet (yas)
apiVersion: apps.openyurt.io/v1beta1
kind: YurtAppSet
metadata:
  name: example
  namespace: default
spec:
  nodepoolSelector:
    matchLabels:
      yurtappset.openyurt.io/type: nginx
  pools:
  - wuhan
  - wuqing
  - fujian
  workload:
    workloadTemplate:
      deploymentTemplate:
        metadata:
          labels:
            app: busy-box
        spec:
          replicas: 2
          selector:
            matchLabels:
              app: busy-box
          template:
            metadata:
              labels:
                app: busy-box
            spec:
              containers:
              - command:
                - nc
                - -lk
                - -p
                - "3000"
                - -e
                - /bin/hostname
                - -i
                image: busybox
                imagePullPolicy: Always
                name: busy-box
                ports:
                - containerPort: 3000
                resources: {}
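A quick, hedged check that the YurtAppSet produced one Deployment per pool and that the pods carry the expected label (the generated names embed the pool name, e.g. example-wuqing-xxxxx):
$ kubectl get deploy -l app=busy-box -o wide
$ kubectl get pods -l app=busy-box -o wide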

Test results (the traffic closed loop is NOT achieved)

  • Confirm that both the yurt-hub cache and the host's iptables rules are correct
cat /etc/kubernetes/cache/kube-proxy/endpointslices.v1.discovery.k8s.io/default/busy-box-svc-7cgbp
... 
"endpoints":[
{"addresses":["192.168.3.45"],"conditions":{"ready":true,"serving":true,"terminating":false},"targetRef":{"kind":"Pod","namespace":"default","name":"example-wuqing-pd9fn-88589f6bf-hqdqd","uid":"2c1388df-847e-4087-ab56-a4a8351d8ab8"},"nodeName":"tj-wq2-lzytest-0002"},
{"addresses":["192.168.4.192"],"conditions":{"ready":true,"serving":true,"terminating":false},"targetRef":{"kind":"Pod","namespace":"default","name":"example-wuqing-pd9fn-88589f6bf-gr9g2","uid":"a0b07a2a-fe14-4068-b299-483c8a7a477f"},"nodeName":"tj-wq2-lzytest-0001"}]

KUBE-SVC-PZIRA6MO24RJXLWV  tcp  --  anywhere             10.103.185.179       /* default/busy-box-svc cluster IP */ tcp dpt:3000

Chain KUBE-SVC-PZIRA6MO24RJXLWV (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  tcp  -- !192.168.0.0/16       10.103.185.179       /* default/busy-box-svc cluster IP */ tcp dpt:3000
KUBE-SEP-IQNXXFW6DAPHIZPB  all  --  anywhere             anywhere             /* default/busy-box-svc -> 192.168.3.45:3000 */ statistic mode random probability 0.50000000000
KUBE-SEP-3AAZEJEAEB3WZLEG  all  --  anywhere             anywhere             /* default/busy-box-svc -> 192.168.4.192:3000 */
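For reference, these NAT rules can be listed on the host with something like this (a hedged example; the chain name comes from the listing above):
iptables -t nat -L KUBE-SERVICES | grep busy-box-svc
iptables -t nat -L KUBE-SVC-PZIRA6MO24RJXLWV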
  • Telnet to the clusterIP from the host achieves the closed loop without problems and only connects to pods within the node pool, but from inside a container it does not
telnet 10.103.185.179 3000
Trying 10.103.185.179...
Connected to 10.103.185.179.
Escape character is '^]'.
192.168.3.45

telnet 10.103.185.179 3000
Trying 10.103.185.179...
Connected to 10.103.185.179.
Escape character is '^]'.
192.168.4.192
Connection closed by foreign host.
  • The suspicion is that Cilium is the cause: it hijacks the clusterIP traffic and forwards it directly via eBPF instead of going through the iptables rules, so even with kube-proxy enabled the closed loop is not achieved:
https://github.com/cilium/cilium/issues/28904#issuecomment-1804545547
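The kube-proxy-replacement handling can be inspected from the agent pod; the Services summary below is, presumably, that kind of status output (a hedged example of the command):
kubectl -n kube-system exec ds/cilium -- cilium-dbg status --verbose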
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Disabled
  - LoadBalancer:   Disabled
  - externalIPs:    Disabled
  - HostPort:       Disabled

kubectl -n kube-system exec ds/cilium -- cilium-dbg service list
21   10.103.185.179:3000    ClusterIP      1 => 192.168.5.235:3000 (active)
                                           2 => 192.168.3.45:3000 (active)
                                           3 => 192.168.0.170:3000 (active)
                                           4 => 192.168.2.181:3000 (active)
                                           5 => 192.168.1.160:3000 (active)
                                           6 => 192.168.4.192:3000 (active)
  • Testing showed that adjusting the Cilium parameter with --set loadBalancer.serviceTopology=true does not help either, because underneath it still runs on the (unfiltered) EndpointSlices; a hedged example of toggling the flag follows
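Roughly how that flag can be toggled, for completeness (a hedged sketch; the Helm release name, chart repo and namespace are assumptions):
helm upgrade cilium cilium/cilium -n kube-system --reuse-values --set loadBalancer.serviceTopology=true
kubectl -n kube-system rollout restart ds cilium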

Solution

  • Since the KubeEdge community supports Cilium, their approach was used as a reference; it shows that cilium and cilium-edge need to be deployed separately, with cilium-edge connecting to yurt-hub
https://kubeedge.io/blog/enable-cilium/#kubeedge-edgecore-setup

### Dump original Cilium DaemonSet configuration  
> kubectl get ds -n kube-system cilium -o yaml > cilium-edgecore.yaml  
  
### Edit and apply the following patch  
> vi cilium-edgecore.yaml  
  
### Deploy cilium-agent aligns with edgecore  
> kubectl apply -f cilium-edgecore.yaml


diff --git a/cilium-edgecore.yaml b/cilium-edgecore.yaml
index bff0f0b..3d941d1 100644
--- a/cilium-edgecore.yaml
+++ b/cilium-edgecore.yaml
@@ -8,7 +8,7 @@ metadata:
     app.kubernetes.io/name: cilium-agent
     app.kubernetes.io/part-of: cilium
     k8s-app: cilium
-  name: cilium
+  name: cilium-kubeedge
   namespace: kube-system
 spec:
   revisionHistoryLimit: 10
@@ -29,6 +29,12 @@ spec:
         k8s-app: cilium
     spec:
       affinity:
+        nodeAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            nodeSelectorTerms:
+              - matchExpressions:
+                - key: node-role.kubernetes.io/edge
+                  operator: Exists
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
           - labelSelector:
@@ -39,6 +45,8 @@ spec:
       containers:
       - args:
         - --config-dir=/tmp/cilium/config-map
+        - --k8s-api-server=127.0.0.1:10550
+        - --auto-create-cilium-node-resource=true
         - --debug
         command:
         - cilium-agent
@@ -178,7 +186,9 @@ spec:
       dnsPolicy: ClusterFirst
       hostNetwork: true
       initContainers:
-      - command:
+      - args:
+        - --k8s-api-server=127.0.0.1:10550
+        command:
         - cilium
         - build-config
         env:
  • Following the diff above, adapt a copy of cilium into a separate cilium-edge DaemonSet
kubectl get ds -n kube-system cilium -o yaml > cilium-edgecore.yaml  

1. Change cilium's env so that it connects to yurt-hub; port 10268 is yurt-hub's HTTPS port
# from
        env:
        - name: KUBERNETES_SERVICE_HOST
          value: {{APISERVER_EXTERNAL_IP}}
        - name: KUBERNETES_SERVICE_PORT
          value: "6443"
# to (every container must be changed)
        env:
        - name: KUBERNETES_SERVICE_HOST
          value: 169.254.2.1
        - name: KUBERNETES_SERVICE_PORT
          value: "10268"

# Laziness is the ladder of human progress (my words)
sed -i '/- name: KUBERNETES_SERVICE_HOST/{n; s/value:.*/value: 169.254.2.1/;}' cilium-edgecore.yaml
sed -i '/- name: KUBERNETES_SERVICE_PORT/{n; s/value:.*/value: "10268"/;}' cilium-edgecore.yaml


2. Change the DaemonSet name
- name: cilium
+ name: cilium-edge

3. (Optional, can be skipped) Change the startup arguments of some containers; the env variables appear to take precedence, so changing either the env or the args is enough
    containers:
    - args:
      - --auto-create-cilium-node-resource=true
    initContainers:
    - command:
      - cilium-dbg
      - build-config
      - --k8s-api-server=http://127.0.0.1:10261
    
4. Set the scheduling constraints so that cilium-edge is scheduled only to edge nodes and cilium only to cloud nodes
# edge: cilium-edge
      nodeSelector:
        kubernetes.io/os: linux
+       openyurt.io/is-edge-worker: "true"
# cloud: cilium
      nodeSelector:
        kubernetes.io/os: linux
+       openyurt.io/is-edge-worker: "false"
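A hedged way to confirm the split scheduling once both DaemonSets are applied (the names and the shared k8s-app=cilium label are assumptions based on the edits above):
kubectl -n kube-system get ds cilium cilium-edge
kubectl -n kube-system get pods -l k8s-app=cilium -o wide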

  • Community member ranbom-ch offered another approach: use yurt-hub's data filter (resource access control) capability
Take a look at this document: https://openyurt.io/zh/docs/user-manuals/resource-access-control/
Once it is configured, restarting cilium is enough.

kubectl -n kube-system get cm yurt-hub-cfg -o yaml
apiVersion: v1
data:
  cache_agents: ""
  discardcloudservice: ""
  masterservice: ""
+  servicetopology: cilium,cilium-agent
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: yurthub
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-12-27T02:20:08Z"
  labels:
    app.kubernetes.io/instance: yurthub
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: yurthub
    app.kubernetes.io/version: v1.5.0
    helm.sh/chart: yurthub-1.5.0
  name: yurt-hub-cfg
  namespace: kube-system
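After updating yurt-hub-cfg, the cilium agents need a restart so that they re-list EndpointSlices through yurt-hub's servicetopology filter; a hedged example (DaemonSet names assumed to match the split above):
kubectl -n kube-system rollout restart ds cilium cilium-edge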
  • After the change is in place, restart cilium and you can see that each cilium agent now only gets the endpoints of its own node pool
17   10.100.140.155:3000    ClusterIP      1 => 192.168.3.37:3000 (active)
                                           2 => 192.168.4.244:3000 (active)
17   10.100.140.155:3000    ClusterIP      1 => 192.168.0.189:3000 (active)
                                           2 => 192.168.5.46:3000 (active)
# The master node does not have yurt-hub installed, so it still sees all the endpoints
ID   Frontend               Service Type   Backend
1    10.100.140.155:3000    ClusterIP      1 => 192.168.3.251:3000 (active)
                                           2 => 192.168.2.196:3000 (active)
                                           3 => 192.168.5.85:3000 (active)
                                           4 => 192.168.4.202:3000 (active)
                                           5 => 192.168.0.119:3000 (active)
                                           6 => 192.168.1.102:3000 (active)
  • Telnet from inside a container now also resolves to the pool-local endpoints
kubectl exec -it example-wuqing-pd9fn-88589f6bf-58x7b -- telnet  10.100.140.155 3000
Connected to 10.100.140.155
192.168.4.244
Connection closed by foreign host
command terminated with exit code 1

kubectl exec -it example-wuqing-pd9fn-88589f6bf-58x7b -- telnet  10.100.140.155 3000
Connected to 10.100.140.155
192.168.3.37
Connection closed by foreign host
command terminated with exit code 1

rayne-Li, Jan 14 '25 06:01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Jun 11 '25 21:06