After 81-add-worker, the existing nodes cannot reach pods on the new nodes over TCP

Open wurenny opened this issue 9 months ago • 0 comments

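Bug description

After adding two nodes with 81-add-worker:

The new node hosts can reach pods on the old nodes, and pods on the new nodes can also reach pods on the old nodes.
The old node hosts cannot reach pods on the new nodes, and pods on the old nodes cannot reach pods on the new nodes either.
The old node hosts cannot reach the ingress controller on the new nodes with curl.
Each new node host can reach the pods running on that same node.
ICMP works in both directions, for hosts and pods alike; only TCP is affected.
Both new nodes show the same problem.
I scaled out two clusters, and the problem reproduces fully on both.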
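The asymmetry described above (ICMP succeeds, TCP fails) can be checked from an old node with a small probe. This is a minimal sketch and not part of the original report: the target IP and port are placeholders to be passed as arguments, and it assumes `bash`, `ping`, and `timeout` are available on the node.

```shell
#!/usr/bin/env bash
# Sketch: compare ICMP and TCP reachability of one target.
# The target IP and port are supplied as arguments (hypothetical values in the example).

icmp_check() {
  # Send one echo request and wait up to 2 seconds for the reply.
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

tcp_check() {
  # Attempt a TCP handshake through bash's /dev/tcp; give up after 3 seconds.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" >/dev/null 2>&1
}

probe() {
  # $1 = target IP, $2 = target TCP port
  if icmp_check "$1"; then echo "icmp ok"; else echo "icmp FAIL"; fi
  if tcp_check "$1" "$2"; then echo "tcp ok"; else echo "tcp FAIL"; fi
}

# Run only when a target is given, e.g.: ./probe.sh 10.244.5.10 80
if [ "$#" -eq 2 ]; then
  probe "$1" "$2"
fi
```

Running it once from an old node and once from a new node against the same pod makes the one-way nature of the failure easy to record.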

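Initial troubleshooting results

flannel is installed correctly on the new nodes: the config file is correct, the subnet was allocated, and the CNI netns and veth interfaces are normal.
The vxlan device and the CNI bridge are normal, and the MTU is the same on every node.
ip route, ip neigh, the bridge fdb table, and the ARP table show nothing abnormal.
tcpdump:
- The TCP packet leaves the old node encapsulated in UDP -> flannel on the new node receives the UDP packet -> vxlan decapsulates it correctly.
- The TCP SYN packet is visible on the new node's vxlan device -> but it is never delivered to the CNI bridge; communication stops there and the TCP handshake cannot continue.
Compared sysctl and iptables against the old nodes and found nothing abnormal.
iptables on the new nodes shows no record of dropped or rejected traffic.
Most results on Google blame a missing FORWARD ACCEPT rule in iptables; I checked every node and that is not the case here.
The kubelet log on the new nodes shows nothing abnormal.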
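The hop-by-hop capture described above can be sketched as follows, for use on the new node. The interface names (`eth0`, `flannel.1`, `cni0`), the UDP port 8472, and the pod IP are assumptions based on flannel's usual VXLAN defaults, not values taken from this report. The script only prints the `tcpdump` commands so they can be reviewed before being run as root.

```shell
#!/usr/bin/env bash
# Sketch: build one tcpdump command per hop on the receiving (new) node.
# All defaults below are assumptions; override them via environment variables.
UPLINK_DEV="${UPLINK_DEV:-eth0}"      # physical NIC carrying the VXLAN UDP packets
VXLAN_DEV="${VXLAN_DEV:-flannel.1}"   # flannel VXLAN device
BRIDGE_DEV="${BRIDGE_DEV:-cni0}"      # CNI bridge
VXLAN_PORT="${VXLAN_PORT:-8472}"      # flannel's default VXLAN UDP port on Linux
POD_IP="${POD_IP:-10.244.5.10}"       # hypothetical pod IP on the new node

capture_cmd() {
  # $1 = interface, $2 = pcap filter expression
  printf "tcpdump -nn -i %s '%s'\n" "$1" "$2"
}

syn_filter() {
  # Match TCP SYN packets addressed to the given pod IP.
  printf 'tcp[tcpflags] & tcp-syn != 0 and dst host %s' "$1"
}

# Hop 1: encapsulated UDP arriving on the uplink.
capture_cmd "$UPLINK_DEV" "udp port $VXLAN_PORT"
# Hop 2: decapsulated SYN on the VXLAN device (seen in the report).
capture_cmd "$VXLAN_DEV" "$(syn_filter "$POD_IP")"
# Hop 3: the same SYN on the bridge (missing in the report).
capture_cmd "$BRIDGE_DEV" "$(syn_filter "$POD_IP")"
```

Running the three printed commands in separate terminals while an old node attempts a connection shows at which hop the SYN disappears.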

Environment (please fill in the following information):

Run the commands in parentheses below and submit the output

OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 3.10.0-1160.15.2.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Ansible version (ansible --version):

ansible 2.7.5
  config file = /home/tempuser/install/kubeadm-ha/ansible.cfg
  configured module search path = ['/home/tempuser/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]

Python version (python --version):

Python 2.7.5

Kubeadm-ha version (commit) (git rev-parse --short HEAD):

# This is a fairly old cluster, installed with a correspondingly old kubeadm-ha version; it now needs to be scaled out
$ git rev-parse --short HEAD
1fa9622

$ git log -1
commit 1fa962253cb50d55597ac041618ecc17fe6d9fc7
Author: ChongmingDu <[email protected]>
Date:   Sat Jul 31 01:07:16 2021 +0800

    fs.inotify values were added to sysctl

Target Kubernetes version

# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:17:59Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:09:48Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

Target docker and containerd versions

# docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:58:10 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:56:35 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Target flannel version

# kubectl -n kube-system get ds kube-flannel-ds -o jsonpath='{range .spec.template.spec}{.containers[].image}{"\n"}{.initContainers[].image}{"\n"}{end}'
registry.aliyuncs.com/kubeadm-ha/coreos_flannel:v0.12.0
registry.aliyuncs.com/kubeadm-ha/coreos_flannel:v0.12.0

How to reproduce

Steps to reproduce:

Starting from the existing inventory, add the two new nodes to [all], [kube-worker], and [new-worker].
Run the deployment command below:

ansible-playbook -i inventory-test.ini -e @variables.yaml 81-add-worker.yml

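After scaling out either of the two clusters, the same problem reproduces 100% of the time.
Errors shown: none; the scale-out completes without errors.

Additional information

The problem is rather odd: I could not find why vxlan fails to forward TCP to the CNI bridge along the configured route.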

May 19 '24 04:05 wurenny
