openyurt
[BUG] kubevirt-deployed VM couldn't be restarted when the network is disconnected
What happened: This is a use case combining kubevirt + OpenYurt.
On the worker node, we use kubevirt to deploy a VM, and the node is connected to the master node successfully. Then we disconnect the network and reboot the worker node. The problem is that the previously deployed VM couldn't be started.
What you expected to happen: The deployed VM can be restarted even when the network is disconnected.
How to reproduce it (as minimally and precisely as possible):
- deploy an OpenYurt cluster with one worker node supporting virtualization.
- deploy kubevirt.
- deploy a VM on worker node.
- disconnect network.
- reboot the worker node and check whether the deployed VM can run.
Anything else we need to know?: We think this is a cloud-edge collaboration issue within OpenYurt's scope, so we hope the OpenYurt community can address it. :)
Environment:
- OpenYurt version: 1.2
- Kubernetes version (use kubectl version): 1.22
- OS (e.g. cat /etc/os-release): N/A
- Kernel (e.g. uname -a): N/A
- Install tools: N/A
- Others:
/kind bug
@gnunu would you be able to upload the detail logs of yurthub component and kubelet component?
Details will be uploaded a little later by my colleagues.
A two-node cluster: one control-plane (cloud) node and one worker (edge) node
- cloud node: joez-hce-ub20-vm-virt-m
- edge node: joez-hce-ub20-vm-virt-w
Versions of the key components (all nodes are the same):
box@joez-hce-ub20-vm-virt-m:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
box@joez-hce-ub20-vm-virt-m:~$ uname -a
Linux joez-hce-ub20-vm-virt-m 5.4.0-147-generic #164-Ubuntu SMP Tue Mar 21 14:23:17 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
box@joez-hce-ub20-vm-virt-m:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:09:57Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
box@joez-hce-ub20-vm-virt-m:~$ docker version
Client: Docker Engine - Community
Version: 23.0.4
API version: 1.42
Go version: go1.19.8
Git commit: f480fb1
Built: Fri Apr 14 10:32:23 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 23.0.4
API version: 1.42 (minimum version 1.12)
Go version: go1.19.8
Git commit: cbce331
Built: Fri Apr 14 10:32:23 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.20
GitCommit: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
runc:
Version: 1.1.5
GitCommit: v1.1.5-0-gf19387a
docker-init:
Version: 0.19.0
GitCommit: de40ad0
box@joez-hce-ub20-vm-virt-m:~$ virtctl version
Client Version: version.Info{GitVersion:"v0.58.0", GitCommit:"6e41ae7787c1b48ac9a633c61a54444ea947242c", GitTreeState:"clean", BuildDate:"2022-10-13T00:33:22Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{GitVersion:"v0.58.0", GitCommit:"6e41ae7787c1b48ac9a633c61a54444ea947242c", GitTreeState:"clean", BuildDate:"2022-10-13T00:33:22Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Before shutting down the cloud node and rebooting the edge node, everything works fine:
box@joez-hce-ub20-vm-virt-m:~$ kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default nginx-85b98978db-hvn2z 1/1 Running 1 (20m ago) 68m
default virt-launcher-testvm-7hxv5 2/2 Running 0 34s
kube-flannel kube-flannel-ds-dkxk2 1/1 Running 1 (20m ago) 28h
kube-flannel kube-flannel-ds-xh79b 1/1 Running 1 (20m ago) 28h
kube-system coredns-6d8c4cb4d-67l78 1/1 Running 1 (20m ago) 28h
kube-system coredns-6d8c4cb4d-mdhrt 1/1 Running 1 (7m6s ago) 28h
kube-system etcd-joez-hce-ub20-vm-virt-m 1/1 Running 1 (7m6s ago) 28h
kube-system kube-apiserver-joez-hce-ub20-vm-virt-m 1/1 Running 2 (20m ago) 61m
kube-system kube-controller-manager-joez-hce-ub20-vm-virt-m 1/1 Running 2 (20m ago) 28h
kube-system kube-proxy-jph4x 1/1 Running 1 (20m ago) 28h
kube-system kube-proxy-sqqck 1/1 Running 1 (20m ago) 28h
kube-system kube-scheduler-joez-hce-ub20-vm-virt-m 1/1 Running 2 (20m ago) 28h
kube-system yurt-app-manager-b8677d956-4b9pf 1/1 Running 6 (20m ago) 27h
kube-system yurt-controller-manager-7787f67564-jmjcb 1/1 Running 2 (7m6s ago) 3h3m
kube-system yurt-hub-joez-hce-ub20-vm-virt-w 1/1 Running 1 (20m ago) 143m
kubevirt virt-api-69d978dd67-rp8np 1/1 Running 1 (20m ago) 37m
kubevirt virt-api-69d978dd67-t4552 1/1 Running 1 (20m ago) 37m
kubevirt virt-controller-695cc98c56-fkzsx 1/1 Running 1 (7m6s ago) 37m
kubevirt virt-controller-695cc98c56-j4wxv 1/1 Running 1 (20m ago) 37m
kubevirt virt-handler-q5sqh 1/1 Running 1 (20m ago) 37m
kubevirt virt-handler-wdtp4 1/1 Running 1 (7m6s ago) 37m
kubevirt virt-operator-58cb8475bb-6mswb 1/1 Running 1 (20m ago) 38m
kubevirt virt-operator-58cb8475bb-t74df 1/1 Running 1 (20m ago) 38m
box@joez-hce-ub20-vm-virt-w:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
149fa086b94e quay.io/kubevirt/cirros-container-disk-demo "/usr/bin/container-…" About a minute ago Up About a minute k8s_volumecontainerdisk_virt-launcher-testvm-7hxv5_default_1d656c94-1395-4deb-89f8-0f844d989e52_0
cfdbd721ece4 a3a2b8b0c675 "/usr/bin/virt-launc…" About a minute ago Up About a minute k8s_compute_virt-launcher-testvm-7hxv5_default_1d656c94-1395-4deb-89f8-0f844d989e52_0
0b81204195b8 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" About a minute ago Up About a minute k8s_POD_virt-launcher-testvm-7hxv5_default_1d656c94-1395-4deb-89f8-0f844d989e52_0
85951e033c26 nginx "/docker-entrypoint.…" 7 minutes ago Up 7 minutes k8s_nginx_nginx-85b98978db-hvn2z_default_b91fe8a0-253e-42e4-843f-199967d87a9d_1
d035475e97e4 c407633b131b "virt-handler --port…" 7 minutes ago Up 7 minutes k8s_virt-handler_virt-handler-q5sqh_kubevirt_e2d38ee6-80f3-4360-8321-cbf3b40d1985_1
ed2063dcca41 f76a3af5e135 "virt-controller --l…" 7 minutes ago Up 7 minutes k8s_virt-controller_virt-controller-695cc98c56-j4wxv_kubevirt_1021a9cd-e233-470e-8ca3-4979315c31a4_1
90ec4ddecc9f registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 7 minutes ago Up 7 minutes k8s_POD_virt-controller-695cc98c56-j4wxv_kubevirt_1021a9cd-e233-470e-8ca3-4979315c31a4_4
fdd690ab00f0 a7186007b4a9 "/usr/local/bin/yurt…" 7 minutes ago Up 7 minutes k8s_yurt-app-manager_yurt-app-manager-b8677d956-4b9pf_kube-system_8a4afc07-7d42-4e64-b0b9-27b344eec936_6
7d2562e58b79 e05304a0fbaf "virt-operator --por…" 7 minutes ago Up 7 minutes k8s_virt-operator_virt-operator-58cb8475bb-6mswb_kubevirt_46eb2665-d9b0-4c51-9827-f87ac1ab8985_1
69852e3414e7 943b496a674d "virt-api --port 844…" 7 minutes ago Up 7 minutes k8s_virt-api_virt-api-69d978dd67-rp8np_kubevirt_634e5490-d783-4fdb-ba11-3bf1558b37ae_1
c77366a0ac41 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 7 minutes ago Up 7 minutes k8s_POD_yurt-app-manager-b8677d956-4b9pf_kube-system_8a4afc07-7d42-4e64-b0b9-27b344eec936_3
fbaedb517e8b registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 7 minutes ago Up 7 minutes k8s_POD_virt-handler-q5sqh_kubevirt_e2d38ee6-80f3-4360-8321-cbf3b40d1985_4
3ebdbf1da63c registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 7 minutes ago Up 7 minutes k8s_POD_virt-operator-58cb8475bb-6mswb_kubevirt_46eb2665-d9b0-4c51-9827-f87ac1ab8985_4
6009f65b3444 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 7 minutes ago Up 7 minutes k8s_POD_nginx-85b98978db-hvn2z_default_b91fe8a0-253e-42e4-843f-199967d87a9d_3
bb946f591b78 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 7 minutes ago Up 7 minutes k8s_POD_virt-api-69d978dd67-rp8np_kubevirt_634e5490-d783-4fdb-ba11-3bf1558b37ae_3
702909f7a174 11ae74319a21 "/opt/bin/flanneld -…" 7 minutes ago Up 7 minutes k8s_kube-flannel_kube-flannel-ds-dkxk2_kube-flannel_37eb9f49-5338-4fb6-bd97-563d0ff098be_1
3a8b0a92ff21 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 7 minutes ago Up 7 minutes k8s_POD_kube-flannel-ds-dkxk2_kube-flannel_37eb9f49-5338-4fb6-bd97-563d0ff098be_1
9c92d6fdedde e03484a90585 "/usr/local/bin/kube…" 7 minutes ago Up 7 minutes k8s_kube-proxy_kube-proxy-sqqck_kube-system_90fecc3f-31b1-4ba6-a825-1c0fa2db64d6_1
7acff52536ea registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 7 minutes ago Up 7 minutes k8s_POD_kube-proxy-sqqck_kube-system_90fecc3f-31b1-4ba6-a825-1c0fa2db64d6_1
b0b78516b422 f4fba699ab86 "yurthub --v=2 --ser…" 20 minutes ago Up 20 minutes k8s_yurt-hub_yurt-hub-joez-hce-ub20-vm-virt-w_kube-system_21482483ffe45101b48a34a036517322_1
5d79eea5b086 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 20 minutes ago Up 20 minutes k8s_POD_yurt-hub-joez-hce-ub20-vm-virt-w_kube-system_21482483ffe45101b48a34a036517322_1
Then, shut down the cloud node:
box@joez-hce-ub20-vm-virt-m:~$ sudo shutdown now
Connection to joez-hce-ub20-vm-virt-m closed by remote host.
Connection to joez-hce-ub20-vm-virt-m closed.
Wait for more than 1 minute; both the nginx and kubevirt VM workloads are still running on the edge node:
box@joez-hce-ub20-vm-virt-w:~$ ps -ef | grep qemu
root 12371 12346 0 12:12 ? 00:00:00 /usr/bin/virt-launcher-monitor --qemu-timeout 241s --name testvm --uid e67efffd-d2d1-464a-8d3f-9ae347bd9c60 --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/OVMF --keep-after-failure
root 12390 12371 0 12:12 ? 00:00:00 /usr/bin/virt-launcher --qemu-timeout 241s --name testvm --uid e67efffd-d2d1-464a-8d3f-9ae347bd9c60 --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/OVMF
uuidd 12635 12371 3 12:12 ? 00:00:10 /usr/libexec/qemu-kvm -name guest=default_testvm,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-default_testvm/master-key.aes"}
box@joez-hce-ub20-vm-virt-w:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0ba2f3224321 f76a3af5e135 "virt-controller --l…" 8 seconds ago Up 7 seconds k8s_virt-controller_virt-controller-695cc98c56-j4wxv_kubevirt_1021a9cd-e233-470e-8ca3-4979315c31a4_3
981719c7d9ae c407633b131b "virt-handler --port…" 15 seconds ago Up 14 seconds k8s_virt-handler_virt-handler-q5sqh_kubevirt_e2d38ee6-80f3-4360-8321-cbf3b40d1985_2
149fa086b94e quay.io/kubevirt/cirros-container-disk-demo "/usr/bin/container-…" 6 minutes ago Up 6 minutes k8s_volumecontainerdisk_virt-launcher-testvm-7hxv5_default_1d656c94-1395-4deb-89f8-0f844d989e52_0
cfdbd721ece4 a3a2b8b0c675 "/usr/bin/virt-launc…" 6 minutes ago Up 6 minutes k8s_compute_virt-launcher-testvm-7hxv5_default_1d656c94-1395-4deb-89f8-0f844d989e52_0
0b81204195b8 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 6 minutes ago Up 6 minutes k8s_POD_virt-launcher-testvm-7hxv5_default_1d656c94-1395-4deb-89f8-0f844d989e52_0
85951e033c26 nginx "/docker-entrypoint.…" 12 minutes ago Up 12 minutes k8s_nginx_nginx-85b98978db-hvn2z_default_b91fe8a0-253e-42e4-843f-199967d87a9d_1
90ec4ddecc9f registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 12 minutes ago Up 12 minutes k8s_POD_virt-controller-695cc98c56-j4wxv_kubevirt_1021a9cd-e233-470e-8ca3-4979315c31a4_4
69852e3414e7 943b496a674d "virt-api --port 844…" 12 minutes ago Up 12 minutes k8s_virt-api_virt-api-69d978dd67-rp8np_kubevirt_634e5490-d783-4fdb-ba11-3bf1558b37ae_1
c77366a0ac41 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 12 minutes ago Up 12 minutes k8s_POD_yurt-app-manager-b8677d956-4b9pf_kube-system_8a4afc07-7d42-4e64-b0b9-27b344eec936_3
fbaedb517e8b registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 12 minutes ago Up 12 minutes k8s_POD_virt-handler-q5sqh_kubevirt_e2d38ee6-80f3-4360-8321-cbf3b40d1985_4
3ebdbf1da63c registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 12 minutes ago Up 12 minutes k8s_POD_virt-operator-58cb8475bb-6mswb_kubevirt_46eb2665-d9b0-4c51-9827-f87ac1ab8985_4
6009f65b3444 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 12 minutes ago Up 12 minutes k8s_POD_nginx-85b98978db-hvn2z_default_b91fe8a0-253e-42e4-843f-199967d87a9d_3
bb946f591b78 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 12 minutes ago Up 12 minutes k8s_POD_virt-api-69d978dd67-rp8np_kubevirt_634e5490-d783-4fdb-ba11-3bf1558b37ae_3
702909f7a174 11ae74319a21 "/opt/bin/flanneld -…" 12 minutes ago Up 12 minutes k8s_kube-flannel_kube-flannel-ds-dkxk2_kube-flannel_37eb9f49-5338-4fb6-bd97-563d0ff098be_1
3a8b0a92ff21 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 12 minutes ago Up 12 minutes k8s_POD_kube-flannel-ds-dkxk2_kube-flannel_37eb9f49-5338-4fb6-bd97-563d0ff098be_1
9c92d6fdedde e03484a90585 "/usr/local/bin/kube…" 12 minutes ago Up 12 minutes k8s_kube-proxy_kube-proxy-sqqck_kube-system_90fecc3f-31b1-4ba6-a825-1c0fa2db64d6_1
7acff52536ea registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 12 minutes ago Up 12 minutes k8s_POD_kube-proxy-sqqck_kube-system_90fecc3f-31b1-4ba6-a825-1c0fa2db64d6_1
b0b78516b422 f4fba699ab86 "yurthub --v=2 --ser…" 25 minutes ago Up 25 minutes k8s_yurt-hub_yurt-hub-joez-hce-ub20-vm-virt-w_kube-system_21482483ffe45101b48a34a036517322_1
5d79eea5b086 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 25 minutes ago Up 25 minutes k8s_POD_yurt-hub-joez-hce-ub20-vm-virt-w_kube-system_21482483ffe45101b48a34a036517322_1
Now, restart the edge node while keeping the cloud node down:
box@joez-hce-ub20-vm-virt-w:~$ sudo reboot
[sudo] password for box:
Connection to 10.67.108.242 closed by remote host.
Connection to 10.67.108.242 closed.
After the reboot, neither the nginx pod nor the kubevirt VM is launched:
box@joez-hce-ub20-vm-virt-w:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9a5b19370a11 f4fba699ab86 "yurthub --v=2 --ser…" 13 minutes ago Up 13 minutes k8s_yurt-hub_yurt-hub-joez-hce-ub20-vm-virt-w_kube-system_21482483ffe45101b48a34a036517322_2
cea81f7a1578 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 13 minutes ago Up 13 minutes k8s_POD_yurt-hub-joez-hce-ub20-vm-virt-w_kube-system_21482483ffe45101b48a34a036517322_2
Here are my steps to set up the OpenYurt cluster and deploy KubeVirt:
label nodes and activate node autonomy:
cloud_node=$(kubectl get node -l node-role.kubernetes.io/master -o name | sed -e s:node/::)
edge_node=$(kubectl get node -o name | grep -v $cloud_node | sed -e s:node/::)
kubectl label node $cloud_node openyurt.io/is-edge-worker=false
kubectl label node $edge_node openyurt.io/is-edge-worker=true
kubectl annotate node $edge_node node.beta.openyurt.io/autonomy=true
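The cloud/edge split above simply partitions the output of kubectl get node -o name. A minimal local sketch of the same grep/sed pipeline, fed from a fixed list using this cluster's two node names (no live cluster needed):

```shell
# Simulate the node-name partitioning used above with a fixed node list.
nodes='node/joez-hce-ub20-vm-virt-m
node/joez-hce-ub20-vm-virt-w'
cloud_node=joez-hce-ub20-vm-virt-m
# Same pipeline as the real commands: drop the cloud node, strip the node/ prefix.
edge_node=$(printf '%s\n' "$nodes" | grep -v "$cloud_node" | sed -e s:node/::)
echo "$edge_node"
```

Against the live cluster the only difference is that the list comes from kubectl instead of a here-string.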
deploy control-plane components on cloud node:
helm repo add openyurt https://openyurtio.github.io/openyurt-helm
# deploy yurt-app-manager first
helm upgrade --install -n kube-system yurt-app-manager openyurt/yurt-app-manager
# then yurt-controller-manager
helm upgrade --install -n kube-system --version 1.2.0 openyurt openyurt/openyurt
# check the result
helm list -A
# openyurt-1.2.0 1.2.0
# yurt-app-manager-0.1.3 0.6.0
kubectl get po -A | grep yurt
setup yurthub on edge node:
# find your kube-apiserver and token
kube_api=10.67.108.194:6443
token=0ide56.gzkntj0zwbh2qhfe
# deploy yurthub
curl -LO https://raw.githubusercontent.com/openyurtio/openyurt/release-v1.2/config/setup/yurthub.yaml
sed "s/__kubernetes_master_address__/$kube_api/;s/__bootstrap_token__/$token/" yurthub.yaml | sudo tee /etc/kubernetes/manifests/yurthub.yaml
# create kubeconfig
sudo mkdir -p /var/lib/openyurt
cat << EOF | sudo tee /var/lib/openyurt/kubelet.conf
apiVersion: v1
clusters:
- cluster:
server: http://127.0.0.1:10261
name: default-cluster
contexts:
- context:
cluster: default-cluster
namespace: default
user: default-auth
name: default-context
current-context: default-context
kind: Config
preferences: {}
EOF
# let kubelet use the new kubeconfig
sudo sed -i.bak 's#KUBELET_KUBECONFIG_ARGS=.*"#KUBELET_KUBECONFIG_ARGS=--kubeconfig=/var/lib/openyurt/kubelet.conf"#g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# check status
sudo systemctl status kubelet
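The sed templating step above can be sanity-checked locally before writing anything under /etc/kubernetes/manifests. A small sketch using an illustrative sample line (the flag names here are placeholders, not taken from the real yurthub.yaml):

```shell
# Verify the placeholder substitution from the yurthub setup step on a
# sample line instead of the real manifest (no root needed).
kube_api=10.67.108.194:6443
token=0ide56.gzkntj0zwbh2qhfe
sample='addr=https://__kubernetes_master_address__ token=__bootstrap_token__'
rendered=$(printf '%s\n' "$sample" | sed "s/__kubernetes_master_address__/$kube_api/;s/__bootstrap_token__/$token/")
echo "$rendered"
```

If the rendered output still contains a __placeholder__, the sed expression (or the variable values) needs fixing before restarting kubelet.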
deploy KubeVirt:
VERSION=v0.58.0
kubectl create -f https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-operator.yaml
kubectl create -f https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-cr.yaml
# wait until kubevirt.kubevirt.io/kubevirt is deployed
kubectl get -n kubevirt kv/kubevirt -w
Deploy virtctl:
VERSION=$(kubectl get kubevirt.kubevirt.io/kubevirt -n kubevirt -o=jsonpath="{.status.observedKubeVirtVersion}")
ARCH=$(uname -s | tr A-Z a-z)-$(uname -m | sed 's/x86_64/amd64/')
curl -L -o virtctl https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/virtctl-${VERSION}-${ARCH}
chmod +x virtctl
sudo install virtctl /usr/local/bin
Deploy VM for test:
kubectl apply -f https://kubevirt.io/labs/manifests/vm.yaml
kubectl get vms
# start VM
virtctl start testvm
# check status
kubectl get vmis
# access console
virtctl console testvm
@rambohe-ch @gnunu
After booting up the cloud node again, all the pods are started again on the edge node. Here are the files in the cache:
root@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache# find kubelet/ -maxdepth 2
kubelet/
kubelet/leases.v1.coordination.k8s.io
kubelet/leases.v1.coordination.k8s.io/kube-node-lease
kubelet/events.v1.core
kubelet/events.v1.core/kubevirt
kubelet/events.v1.core/default
kubelet/events.v1.core/kube-flannel
kubelet/events.v1.core/kube-system
root@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache# find yurthub/ -maxdepth 3
yurthub/
yurthub/services.v1.core
yurthub/services.v1.core/kubevirt
yurthub/services.v1.core/kubevirt/kubevirt-operator-webhook
yurthub/services.v1.core/kubevirt/kubevirt-prometheus-metrics
yurthub/services.v1.core/kubevirt/virt-exportproxy
yurthub/services.v1.core/kubevirt/virt-api
yurthub/services.v1.core/default
yurthub/services.v1.core/default/nginx
yurthub/services.v1.core/default/kubernetes
yurthub/services.v1.core/kube-system
yurthub/services.v1.core/kube-system/pool-coordinator-etcd
yurthub/services.v1.core/kube-system/pool-coordinator-apiserver
yurthub/services.v1.core/kube-system/kube-dns
yurthub/services.v1.core/kube-system/yurt-app-manager-webhook
yurthub/nodepools.v1alpha1.apps.openyurt.io
yurthub/nodepools.v1alpha1.apps.openyurt.io/master
yurthub/configmaps.v1.core
yurthub/configmaps.v1.core/kube-system
yurthub/configmaps.v1.core/kube-system/yurt-hub-cfg
@rambohe-ch @joez in this case, the master node is shut down; I am not sure if that's fully considered in OpenYurt. When the master is down, is yurthub still healthy enough for kubelet?
Yes, we expect that pods on the edge can recover even when the master is down. @gnunu
Firstly, I have to say that openyurt+kubevirt has not been tested yet. From my perspective, yurthub provides an edge-local cache for generic usage, so theoretically it can support the recovery of kubevirt. Yurthub does not cache resources for all edge components, and in this case I think the kubevirt-related resources were not cached. You may check the cache-agent configmap to see if you've enabled yurthub to cache for kubevirt.
$ kubectl get cm yurt-hub-cfg -nkube-system -oyaml
apiVersion: v1
data:
cache_agents: ""
discardcloudservice: ""
masterservice: ""
servicetopology: ""
kind: ConfigMap
metadata:
creationTimestamp: "2023-04-24T03:14:06Z"
name: yurt-hub-cfg
namespace: kube-system
resourceVersion: "842"
uid: ad2d8249-b16a-44f6-981b-c410ac93827b
cache_agents: "" means to use the default settings. To enable the kubevirt cache, simply edit it to cache_agents: "*", which enables caching for all edge components.
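A sketch of applying that change non-interactively. The kubectl line is hedged behind a guard since it assumes kubectl access to the cluster; the merge-patch payload itself is shown and checked locally:

```shell
# Merge patch that switches yurt-hub-cfg to cache all edge components.
payload='{"data":{"cache_agents":"*"}}'
echo "$payload"
# Apply it only if kubectl is actually available and pointed at the cluster:
command -v kubectl >/dev/null 2>&1 && \
  kubectl patch configmap yurt-hub-cfg -n kube-system --type merge -p "$payload" || true
```

Editing the ConfigMap with kubectl edit, as described above, achieves the same result.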
However, it seems that the openyurt cluster was in an abnormal situation. We expect that the cache for kubelet should contain pods, configmaps and some other resources that enable pod recovery when the master has shut down. It should look like the following:
root@openyurt-e2e-test-worker:/etc/kubernetes/cache# find kubelet/ -maxdepth 2
kubelet/
kubelet/leases.v1.coordination.k8s.io
kubelet/leases.v1.coordination.k8s.io/kube-node-lease
kubelet/nodes.v1.core
kubelet/nodes.v1.core/openyurt-e2e-test-worker
kubelet/csinodes.v1.storage.k8s.io
kubelet/csinodes.v1.storage.k8s.io/openyurt-e2e-test-worker
kubelet/csidrivers.v1.storage.k8s.io
kubelet/services.v1.core
kubelet/services.v1.core/default
kubelet/services.v1.core/kube-system
kubelet/events.v1.core
kubelet/events.v1.core/default
kubelet/events.v1.core/kube-system
kubelet/runtimeclasses.v1.node.k8s.io
kubelet/pods.v1.core
kubelet/pods.v1.core/kube-system
kubelet/configmaps.v1.core
kubelet/configmaps.v1.core/kube-system
Maybe there's something wrong in yurthub. Could you check the log of yurthub on the worker node while the master is running? It should cache these resources from the master when everything is ok. @joez
@Congrool let us check the container workload (nginx) first, and KubeVirt VM as the next step
Here is the output when the master node is connected:
It seems there are no pod objects cached:
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ sudo find kubelet/ -maxdepth 2
kubelet/
kubelet/leases.v1.coordination.k8s.io
kubelet/leases.v1.coordination.k8s.io/kube-node-lease
kubelet/events.v1.core
kubelet/events.v1.core/kubevirt
kubelet/events.v1.core/default
kubelet/events.v1.core/kube-flannel
kubelet/events.v1.core/kube-system
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ sudo find yurthub/ -maxdepth 3
yurthub/
yurthub/services.v1.core
yurthub/services.v1.core/kubevirt
yurthub/services.v1.core/kubevirt/kubevirt-operator-webhook
yurthub/services.v1.core/kubevirt/kubevirt-prometheus-metrics
yurthub/services.v1.core/kubevirt/virt-exportproxy
yurthub/services.v1.core/kubevirt/virt-api
yurthub/services.v1.core/default
yurthub/services.v1.core/default/nginx
yurthub/services.v1.core/default/kubernetes
yurthub/services.v1.core/kube-system
yurthub/services.v1.core/kube-system/pool-coordinator-etcd
yurthub/services.v1.core/kube-system/pool-coordinator-apiserver
yurthub/services.v1.core/kube-system/kube-dns
yurthub/services.v1.core/kube-system/yurt-app-manager-webhook
yurthub/nodepools.v1alpha1.apps.openyurt.io
yurthub/nodepools.v1alpha1.apps.openyurt.io/master
yurthub/configmaps.v1.core
yurthub/configmaps.v1.core/kube-system
yurthub/configmaps.v1.core/kube-system/yurt-hub-cfg
The log of yurthub: yurthub-normal.txt. And the node labels and annotations:
kubernetes.io/arch=amd64
kubernetes.io/hostname=joez-hce-ub20-vm-virt-w
kubernetes.io/os=linux
kubevirt.io/schedulable=true
openyurt.io/is-edge-worker=true
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"3e:c2:32:7b:8a:0b"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.67.108.242
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
kubevirt.io/heartbeat: 2023-04-24T06:21:10Z
node.alpha.kubernetes.io/ttl: 0
node.beta.openyurt.io/autonomy: true
volumes.kubernetes.io/controller-managed-attach-detach: true
Addresses:
InternalIP: 10.67.108.242
Hostname: joez-hce-ub20-vm-virt-w
Current yurt-hub-cfg:
box@joez-hce-ub20-vm-virt-m:~$ kubectl get cm yurt-hub-cfg -nkube-system -oyaml
apiVersion: v1
data:
cache_agents: ""
discardcloudservice: ""
masterservice: ""
servicetopology: ""
kind: ConfigMap
metadata:
annotations:
meta.helm.sh/release-name: openyurt
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2023-04-22T01:09:02Z"
labels:
app.kubernetes.io/managed-by: Helm
name: yurt-hub-cfg
namespace: kube-system
resourceVersion: "208527"
uid: 4754f00f-314b-4311-91bc-3e1778de2d95
After enabling cache for all edge components by setting cache_agents: "*", I can see the cache as below. Most of the objects are in the go-http-client/ folder:
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ find kubelet/ go-http-client/ yurthub/ -maxdepth 2
kubelet/
kubelet/leases.v1.coordination.k8s.io
kubelet/leases.v1.coordination.k8s.io/kube-node-lease
kubelet/events.v1.core
kubelet/events.v1.core/kubevirt
kubelet/events.v1.core/default
kubelet/events.v1.core/kube-flannel
kubelet/events.v1.core/kube-system
go-http-client/
go-http-client/services.v1.core
go-http-client/services.v1.core/kubevirt
go-http-client/services.v1.core/default
go-http-client/services.v1.core/kube-system
go-http-client/leases.v1.coordination.k8s.io
go-http-client/leases.v1.coordination.k8s.io/kube-node-lease
go-http-client/csidrivers.v1.storage.k8s.io
go-http-client/csinodes.v1.storage.k8s.io
go-http-client/csinodes.v1.storage.k8s.io/joez-hce-ub20-vm-virt-w
go-http-client/pods.v1.core
go-http-client/pods.v1.core/kubevirt
go-http-client/pods.v1.core/default
go-http-client/pods.v1.core/kube-flannel
go-http-client/pods.v1.core/kube-system
go-http-client/secrets.v1.core
go-http-client/secrets.v1.core/kubevirt
go-http-client/secrets.v1.core/kube-system
go-http-client/runtimeclasses.v1.node.k8s.io
go-http-client/configmaps.v1.core
go-http-client/configmaps.v1.core/kubevirt
go-http-client/configmaps.v1.core/default
go-http-client/configmaps.v1.core/kube-flannel
go-http-client/configmaps.v1.core/kube-system
go-http-client/nodes.v1.core
go-http-client/nodes.v1.core/joez-hce-ub20-vm-virt-w
yurthub/
yurthub/services.v1.core
yurthub/services.v1.core/kubevirt
yurthub/services.v1.core/default
yurthub/services.v1.core/kube-system
yurthub/nodepools.v1alpha1.apps.openyurt.io
yurthub/nodepools.v1alpha1.apps.openyurt.io/master
yurthub/configmaps.v1.core
yurthub/configmaps.v1.core/kube-system
Then disconnect the edge node from the cloud node by applying iptables rules on the cloud node:
box@joez-hce-ub20-vm-virt-m:~$ kubectl get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
joez-hce-ub20-vm-virt-m Ready control-plane,master 3d15h v1.23.0 10.67.108.194 <none> Ubuntu 20.04.5 LTS 5.4.0-147-generic docker://23.0.4
joez-hce-ub20-vm-virt-w Ready <none> 3d15h v1.23.0 10.67.108.242 <none> Ubuntu 20.04.5 LTS 5.4.0-147-generic docker://23.0.4
box@joez-hce-ub20-vm-virt-m:~$ sudo iptables -I OUTPUT -d 10.67.108.242 -j DROP
box@joez-hce-ub20-vm-virt-m:~$ kubectl get node
NAME STATUS ROLES AGE VERSION
joez-hce-ub20-vm-virt-m Ready control-plane,master 3d15h v1.23.0
joez-hce-ub20-vm-virt-w NotReady <none> 3d15h v1.23.0
After rebooting the edge node, more pods are launched, but most of them exit immediately:
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
74d04d4c85a1 e03484a90585 "/usr/local/bin/kube…" 9 minutes ago Up 9 minutes k8s_kube-proxy_kube-proxy-sqqck_kube-system_90fecc3f-31b1-4ba6-a825-1c0fa2db64d6_9
9c8abe396f2a registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 9 minutes ago Up 9 minutes k8s_POD_kube-proxy-sqqck_kube-system_90fecc3f-31b1-4ba6-a825-1c0fa2db64d6_9
7a34be3ac8d4 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 9 minutes ago Up 9 minutes k8s_POD_kube-flannel-ds-dkxk2_kube-flannel_37eb9f49-5338-4fb6-bd97-563d0ff098be_9
01057f3db2a8 f4fba699ab86 "yurthub --v=2 --ser…" 10 minutes ago Up 10 minutes k8s_yurt-hub_yurt-hub-joez-hce-ub20-vm-virt-w_kube-system_21482483ffe45101b48a34a036517322_9
cfc07e1e3a66 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 10 minutes ago Up 10 minutes k8s_POD_yurt-hub-joez-hce-ub20-vm-virt-w_kube-system_21482483ffe45101b48a34a036517322_9
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ docker ps -a | head
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
860d95171ea6 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 1 second ago Exited (0) Less than a second ago k8s_POD_virt-controller-695cc98c56-j4wxv_kubevirt_1021a9cd-e233-470e-8ca3-4979315c31a4_523
1a115156e86e registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 1 second ago Exited (0) Less than a second ago k8s_POD_nginx-85b98978db-hvn2z_default_b91fe8a0-253e-42e4-843f-199967d87a9d_520
079a609b6ff5 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 1 second ago Exited (0) Less than a second ago k8s_POD_virt-operator-58cb8475bb-6mswb_kubevirt_46eb2665-d9b0-4c51-9827-f87ac1ab8985_518
ce7af82dc85b registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 1 second ago Exited (0) Less than a second ago k8s_POD_virt-api-69d978dd67-rp8np_kubevirt_634e5490-d783-4fdb-ba11-3bf1558b37ae_526
1ffa1e77096f registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 2 seconds ago Exited (0) Less than a second ago k8s_POD_yurt-app-manager-b8677d956-4b9pf_kube-system_8a4afc07-7d42-4e64-b0b9-27b344eec936_521
1a5d0c203575 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 2 seconds ago Exited (0) Less than a second ago k8s_POD_virt-handler-q5sqh_kubevirt_e2d38ee6-80f3-4360-8321-cbf3b40d1985_525
5681f0ea254c registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 3 seconds ago Exited (0) 1 second ago k8s_POD_virt-api-69d978dd67-rp8np_kubevirt_634e5490-d783-4fdb-ba11-3bf1558b37ae_525
1f4dca4c6dae registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 3 seconds ago Exited (0) 1 second ago k8s_POD_yurt-app-manager-b8677d956-4b9pf_kube-system_8a4afc07-7d42-4e64-b0b9-27b344eec936_520
efe33c3130f5 registry.aliyuncs.com/google_containers/pause:3.6 "/pause" 3 seconds ago Exited (0) 1 second ago k8s_POD_nginx-85b98978db-hvn2z_default_b91fe8a0-253e-42e4-843f-199967d87a9d_519
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ docker logs 1a115156e86e
Shutting down, got signal: Terminated
Flannel fails to start:
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ docker ps -a | grep flannel
86fc9c59c029 11ae74319a21 "/opt/bin/flanneld -…" 29 seconds ago Exited (1) 28 seconds ago k8s_kube-flannel_kube-flannel-ds-dkxk2_kube-flannel_37eb9f49-5338-4fb6-bd97-563d0ff098be_19
...
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ docker logs 86fc9c59c029
W0424 15:31:36.399679 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0424 15:31:36.587953 1 main.go:228] Failed to create SubnetManager: error retrieving pod spec for 'kube-flannel/kube-flannel-ds-dkxk2': Get "https://10.96.0.1:443/api/v1/namespaces/kube-flannel/pods/kube-flannel-ds-dkxk2": dial tcp 10.96.0.1:443: connect: connection refused
It can't connect to the api-server via the kube-proxy ClusterIP:
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ nc -zv 10.96.0.1 443
nc: connect to 10.96.0.1 port 443 (tcp) failed: Connection refused
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ sudo iptables-save | grep -w 10.96.0.1
# OK on cloud node
box@joez-hce-ub20-vm-virt-m:~$ nc -zv 10.96.0.1 443
Connection to 10.96.0.1 443 port [tcp/https] succeeded!
box@joez-hce-ub20-vm-virt-m:~$ sudo iptables-save | grep -w 10.96.0.1
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
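The difference above can be checked mechanically: on the edge node the ClusterIP rule for the kubernetes Service never appears in iptables-save, because kube-proxy cannot complete an initial Service list against the apiserver. A minimal helper, demonstrated here on the cloud node's rule copied from the output above so it runs without root:

```shell
# Report whether iptables-save output (on stdin) contains the kubernetes
# Service ClusterIP (10.96.0.1) rule that kube-proxy programs.
has_clusterip_rule() {
  grep -qw '10.96.0.1'
}

# Cloud-node rule taken verbatim from the iptables-save output above:
rule='-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y'
if printf '%s\n' "$rule" | has_clusterip_rule; then
  echo "rule present"
else
  echo "rule missing: kube-proxy has not synced Services"
fi
# A live check on a node would be: sudo iptables-save | has_clusterip_rule
```

On the edge node this would report the rule missing, matching the empty grep above.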
Check kube-proxy:
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ docker ps -a | grep proxy
74d04d4c85a1 e03484a90585 "/usr/local/bin/kube…" 41 minutes ago Up 41 minutes k8s_kube-proxy_kube-proxy-sqqck_kube-system_90fecc3f-31b1-4ba6-a825-1c0fa2db64d6_9
box@joez-hce-ub20-vm-virt-w:/etc/kubernetes/cache$ docker logs 74d04d4c85a1 2>&1 | less
E0424 15:08:24.338500 1 node.go:152] Failed to retrieve node info: Get "https://10.67.108.194:6443/api/v1/nodes/joez-hce-ub20-vm-virt-w": dial tcp 10.67.108.194:6443: i/o timeout
E0424 15:09:11.170905 1 node.go:152] Failed to retrieve node info: Get "https://10.67.108.194:6443/api/v1/nodes/joez-hce-ub20-vm-virt-w": dial tcp 10.67.108.194:6443: i/o timeout
I0424 15:09:11.171167 1 server.go:843] "Can't determine this node's IP, assuming 127.0.0.1; if this is incorrect, please set the --bind-address flag"
I0424 15:09:11.171257 1 server_others.go:138] "Detected node IP" address="127.0.0.1"
I0424 15:09:11.171865 1 server_others.go:561] "Unknown proxy mode, assuming iptables proxy" proxyMode=""
I0424 15:09:11.234842 1 server_others.go:206] "Using iptables Proxier"
I0424 15:09:11.234963 1 server_others.go:213] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0424 15:09:11.234992 1 server_others.go:214] "Creating dualStackProxier for iptables"
I0424 15:09:11.235069 1 server_others.go:491] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0424 15:09:11.237531 1 server.go:656] "Version info" version="v1.23.0"
I0424 15:09:11.243601 1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0424 15:09:11.243792 1 conntrack.go:100] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_close_wait" value=3600
I0424 15:09:11.244956 1 config.go:317] "Starting service config controller"
I0424 15:09:11.245283 1 config.go:226] "Starting endpoint slice config controller"
I0424 15:09:11.245647 1 shared_informer.go:240] Waiting for caches to sync for service config
I0424 15:09:11.246236 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
W0424 15:09:41.247874 1 reflector.go:324] k8s.io/client-go/informers/factory.go:134: failed to list *v1.Service: Get "https://10.67.108.194:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp 10.67.108.194:6443: i/o timeout
Kube-proxy is still trying to get information from the cloud node instead of yurthub. Is this the expected behavior?
Thanks for your detailed logs. I'm not sure why a component named "go-http-client" lists pods and configmaps, or what it is.
I0424 06:12:11.015474 1 util.go:248] go-http-client list pods: /api/v1/pods?fieldSelector=spec.nodeName%3Djoez-hce-ub20-vm-virt-w&limit=500&resourceVersion=0 with status code 200, spent 6.076415ms
It is expected to be "kubelet list pods". I noticed that the Kubernetes cluster is v1.23.0. Maybe it's a compatibility problem between OpenYurt v1.2.x and Kubernetes v1.23.x.
I0425 03:15:19.952553 1 util.go:255] kubelet list pods: /api/v1/pods?fieldSelector=spec.nodeName%3Dopenyurt-e2e-test-worker&limit=500&resourceVersion=0 with status code 200, spent 9.565219ms
I think this can explain some of the problems we encountered.
Why is the kubelet component's cache incomplete?
Because kubelet also uses another User-Agent, go-http-client, which we do not cache by default.
Why does kube-proxy still connect to the cloud node?
This is not what we expected. Kube-proxy should fetch resources through yurthub; we use a filter in yurthub to do this. In the normal case, we would find a yurthub log like:
I0425 03:15:19.965994 1 handler.go:79] kubeconfig in configmap(kube-system/kube-proxy) has been commented, new config.conf:
#kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
So the configmap mounted by kube-proxy should be modified by yurthub to make kube-proxy use InClusterConfig, which enables it to fetch resources through yurthub. This configmap is fetched by kubelet, so yurthub identifies it by the User-Agent of kubelet's requests, which should originally be User-Agent: kubelet. However, the User-Agent seems to be go-http-client in v1.23.0 and cannot be recognized by yurthub. Thus, kube-proxy uses the unmodified kubeconfig, which connects directly to the cloud APIServer.
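A quick way to check whether yurthub's filter recognizes a given User-Agent is to request the kube-proxy configmap through yurthub with different agents, in the same spirit as the curl probes used elsewhere in this thread. This is a diagnostic sketch only; the yurthub port and configmap path are taken from the logs above and should be verified on the node.

```shell
# Request the kube-proxy configmap via yurthub with the User-Agent it
# expects (kubelet) and with the one v1.23 actually sends (go-http-client).
# Only a recognized agent should receive the filtered configmap in which
# the kubeconfig line has been commented out.
curl -s -H "User-Agent: kubelet" \
  http://127.0.0.1:10261/api/v1/namespaces/kube-system/configmaps/kube-proxy \
  | grep kubeconfig
curl -s -H "User-Agent: go-http-client" \
  http://127.0.0.1:10261/api/v1/namespaces/kube-system/configmaps/kube-proxy \
  | grep kubeconfig
```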
So in summary, these problems seem to be introduced by Kubernetes v1.23.x, and I think we have to find a solution for it. As a workaround for now, could you please try Kubernetes v1.22.x? @joez
Thanks for your explanation. I will give Kubernetes 1.22.0 a try and keep you posted.
BTW, I chose 1.23.0 because the getting-started guide says it is supported:
OpenYurt supports Kubernetes versions up to 1.23. Using higher Kubernetes versions may cause compatibility issues.
With the new cluster with k8s 1.22.0, the cache is as expected now.
box@joez-hce-ub20-vm-oykv-w:/etc/kubernetes/cache$ ls
_apis_discovery.k8s.io_v1 _apis_discovery.k8s.io_v1beta1 _internal flanneld kubelet version yurthub
box@joez-hce-ub20-vm-oykv-w:/etc/kubernetes/cache$ find kubelet/ -maxdepth 2
kubelet/
kubelet/services.v1.core
kubelet/services.v1.core/kubevirt
kubelet/services.v1.core/default
kubelet/services.v1.core/kube-system
kubelet/leases.v1.coordination.k8s.io
kubelet/leases.v1.coordination.k8s.io/kube-node-lease
kubelet/csidrivers.v1.storage.k8s.io
kubelet/csinodes.v1.storage.k8s.io
kubelet/csinodes.v1.storage.k8s.io/joez-hce-ub20-vm-oykv-w
kubelet/pods.v1.core
kubelet/pods.v1.core/kubevirt
kubelet/pods.v1.core/default
kubelet/pods.v1.core/kube-flannel
kubelet/pods.v1.core/kube-system
kubelet/secrets.v1.core
kubelet/secrets.v1.core/kubevirt
kubelet/secrets.v1.core/kube-system
kubelet/runtimeclasses.v1.node.k8s.io
kubelet/configmaps.v1.core
kubelet/configmaps.v1.core/kubevirt
kubelet/configmaps.v1.core/default
kubelet/configmaps.v1.core/kube-flannel
kubelet/configmaps.v1.core/kube-system
kubelet/events.v1.core
kubelet/events.v1.core/kubevirt
kubelet/events.v1.core/default
kubelet/events.v1.core/kube-flannel
kubelet/events.v1.core/kube-system
kubelet/nodes.v1.core
kubelet/nodes.v1.core/joez-hce-ub20-vm-oykv-w
But kube-proxy is still trying to connect to kube-apiserver instead of yurthub. So I deployed yurthub on the cloud node too, and now I can see the kube-proxy configuration has changed:
I0425 07:28:48.980817 1 filter.go:92] kubeconfig in configmap(kube-system/kube-proxy) has been commented, new config.conf:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
...
#kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
qps: 0
clusterCIDR: 10.244.0.0/16
And kube-proxy, as well as flannel and nginx, can be launched:
box@joez-hce-ub20-vm-oykv-w:~$ docker ps | grep nginx | grep -v POD | awk '{print $1}'
13887b771982
box@joez-hce-ub20-vm-oykv-w:~$ docker exec 13887b771982 cat /proc/net/fib_trie | awk '/32 host/ { print i } {i=$2}' | grep -v 127.0 | uniq
10.244.1.37
box@joez-hce-ub20-vm-oykv-w:~$ no_proxy='*' curl -s 10.244.1.37:80 | grep Welcome
<title>Welcome to nginx!</title>
But there are still two problems:
- KubeVirt VM is not started
- Kube-proxy still has issues: it can't get the service information, so it doesn't set up iptables rules correctly
# no VM is running
box@joez-hce-ub20-vm-oykv-w:~$ ps -ef | grep qemu | grep -v grep
box@joez-hce-ub20-vm-oykv-w:~$ docker ps | grep virt-handler | grep -v POD | awk '{print $1}'
90da7040e86a
box@joez-hce-ub20-vm-oykv-w:~$ docker logs 90da7040e86a 2>&1 | less
W0425 14:03:54.447851 8650 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
{"Unable to mark node as unschedulable":"can not cache for go-http-client patch nodes: /api/v1/nodes/joez-hce-ub20-vm-oykv-w","component":"virt-handler","level":"error","pos":"virt-handler.go:179","timestamp":"2023-04-25T14:03:54.503437Z"}
{"component":"virt-handler","level":"info","msg":"set verbosity to 2","pos":"virt-handler.go:471","timestamp":"2023-04-25T14:03:54.505689Z"}
...
# kube-proxy can't get service objects
box@joez-hce-ub20-vm-oykv-w:~$ docker ps | grep kube-proxy | grep -v POD | awk '{print $1}'
440188760b30
box@joez-hce-ub20-vm-oykv-w:~$ docker logs 440188760b30 2>&1 | less
I0425 14:03:47.152925 1 server.go:553] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
E0425 14:04:34.012669 1 node.go:161] Failed to retrieve node info: Get "https://169.254.2.1:10268/api/v1/nodes/joez-hce-ub20-vm-oykv-w": Service Unavailable
I0425 14:04:34.012744 1 server.go:836] can't determine this node's IP, assuming 127.0.0.1; if this is incorrect, please set the --bind-address flag
I0425 14:04:34.013173 1 server_others.go:140] Detected node IP 127.0.0.1
W0425 14:04:34.013292 1 server_others.go:565] Unknown proxy mode "", assuming iptables proxy
I0425 14:04:34.071984 1 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
I0425 14:04:34.072103 1 server_others.go:212] Using iptables Proxier.
I0425 14:04:34.072131 1 server_others.go:219] creating dualStackProxier for iptables.
W0425 14:04:34.072193 1 server_others.go:495] detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6
I0425 14:04:34.073932 1 server.go:649] Version: v1.22.0
I0425 14:04:34.078625 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0425 14:04:34.078684 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0425 14:04:34.079048 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0425 14:04:34.079245 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0425 14:04:34.080453 1 config.go:315] Starting service config controller
I0425 14:04:34.080490 1 shared_informer.go:240] Waiting for caches to sync for service config
I0425 14:04:34.080534 1 config.go:224] Starting endpoint slice config controller
I0425 14:04:34.080542 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
E0425 14:04:37.012366 1 event_broadcaster.go:262] Unable to write event: 'Post "https://169.254.2.1:10268/apis/events.k8s.io/v1/namespaces/default/events": Service Unavailable' (may retry after sleeping)
E0425 14:04:37.012666 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://169.254.2.1:10268/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": Service Unavailable
...
E0425 15:00:28.395624 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://169.254.2.1:10268/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": Service Unavailable
box@joez-hce-ub20-vm-oykv-w:~$ ip a
...
4: yurthub-dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 56:f9:05:f9:60:8b brd ff:ff:ff:ff:ff:ff
inet 169.254.2.1/32 scope global yurthub-dummy0
valid_lft forever preferred_lft forever
box@joez-hce-ub20-vm-oykv-w:~$ sudo ss -lntp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 169.254.2.1:10261 0.0.0.0:* users:(("yurthub",pid=1600,fd=9))
LISTEN 0 4096 127.0.0.1:10261 0.0.0.0:* users:(("yurthub",pid=1600,fd=8))
LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=690,fd=13))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1746,fd=3))
LISTEN 0 4096 127.0.0.1:10267 0.0.0.0:* users:(("yurthub",pid=1600,fd=7))
LISTEN 0 4096 169.254.2.1:10268 0.0.0.0:* users:(("yurthub",pid=1600,fd=10))
LISTEN 0 4096 127.0.0.1:10248 0.0.0.0:* users:(("kubelet",pid=718,fd=18))
LISTEN 0 4096 127.0.0.1:10249 0.0.0.0:* users:(("kube-proxy",pid=2686,fd=18))
LISTEN 0 4096 127.0.0.1:34505 0.0.0.0:* users:(("kubelet",pid=718,fd=12))
LISTEN 0 4096 *:10256 *:* users:(("kube-proxy",pid=2686,fd=19))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1746,fd=4))
LISTEN 0 4096 *:10250 *:* users:(("kubelet",pid=718,fd=35))
@rambohe-ch @Congrool would you help check the kube-proxy issue? It prevents service-to-service communication from working.
As I mentioned last time, the kube-proxy does not work as expected:
box@joez-hce-ub20-vm-oykv-w:~$ sudo iptables -t nat -n -L KUBE-SERVICES
iptables: No chain/target/match by that name.
It should have set up iptables rules in the normal case, like the following:
box@joez-hce-ub20-vm-virt-w:~$ sudo iptables -t nat -n -L KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-SVC-OVTWZ4GROBJZO4C5 tcp -- 0.0.0.0/0 10.96.165.12 /* default/nginx:80-80 cluster IP */ tcp dpt:80
KUBE-SVC-EIEVNBW5YXUIDXZD tcp -- 0.0.0.0/0 10.96.186.205 /* kubevirt/kubevirt-prometheus-metrics:metrics cluster IP */ tcp dpt:443
KUBE-SVC-JD5MR3NA4I4DYORP tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-SVC-LON7267IY6XCAPHT tcp -- 0.0.0.0/0 10.96.62.36 /* kube-system/yurt-app-manager-webhook:https cluster IP */ tcp dpt:443
KUBE-SVC-GXXJIUUZRDUOXB4K tcp -- 0.0.0.0/0 10.96.28.38 /* kubevirt/kubevirt-operator-webhook:webhooks cluster IP */ tcp dpt:443
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- 0.0.0.0/0 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-SVC-UDPDOKU2AFJKWYNL tcp -- 0.0.0.0/0 10.96.123.232 /* kubevirt/virt-api cluster IP */ tcp dpt:443
KUBE-SVC-TCOU7JCQXEZGVUNU udp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-NODEPORTS all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
So maybe something is wrong that prevents kube-proxy from getting enough information from yurthub to set up the iptables rules.
@joez Hey, sorry for the late reply. It seems that kube-proxy cannot access the yurthub server. We may need to check whether the yurthub server still works.
You can use the following command on your host joez-hce-ub20-vm-virt-w when the worker is disconnected from the master:
curl -H "User-Agent: kube-proxy" http://127.0.0.1:10261/api/v1/nodes/joez-hce-ub20-vm-virt-w
In the normal case, yurthub uses the kube-proxy node cache under /etc/kubernetes/cache/kube-proxy/nodes.v1.core to respond to the request, and you should get the JSON output of that node. You can also check whether such a cache exists at that path.
And, could you post your kube-proxy version? BTW, in my cluster, the kube-proxy is v1.22.7.
I have two k8s clusters currently: joez-hce-ub20-vm-virt-{m,w} is v1.23.0 and joez-hce-ub20-vm-oykv-{m,w} is v1.22.0. Let us focus on the latter one.
The kube-proxy version is v1.22.0
I0425 06:34:34.107118 1 server.go:649] Version: v1.22.0
There is no such cache
root@joez-hce-ub20-vm-oykv-w:/home/box# ls /etc/kubernetes/cache/kube-proxy/nodes.v1.core
ls: cannot access '/etc/kubernetes/cache/kube-proxy/nodes.v1.core': No such file or directory
root@joez-hce-ub20-vm-oykv-w:/home/box# ls /etc/kubernetes/cache/
_apis_discovery.k8s.io_v1 _apis_discovery.k8s.io_v1beta1 flanneld go-http-client _internal kubelet version virt-api virt-controller yurt-app-manager yurthub
Access to port 10261 is OK via cURL:
box@joez-hce-ub20-vm-oykv-w:~$ no_proxy='*' curl -H "User-Agent: kube-proxy" -o /dev/null -s -w '%{http_code}\n' http://127.0.0.1:10261/api/v1/nodes/joez-hce-ub20-vm-oykv-w
200
Is accessing 127.0.0.1:10261 the same as accessing 169.254.2.1:10268? I see this error in the logs:
E0425 14:04:34.012669 1 node.go:161] Failed to retrieve node info: Get "https://169.254.2.1:10268/api/v1/nodes/joez-hce-ub20-vm-oykv-w": Service Unavailable
The no_proxy variable in the kube-proxy container does not cover 169.254.2.1; maybe I need to add the Automatic Private IP Addressing (APIPA) range to it.
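The suspected fix is to append the APIPA range to the proxy environment wherever it is injected; the exact mechanism (docker daemon config, systemd drop-in, or pod env) depends on the setup, so this is only a minimal sketch of the variable itself:

```shell
# Append the link-local (APIPA) range so that requests to yurthub's
# dummy-interface address (169.254.2.1) bypass the corporate proxy.
no_proxy="${no_proxy:+${no_proxy},}169.254.0.0/16"
export no_proxy
echo "$no_proxy"
```

Note that not every HTTP client honors CIDR notation in no_proxy, but the existing value in this cluster already uses CIDR ranges, so the convention matches.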
box@joez-hce-ub20-vm-oykv-w:~$ docker exec 4bcb029c86f7 env | grep no_proxy
no_proxy=.svc,.svc.cluster.local,10.244.0.0/16,10.96.0.0/16,localhost,joez-hce-ub20-vm-openyurt-m,sh.intel.com,istio-system.svc,127.0.0.0/8,172.16.0.0/12,192.168.0.0/16,10.0.0.0/8
There is no such cache
Well, it's strange that there's no cache for kube-proxy. It should be there so that kube-proxy can work when the worker is offline, like:
# ls /etc/kubernetes/cache/
_apis_discovery.k8s.io_v1 _internal coredns kube-proxy kubelet version yurthub
# ls /etc/kubernetes/cache/kube-proxy/
endpointslices.v1.discovery.k8s.io events.v1.events.k8s.io nodes.v1.core services.v1.core
You can reconnect the worker to the master and then restart kube-proxy on the worker node, at which time yurthub will cache the responses from the master. Could you give it a try? Once the cache is created, kube-proxy can restart and work even when the worker is offline.
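The restart can be done by deleting the kube-proxy pod while the worker is connected; the DaemonSet recreates it and yurthub caches the fresh responses. A sketch, assuming the default kubeadm label k8s-app=kube-proxy:

```shell
# Run against the cluster while the worker can reach the master.
kubectl -n kube-system delete pod -l k8s-app=kube-proxy \
  --field-selector spec.nodeName=joez-hce-ub20-vm-oykv-w

# After the pod restarts, verify the cache appeared on the worker:
ls /etc/kubernetes/cache/kube-proxy/
```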
Is accessing 127.0.0.1:10261 the same as accessing 169.254.2.1:10268?
Yes, actually the yurthub server listens on both addresses with the same handler.
Access to port 10261 is OK via cURL
You should get not only the status code but also the JSON data of the node resource, like:
# curl -H "User-Agent: kube-proxy" http://127.0.0.1:10261/api/v1/nodes/openyurt-e2e-test-worker
{"kind":"Node","apiVersion":"v1","metadata":{"name":"openyurt-e2e-test-worker","uid":"fb53f206-0ba0-44c5-a0eb-253d953a925b","resourceVersion":"903","creationTimestamp":"2023-04-25T03:13:31Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"openyurt-e2e-test-worker","kubernetes.io/os":"linux","openyurt.io/is-edge-worker":"true"},"annotations":{"kubeadm.alpha.kubernetes.io/cri-socket":"unix:///run/containerd/containerd.sock","node.alpha.kubernetes.io/ttl":"0","node.beta.openyurt.io/autonomy":"false","volumes.kubernetes.io/controller-managed-attach-detach":"true"},"managedFields":[{"manager":"kubelet","operation":"Update","apiVersion":"v1","time":"2023-04-25T03:13:31Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:volumes.kubernetes.io/controller-managed-attach-detach":{}},"f:labels":{".":{},"f:beta.kubernetes.io/arch":{},"f:beta.kubernetes.io/os":{},"f:kubernetes.io/arch":{},"f:kubernetes.io/hostname":{},"f:kubernetes.io/os":{}}},"f:spec":{"f:providerID":{}}}},{"manager":"kubeadm","operation":"Update","apiVersion":"v1","time":"2023-04-25T03:13:32Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{"f:kubeadm.alpha.kubernetes.io/cri-socket":{}}}}},{"manager":"kubelet","operation":"Update","apiVersion":"v1","time":"2023-04-25T03:14:31Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:conditions":{"k:{\"type\":\"DiskPressure\"}":{"f:lastHeartbeatTime":{}},"k:{\"type\":\"MemoryPressure\"}":{"f:lastHeartbeatTime":{}},"k:{\"type\":\"PIDPressure\"}":{"f:lastHeartbeatTime":{}},"k:{\"type\":\"Ready\"}":{"f:lastHeartbeatTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{}}},"f:images":{}}},"subresource":"status"},{"manager":"yurtctl","operation":"Update","apiVersion":"v1","time":"2023-04-25T03:14:57Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{"f:node.beta.openyurt.io/autonomy":{}},"f:labels":{"f:op
enyurt.io/is-edge-worker":{}}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2023-04-25T03:15:19Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{"f:node.alpha.kubernetes.io/ttl":{}}},"f:spec":{"f:podCIDR":{},"f:podCIDRs":{".":{},"v:\"10.244.1.0/24\"":{}}}}}]},"spec":{"podCIDR":"10.244.1.0/24","podCIDRs":["10.244.1.0/24"],"providerID":"kind://docker/openyurt-e2e-test/openyurt-e2e-test-worker"},"status":{"capacity":{"cpu":"8","ephemeral-storage":"102350Mi","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"40971612Ki","pods":"110"},"allocatable":{"cpu":"8","ephemeral-storage":"102350Mi","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"40971612Ki","pods":"110"},"conditions":[{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2023-04-25T03:15:19Z","lastTransitionTime":"2023-04-25T03:13:31Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2023-04-25T03:15:19Z","lastTransitionTime":"2023-04-25T03:13:31Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","status":"False","lastHeartbeatTime":"2023-04-25T03:15:19Z","lastTransitionTime":"2023-04-25T03:13:31Z","reason":"KubeletHasSufficientPID","message":"kubelet has sufficient PID available"},{"type":"Ready","status":"True","lastHeartbeatTime":"2023-04-25T03:15:19Z","lastTransitionTime":"2023-04-25T03:15:19Z","reason":"KubeletReady","message":"kubelet is posting ready status"}],"addresses":[{"type":"InternalIP","address":"172.19.0.2"},{"type":"Hostname","address":"openyurt-e2e-test-worker"}],"daemonEndpoints":{"kubeletEndpoint":{"Port":10250}},"nodeInfo":{"machineID":"374ad63edf4d4470a07e6974619f9364","systemUUID":"6b87df04-568e-400a-82fe-8e6b79a81dcc","bootID":"6cfe89bd-e735-4b1c-90f7-3c683e412759","kernelVersion":"5.4.0-146-generic","osImage":"Ubuntu 
21.10","containerRuntimeVersion":"containerd://1.5.10","kubeletVersion":"v1.22.7","kubeProxyVersion":"v1.22.7","operatingSystem":"linux","architecture":"amd64"},"images":[{"names":["k8s.gcr.io/kube-proxy:v1.22.7"],"sizeBytes":105458887},{"names":["k8s.gcr.io/etcd:3.5.0-0"],"sizeBytes":99868722},{"names":["k8s.gcr.io/kube-apiserver:v1.22.7"],"sizeBytes":74670034},{"names":["k8s.gcr.io/kube-controller-manager:v1.22.7"],"sizeBytes":67522360},{"names":["docker.io/openyurt/yurthub:v1.2.1"],"sizeBytes":57765800},{"names":["k8s.gcr.io/kube-scheduler:v1.22.7"],"sizeBytes":53923640},{"names":["docker.io/openyurt/yurt-tunnel-agent:v1.2.1"],"sizeBytes":44572610},{"names":["docker.io/kindest/kindnetd:v20211122-a2c10462"],"sizeBytes":40928505},{"names":["docker.io/openyurt/node-servant:v1.2.1"],"sizeBytes":38556748},{"names":["k8s.gcr.io/build-image/debian-base:buster-v1.7.2"],"sizeBytes":21133992},{"names":["k8s.gcr.io/coredns/coredns:v1.8.4"],"sizeBytes":13707249},{"names":["docker.io/rancher/local-path-provisioner:v0.0.14"],"sizeBytes":13367922},{"names":["k8s.gcr.io/pause:3.6"],"sizeBytes":301773}]}}
The kube-proxy issue was caused by the wrong no_proxy setting. After adding the APIPA range to it, kube-proxy works fine now:
box@joez-hce-ub20-vm-oykv-w:~$ docker exec 82963d60adbd env | grep no_proxy
no_proxy=.svc,.svc.cluster.local,10.244.0.0/16,10.96.0.0/16,localhost,joez-hce-ub20-vm-openyurt-m,sh.intel.com,istio-system.svc,127.0.0.0/8,169.254.0.0/16,172.16.0.0/12,192.168.0.0/16,10.0.0.0/8
box@joez-hce-ub20-vm-oykv-w:~$ docker logs 82963d60adbd
I0505 07:22:06.897972 1 server.go:553] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
I0505 07:22:06.919089 1 node.go:172] Successfully retrieved node IP: 10.67.109.173
I0505 07:22:06.919132 1 server_others.go:140] Detected node IP 10.67.109.173
W0505 07:22:06.919172 1 server_others.go:565] Unknown proxy mode "", assuming iptables proxy
I0505 07:22:06.964667 1 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
I0505 07:22:06.964704 1 server_others.go:212] Using iptables Proxier.
I0505 07:22:06.964715 1 server_others.go:219] creating dualStackProxier for iptables.
W0505 07:22:06.964730 1 server_others.go:495] detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6
I0505 07:22:06.965123 1 server.go:649] Version: v1.22.0
I0505 07:22:06.969845 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0505 07:22:06.969873 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0505 07:22:06.969957 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0505 07:22:06.969989 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0505 07:22:06.970561 1 config.go:315] Starting service config controller
I0505 07:22:06.970704 1 config.go:224] Starting endpoint slice config controller
I0505 07:22:06.970710 1 shared_informer.go:240] Waiting for caches to sync for service config
I0505 07:22:06.970717 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0505 07:22:07.070911 1 shared_informer.go:247] Caches are synced for service config
I0505 07:22:07.071021 1 shared_informer.go:247] Caches are synced for endpoint slice config
box@joez-hce-ub20-vm-oykv-w:~$ sudo iptables -t nat -n -L KUBE-SERVICES
[sudo] password for box:
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-SVC-OVTWZ4GROBJZO4C5 tcp -- 0.0.0.0/0 10.96.193.48 /* default/nginx:80-80 cluster IP */ tcp dpt:80
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-SVC-LON7267IY6XCAPHT tcp -- 0.0.0.0/0 10.96.135.118 /* kube-system/yurt-app-manager-webhook:https cluster IP */ tcp dpt:443
KUBE-SVC-UDPDOKU2AFJKWYNL tcp -- 0.0.0.0/0 10.96.221.157 /* kubevirt/virt-api cluster IP */ tcp dpt:443
KUBE-SVC-EIEVNBW5YXUIDXZD tcp -- 0.0.0.0/0 10.96.197.127 /* kubevirt/kubevirt-prometheus-metrics:metrics cluster IP */ tcp dpt:443
KUBE-SVC-JD5MR3NA4I4DYORP tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-SVC-TCOU7JCQXEZGVUNU udp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-SVC-GXXJIUUZRDUOXB4K tcp -- 0.0.0.0/0 10.96.49.55 /* kubevirt/kubevirt-operator-webhook:webhooks cluster IP */ tcp dpt:443
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- 0.0.0.0/0 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-NODEPORTS all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
box@joez-hce-ub20-vm-oykv-w:~$ ls /etc/kubernetes/cache/kube-proxy/
endpointslices.v1.discovery.k8s.io events.v1.events.k8s.io nodes.v1.core services.v1.core
It's time to check virt-handler now. I think it is still trying to talk to kube-apiserver:
...
W0505 07:22:12.698969 6593 reflector.go:324] pkg/controller/virtinformers.go:331: failed to list *v1.VirtualMachineInstance: can not cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FmigrationTargetNodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
E0505 07:22:12.699042 6593 reflector.go:138] pkg/controller/virtinformers.go:331: Failed to watch *v1.VirtualMachineInstance: failed to list *v1.VirtualMachineInstance: can not cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FmigrationTargetNodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
{"component":"virt-handler","level":"info","msg":"failed to dial cmd socket: //pods/95f6251c-07e3-4987-a14a-ae0af2e0b43a/volumes/kubernetes.io~empty-dir/sockets/launcher-sock","pos":"client.go:303","reason":"context deadline exceeded","timestamp":"2023-05-05T07:22:13.544301Z"}
{"component":"virt-handler","level":"error","msg":"failed to connect to cmd client socket","pos":"cache.go:526","reason":"context deadline exceeded","timestamp":"2023-05-05T07:22:13.544584Z"}
To be honest, I'm not familiar with KubeVirt; I can only check the situation from yurthub's point of view. I'm not sure how many KubeVirt-related components run on worker nodes. I saw that the cache already contains entries like virt-api and virt-controller. Does virt-handler use one of them? I mean, we may need to check what User-Agent virt-handler uses when it sends requests.
And another question: does virt-handler have its own kubeconfig? If so, I think we can remove that kubeconfig to make it use InClusterConfig, which will let virt-handler send requests to yurthub instead of the apiserver.
@Congrool Thank you very much. Maybe I am the first one to use KubeVirt on OpenYurt, but I think more and more users will choose KubeVirt if they want to orchestrate VM workloads (such as Windows apps), and OpenYurt if they require edge autonomy.
I will check KubeVirt further; the solution should be similar to kube-proxy's: we need to customize it, as you mentioned, to use InClusterConfig. But I don't understand how kube-proxy works with yurthub. Would you show me some detailed documentation?
I don't understand how kube-proxy works with yurthub. Would you show me some detailed documentation?
@joez I can give you some details. You can check the yurthub doc, which gives a rough description of the Data Filtering Framework, though it contains more filters than the ones discussed here. The two main filters that affect kube-proxy are the MasterService filter and the InClusterConfig filter.
The former mainly affects kubelet when it creates pods and sets envs in them. Specifically, it changes the clusterIP and port of the kubernetes service that kubelet gets from kube-apiserver. You can verify this with
cat /etc/kubernetes/cache/kubelet/services.v1.core/default/kubernetes
whose spec.clusterIP has been changed to 169.254.2.1 and whose port has been changed to 10268. Then when kubelet creates pods, it sets envs for them like KUBERNETES_SERVICE_PORT=10268 and KUBERNETES_SERVICE_HOST=169.254.2.1, which components using InClusterConfig will send requests to. As a result, yurthub serves these requests. You can check these envs within pods.
Based on the MasterService filter, what we need to do is make sure all components use InClusterConfig, while kube-proxy by default uses the kube-proxy configmap as its kubeconfig. So we need the InClusterConfig filter, which is responsible for removing kubeconfig.conf from the kube-proxy configmap. Thus, kubelet only gets the modified kube-proxy configmap and mounts it for the kube-proxy pod, enabling it to use InClusterConfig.
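To confirm the InClusterConfig filter worked end to end, one can inspect the config actually mounted into the kube-proxy pod; the kubeconfig line should be commented out, matching the yurthub log shown earlier. A sketch (the pod lookup assumes the default k8s-app=kube-proxy label):

```shell
# Pick a kube-proxy pod, then check its mounted config for the
# commented-out kubeconfig line produced by the filter.
POD=$(kubectl -n kube-system get pod -l k8s-app=kube-proxy \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec "$POD" -- grep kubeconfig /var/lib/kube-proxy/config.conf
# A filtered config should show: #kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
```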
From the error message of virt-handler:
E0506 05:52:02.097983 6593 reflector.go:138] pkg/controller/virtinformers.go:331: Failed to watch *v1.VirtualMachineInstance: failed to list *v1.VirtualMachineInstance: can not cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FmigrationTargetNodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
The User-Agent is go-http-client, and I can find the cached object:
root@joez-hce-ub20-vm-oykv-w:/etc/kubernetes/cache# ls go-http-client/virtualmachines.v1alpha3.kubevirt.io/default
testvm
The kubernetes service has already been set to yurthub:
root@joez-hce-ub20-vm-oykv-w:/etc/kubernetes/cache# docker exec c487b5a6a2e7 env | grep KUBERNETES_SERVICE | sort
KUBERNETES_SERVICE_HOST=169.254.2.1
KUBERNETES_SERVICE_PORT=10268
KUBERNETES_SERVICE_PORT_HTTPS=10268
I don't know why it still fails to get the virtualmachineinstances objects.
Trying to get it via yurthub directly gives the same error:
root@joez-hce-ub20-vm-oykv-w:/etc/kubernetes/cache# no_proxy='*' curl -H "User-Agent: go-http-client" -v -L 'http://127.0.0.1:10261/apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FmigrationTargetNodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0'
* Uses proxy env variable no_proxy == '*'
* Trying 127.0.0.1:10261...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 10261 (#0)
> GET /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FmigrationTargetNodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0 HTTP/1.1
> Host: 127.0.0.1:10261
> Accept: */*
> User-Agent: go-http-client
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: application/json
< Date: Sat, 06 May 2023 10:11:46 GMT
< Content-Length: 351
<
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"can not cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FmigrationTargetNodeName+in+%28joez-hce-ub20-vm-oykv-w%29\u0026limit=500\u0026resourceVersion=0","reason":"BadRequest","code":400}
* Connection #0 to host 127.0.0.1 left intact
The error comes from `pkg/yurthub/proxy/local/local.go`:
// localReqCache handles Get/List/Update requests when remote servers are unhealthy
func (lp *LocalProxy) localReqCache(w http.ResponseWriter, req *http.Request) error {
if !lp.cacheMgr.CanCacheFor(req) {
klog.Errorf("can not cache for %s", hubutil.ReqString(req))
return apierrors.NewBadRequest(fmt.Sprintf("can not cache for %s", hubutil.ReqString(req)))
}
@Congrool Would you shed some light on why the request can not be cached?
I think I am getting close to our target, except for the "can not cache" error from `yurthub` mentioned last time:
@rambohe-ch @Congrool Would you help on this? I don't know why these resources can't be cached
box@joez-hce-ub20-vm-oykv-w:~$ docker logs fb5695ee3d8b
...
{"component":"virt-handler","level":"info","msg":"STARTING informer vmiInformer-targets","pos":"virtinformers.go:330","timestamp":"2023-05-10T04:25:54.720361Z"}
W0510 04:25:54.751331 7102 reflector.go:324] pkg/controller/virtinformers.go:331: failed to list *v1.VirtualMachineInstance: can not cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
E0510 04:25:54.751694 7102 reflector.go:138] pkg/controller/virtinformers.go:331: Failed to watch *v1.VirtualMachineInstance: failed to list *v1.VirtualMachineInstance: can not cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
Here are the related logs from `yurthub`:
I0510 04:25:54.719455 1 util.go:289] start proxying: get /apis/apiextensions.k8s.io/v1/customresourcedefinitions?limit=500&resourceVersion=0, in flight requests: 26
I0510 04:25:54.726858 1 util.go:289] start proxying: get /api/v1/namespaces/kubevirt/configmaps?fieldSelector=metadata.name%3Dkubevirt-ca&limit=500&resourceVersion=0, in flight requests: 27
I0510 04:25:54.727759 1 util.go:248] go-http-client list configmaps: /api/v1/namespaces/kubevirt/configmaps?fieldSelector=metadata.name%3Dkubevirt-ca&limit=500&resourceVersion=0 with status code 200, spent 724.906µs
I0510 04:25:54.729074 1 util.go:289] start proxying: get /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0, in flight requests: 27
I0510 04:25:54.729183 1 util.go:289] start proxying: get /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FmigrationTargetNodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0, in flight requests: 28
W0510 04:25:54.729291 1 cache_manager.go:769] list requests that have the same path but with different selector, skip cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
E0510 04:25:54.729347 1 local.go:217] can not cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
E0510 04:25:54.729383 1 local.go:87] could not proxy local for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0, can not cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
I0510 04:25:54.729564 1 util.go:248] go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0 with status code 400, spent 295.259µs
What I have done so far:
- Deploy CoreDNS as a DaemonSet on each node, so that DNS resolution still works when disconnected from the cloud node
- Apply the fixed IP address assignment feature on `host-local` IPAM, so that services can find the correct endpoints when disconnected
box@joez-hce-ub20-vm-oykv-w:~$ docker exec -it 2deb25f5272a sh
/ # nslookup nginx
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: nginx
Address 1: 10.96.2.116 nginx.default.svc.cluster.local
/ # nslookup virt-api.kubevirt
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: virt-api.kubevirt
Address 1: 10.96.206.147 virt-api.kubevirt.svc.cluster.local
/ # ping virt-api.kubevirt
PING virt-api.kubevirt (10.96.206.147): 56 data bytes
64 bytes from 10.96.206.147: seq=0 ttl=241 time=184.707 ms
64 bytes from 10.96.206.147: seq=1 ttl=241 time=202.301 ms
^C
--- virt-api.kubevirt ping statistics ---
3 packets transmitted, 2 packets received, 33% packet loss
round-trip min/avg/max = 184.707/193.504/202.301 ms
/ # exit
box@joez-hce-ub20-vm-oykv-w:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fb5695ee3d8b c407633b131b "virt-handler --port…" 3 minutes ago Up 3 minutes k8s_virt-handler_virt-handler-7q56b_kubevirt_60b59abb-295d-455f-b8a1-a248a5656d7a_8
38068327ac79 alpine "/bin/sh" 3 minutes ago Up 3 minutes k8s_test_test_default_962446d6-f99d-41d8-b650-6af03f8a007f_10
d6a7f0c799bf nginx "/docker-entrypoint.…" 3 minutes ago Up 3 minutes k8s_nginx_nginx-6799fc88d8-j7fwc_default_13d7e957-bc7d-4bfc-8db2-507c70fd240f_14
6e78501b808f k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_virt-operator-55989d567c-p5nkl_kubevirt_d407b21d-1eb7-47ea-8bd0-6daec1bbe747_50
436b91002bab k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_virt-handler-7q56b_kubevirt_60b59abb-295d-455f-b8a1-a248a5656d7a_47
0dc549a31e7a 943b496a674d "virt-api --port 844…" 3 minutes ago Up 3 minutes k8s_virt-api_virt-api-5474cf649d-rlcrw_kubevirt_0218bfee-c273-4ed5-8626-43a88d0f5267_8
13a093ee9642 943b496a674d "virt-api --port 844…" 3 minutes ago Up 3 minutes k8s_virt-api_virt-api-5474cf649d-xp658_kubevirt_f0805cfd-2132-4515-acbc-68c967ab2b22_8
2deb25f5272a 8c811b4aec35 "sleep 3600" 3 minutes ago Up 3 minutes k8s_debug_debug_default_8051f632-a8bf-4ca8-801a-4aeca8bcb824_4
d77d27497658 8d147537fb7d "/coredns -conf /etc…" 3 minutes ago Up 3 minutes k8s_coredns_coredns-klpms_kube-system_9c4821ad-f91d-4174-af9a-dfecdbe2321e_5
71ae8a256f11 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_virt-operator-55989d567c-t2n76_kubevirt_dc3b91e2-3751-48af-ac7c-2e5d060b0349_46
1cb7fbce79cf k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_virt-api-5474cf649d-xp658_kubevirt_f0805cfd-2132-4515-acbc-68c967ab2b22_45
c6c7d9405810 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_virt-api-5474cf649d-rlcrw_kubevirt_0218bfee-c273-4ed5-8626-43a88d0f5267_46
0fae5f38ff65 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_virt-controller-7f8ff6cdc4-wcvvb_kubevirt_403a3601-b8e7-4df4-88e1-f93a6a94939c_47
ebd626f49650 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_debug_default_8051f632-a8bf-4ca8-801a-4aeca8bcb824_25
bf519a40f8ef k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_test_default_962446d6-f99d-41d8-b650-6af03f8a007f_77
ee5297767dd0 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_coredns-klpms_kube-system_9c4821ad-f91d-4174-af9a-dfecdbe2321e_37
57e2183d1c56 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_virt-controller-7f8ff6cdc4-hd4ft_kubevirt_1946d9c4-8aa8-498d-b6c0-7fa0812c2da9_51
dddc0e3fdba2 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_nginx-6799fc88d8-j7fwc_default_13d7e957-bc7d-4bfc-8db2-507c70fd240f_92
bdd0259f6407 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_yurt-app-manager-6fd8dcd6b4-9gp6n_kube-system_a117d8a4-da40-4568-ba1a-61b1979a76ed_98
83e7610f3578 11ae74319a21 "/opt/bin/flanneld -…" 3 minutes ago Up 3 minutes k8s_kube-flannel_kube-flannel-ds-9mrs8_kube-flannel_2459bd62-295b-4806-a751-ad70a2660c29_16
31ebde671e70 bbad1636b30d "/usr/local/bin/kube…" 3 minutes ago Up 3 minutes k8s_kube-proxy_kube-proxy-9ktvr_kube-system_9404c203-bca0-4598-9aec-6f371e699df4_15
129fcf09327e k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-proxy-9ktvr_kube-system_9404c203-bca0-4598-9aec-6f371e699df4_15
7daf002b5da3 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-flannel-ds-9mrs8_kube-flannel_2459bd62-295b-4806-a751-ad70a2660c29_15
10f0ab49723e 60fb0e90cdfb "yurthub --v=2 --ser…" 3 minutes ago Up 3 minutes k8s_yurt-hub_yurt-hub-joez-hce-ub20-vm-oykv-w_kube-system_dd10f5ec226508a076ff4cffac748add_15
65f925802c20 k8s.gcr.io/pause:3.5 "/pause" 3 minutes ago Up 3 minutes k8s_POD_yurt-hub-joez-hce-ub20-vm-oykv-w_kube-system_dd10f5ec226508a076ff4cffac748add_15
I saw this in the `yurthub` logs:
I0510 04:25:54.729074 1 util.go:289] start proxying: get /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0, in flight requests: 27
I0510 04:25:54.729183 1 util.go:289] start proxying: get /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FmigrationTargetNodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0, in flight requests: 28
W0510 04:25:54.729291 1 cache_manager.go:769] list requests that have the same path but with different selector, skip cache for go-http-client list virtualmachineinstances: /apis/kubevirt.io/v1alpha3/virtualmachineinstances?labelSelector=kubevirt.io%2FnodeName+in+%28joez-hce-ub20-vm-oykv-w%29&limit=500&resourceVersion=0
Yurthub emits this warning because it can only cache list/watch requests from the same component for the same resource with a single selector. In this case, once `go-http-client` has listed/watched `virtualmachineinstances` with selector A, yurthub caches all `virtualmachineinstances` matching selector A. Then, when `go-http-client` wants to list/watch `virtualmachineinstances` with a different selector B, yurthub hits a conflict: should it cache resources matching A or resources matching B for `go-http-client`? Currently, we only retain the cache of the first request, so the second request is refused.
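To make the conflict concrete, here is a minimal sketch (not yurthub's actual code) of the cache-key logic described above: the cache is keyed by component (User-Agent) and resource, not by selector, so a second list with a different selector for the same key is refused.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// cacheKey identifies a cache entry the way yurthub (simplified) does:
// by requesting component and resource, NOT by selector.
type cacheKey struct {
	component string // derived from the User-Agent header
	resource  string
}

// selectorCache remembers the first selector seen per (component, resource).
type selectorCache struct {
	mu        sync.Mutex
	selectors map[cacheKey]string
}

var errConflict = errors.New("can not cache: same path with different selector")

// canCacheFor returns an error when a second, different selector arrives
// for the same (component, resource) pair -- mirroring the warning
// "list requests that have the same path but with different selector".
func (c *selectorCache) canCacheFor(component, resource, selector string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := cacheKey{component, resource}
	if prev, ok := c.selectors[key]; ok {
		if prev != selector {
			return errConflict
		}
		return nil
	}
	c.selectors[key] = selector
	return nil
}

func main() {
	c := &selectorCache{selectors: map[cacheKey]string{}}
	// First list from go-http-client: accepted and cached.
	fmt.Println(c.canCacheFor("go-http-client", "virtualmachineinstances", "kubevirt.io/nodeName in (w)"))
	// Second list: same component and resource, different selector -> refused.
	fmt.Println(c.canCacheFor("go-http-client", "virtualmachineinstances", "kubevirt.io/migrationTargetNodeName in (w)"))
	// A different component (distinct User-Agent) is cached independently.
	fmt.Println(c.canCacheFor("virt-handler-targets", "virtualmachineinstances", "kubevirt.io/migrationTargetNodeName in (w)"))
}
```

This also shows why changing the User-Agent works around the problem: a distinct component name yields a distinct cache key.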
Now, let's come to the solution. Firstly, I have to say that it seems this cannot be solved just through configuration. We should check whether `virt-handler` really needs to list/watch `virtualmachineinstances` with different selectors. I saw that we have `virt-handler` and `virt-api` running on the same node. We should check:
1. do they both use `go-http-client` as the User-Agent when sending requests to yurthub
2. if 1 is true, whether they list/watch `virtualmachineinstances` with different selectors

If 2 is true, we can change the `User-Agent` for `virt-handler` and `virt-api` to make them different. @joez
@Congrool Sorry for the late reply. I spent some time learning the kubevirt code, and the conclusion is that the current implementation of `yurt-hub` can't support kubevirt very well: virt-handler needs support for the same path with different selectors.
In the normal scenario, you can find this log in `virt-handler`:
{"component":"virt-handler","level":"info","msg":"Starting virt-handler controller.","pos":"vm.go:1387"}
But you can't find it in the disconnected scenario. From the code `pkg/virt-handler/vm.go`:
func (c *VirtualMachineController) Run(threadiness int, stopCh chan struct{}) {
defer c.Queue.ShutDown()
log.Log.Info("Starting virt-handler controller.")
go c.deviceManagerController.Run(stopCh)
cache.WaitForCacheSync(stopCh, c.domainInformer.HasSynced, c.vmiSourceInformer.HasSynced, c.vmiTargetInformer.HasSynced, c.gracefulShutdownInformer.HasSynced)
...
`VirtualMachineController` is created and run by `virt-handler`; code at `cmd/virt-handler/virt-handler.go`:
func (app *virtHandlerApp) Run() {
...
vmiSourceInformer := factory.VMISourceHost(app.HostOverride)
vmiTargetInformer := factory.VMITargetHost(app.HostOverride)
...
vmController := virthandler.NewController(
recorder,
app.virtCli,
app.HostOverride,
migrationIpAddress,
app.VirtShareDir,
app.VirtPrivateDir,
vmiSourceInformer,
vmiTargetInformer,
domainSharedInformer,
gracefulShutdownInformer,
...
cache.WaitForCacheSync(stop, vmiSourceInformer.HasSynced, factory.CRD().HasSynced)
go vmController.Run(10, stop)
In the disconnected scenario, `virt-handler` is blocked at `cache.WaitForCacheSync`; that is why the VM is not launched.
Both `vmiSourceInformer` and `vmiTargetInformer` should sync successfully, but they list/watch the same path with only the selector differing; related code is at `pkg/controller/virtinformers.go`:
func (f *kubeInformerFactory) VMISourceHost(hostName string) cache.SharedIndexInformer {
labelSelector, err := labels.Parse(fmt.Sprintf(kubev1.NodeNameLabel+" in (%s)", hostName))
if err != nil {
panic(err)
}
return f.getInformer("vmiInformer-sources", func() cache.SharedIndexInformer {
lw := NewListWatchFromClient(f.restClient, "virtualmachineinstances", k8sv1.NamespaceAll, fields.Everything(), labelSelector)
return cache.NewSharedIndexInformer(lw, &kubev1.VirtualMachineInstance{}, f.defaultResync, cache.Indexers{
cache.NamespaceIndex: cache.MetaNamespaceIndexFunc,
"node": func(obj interface{}) (strings []string, e error) {
return []string{obj.(*kubev1.VirtualMachineInstance).Status.NodeName}, nil
},
})
})
}
func (f *kubeInformerFactory) VMITargetHost(hostName string) cache.SharedIndexInformer {
labelSelector, err := labels.Parse(fmt.Sprintf(kubev1.MigrationTargetNodeNameLabel+" in (%s)", hostName))
if err != nil {
panic(err)
}
return f.getInformer("vmiInformer-targets", func() cache.SharedIndexInformer {
lw := NewListWatchFromClient(f.restClient, "virtualmachineinstances", k8sv1.NamespaceAll, fields.Everything(), labelSelector)
return cache.NewSharedIndexInformer(lw, &kubev1.VirtualMachineInstance{}, f.defaultResync, cache.Indexers{
cache.NamespaceIndex: cache.MetaNamespaceIndexFunc,
"node": func(obj interface{}) (strings []string, e error) {
return []string{obj.(*kubev1.VirtualMachineInstance).Status.NodeName}, nil
},
})
})
}
Details in the attached logs-openyurt-kubevirt.zip
the conclusion is that, the current implementation of yurt-hub can't support kubevirt very well, virt-handler needs support on same path with different selector
@joez Hi, I'm sorry to hear that. If you don't mind modifying the source code, there are still two solutions:
1. enhance the cache capability of yurthub.
2. split the client that list/watches `VirtualMachineInstance` in virt-handler: one client for each selector, each with a different User-Agent, such as `virt-handler-MigrationTargetNodeNameLabel` and `virt-handler-NodeNameLabel`. Then yurthub can cache them separately.

To get a quick workaround, option 2 is recommended. Option 1 is hard to push forward, because it would likely require refactoring the yurthub cache framework, which is a big job. Anyway, this is a cache limitation the community has already recognized, and we need to come up with a final solution to remove it.
@Congrool Let me figure out which way is feasible; I will try option 2 first. This is a big challenge for me, because I have no programming experience with Kubernetes. May I know the main reason for the current `yurt-hub` design? Shall we cache all the resources under a path and then filter on-the-fly when proxying, to support the current use case (same path with different selectors)?
May I know what is the main reason for the current yurt-hub design? Shall we cache all the resources under a path and then filter on-the-fly when proxying, to support the current use case: same path with different selector.
As far as I know, the original thought of yurthub was to cache as few resources as possible, considering the limited hardware resources of edge nodes. Thus, we separate the cache for different components, and only cache for some of them by default (e.g. kubelet, flannel, kube-proxy, coredns), which constitute the minimal infrastructure set that business workloads depend on (failure recovery, container networking, service discovery, and DNS resolution, respectively). Other components, in this case virt-handler, were not taken into consideration in the cloud-edge scenario.
But, hm, this feature emerged at a very early stage. Maybe @rambohe-ch can give more details.
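The "cache everything, filter when proxying" idea raised above could be sketched as follows. This is not yurthub code: a real implementation would parse selectors with apimachinery's `labels.Parse` and match with `Selector.Matches`; here only the one selector form seen in the logs, `key in (v1,v2)`, is handled, for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesIn evaluates a selector of the form "key in (v1,v2)" against an
// object's labels. Objects failing the match would be filtered out of a
// cached list response before it is returned to the client.
func matchesIn(selector string, objLabels map[string]string) bool {
	open := strings.Index(selector, " in (")
	if open < 0 || !strings.HasSuffix(selector, ")") {
		return false // unsupported selector form in this sketch
	}
	key := selector[:open]
	vals := strings.Split(selector[open+len(" in (("[:5]):len(selector)-1], ",")
	got, ok := objLabels[key]
	if !ok {
		return false
	}
	for _, v := range vals {
		if strings.TrimSpace(v) == got {
			return true
		}
	}
	return false
}

func main() {
	// Labels of a cached VMI object on the worker node.
	vmi := map[string]string{"kubevirt.io/nodeName": "joez-hce-ub20-vm-oykv-w"}
	fmt.Println(matchesIn("kubevirt.io/nodeName in (joez-hce-ub20-vm-oykv-w)", vmi))
	fmt.Println(matchesIn("kubevirt.io/migrationTargetNodeName in (joez-hce-ub20-vm-oykv-w)", vmi))
}
```

With such filtering, both informer selectors could be served from one cached resource set, at the cost of caching more objects than any single selector needs.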
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
PR linked: #1614