cnf-testbed
Issue with Kubespray v2.14.0 (likely related to Multus configuration)
Link to issue with details: https://github.com/intel/multus-cni/issues/561
CNF Testbed (Kubespray) runs without errors, but inspecting the cluster afterwards shows the following:
$ kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-kube-controllers-b5c94f8f8-2j5fd 1/1 Running 0 43m
kube-system pod/calico-node-b4kmp 1/1 Running 0 45m
kube-system pod/calico-node-cm4x4 1/1 Running 0 45m
kube-system pod/coredns-dff8fc7d-xps6q 0/1 ContainerCreating 0 42m
kube-system pod/dns-autoscaler-6fcd794dd8-88mb4 0/1 ContainerCreating 0 42m
kube-system pod/kube-apiserver-node0 1/1 Running 0 49m
kube-system pod/kube-controller-manager-node0 1/1 Running 0 49m
kube-system pod/kube-multus-ds-amd64-b6dvr 1/1 Running 0 44m
kube-system pod/kube-multus-ds-amd64-pdpdn 1/1 Running 0 44m
kube-system pod/kube-proxy-57vbc 1/1 Running 0 49m
kube-system pod/kube-proxy-x89hc 1/1 Running 0 46m
kube-system pod/kube-scheduler-node0 1/1 Running 0 49m
kube-system pod/kubernetes-dashboard-667c4c65f8-8474b 0/1 ContainerCreating 0 42m
kube-system pod/kubernetes-metrics-scraper-54fbb4d595-df2ns 0/1 ContainerCreating 0 42m
kube-system pod/nginx-proxy-node1 1/1 Running 0 45m
kube-system pod/nodelocaldns-859qq 1/1 Running 0 42m
kube-system pod/nodelocaldns-gmhdm 1/1 Running 0 42m
The issue seems to be that Kubespray deploys Multus with --cni-version=0.4.0, which should be supported, but apparently isn't.
Workaround:
With the cluster deployed, update the Multus daemonset:
$ kubectl edit ds kube-multus-ds-amd64 -n kube-system
Update the arguments to use --cni-version=0.3.1 instead of --cni-version=0.4.0, as shown below:
- args:
  - --cni-conf-dir=/host/etc/cni/net.d
  - --cni-bin-dir=/host/opt/cni/bin
  - --multus-conf-file=auto
  - --multus-kubeconfig-file-host=/etc/cni/net.d/multus.d/multus.kubeconfig
  - --cni-version=0.3.1
Save the file and wait for the update to propagate. After that, all of the above pods should end up in the "Running" state.
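If you prefer a non-interactive alternative to kubectl edit, the same change can be applied as a JSON patch. This is just a sketch; it assumes the Multus container is the first container in the daemonset pod spec and that the arguments appear in the order shown above, so --cni-version sits at index 4:

$ kubectl -n kube-system patch ds kube-multus-ds-amd64 --type=json \
    -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/4", "value": "--cni-version=0.3.1"}]'

Patching the pod template triggers a rolling restart of the Multus pods, just like saving the edited manifest does.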
Hey @michaelspedersen, I did a quick test using a similar setup. This All-in-One cluster uses Kubespray v2.14.0 and Kubernetes v1.18.8.
Command:
/entrypoint.sh
Args:
--cni-conf-dir=/host/etc/cni/net.d
--cni-bin-dir=/host/opt/cni/bin
--multus-conf-file=auto
--multus-kubeconfig-file-host=/etc/cni/net.d/multus.d/multus.kubeconfig
--cni-version=0.4.0
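To double-check which CNI version Multus actually wrote to disk, the generated configuration on the node can be inspected. The file name below is an assumption based on what --multus-conf-file=auto typically generates (00-multus.conf); adjust the path if your setup names it differently:

$ sudo grep cniVersion /etc/cni/net.d/00-multus.conf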
This is the k8s-cluster.yml file:
---
system_namespace: kube-system
kube_log_dir: "/var/log/kubernetes"
kube_api_anonymous_auth: true
kube_api_pwd: "secret"
kube_users:
  kube:
    pass: "{{ kube_api_pwd }}"
    role: admin
    groups:
      - system:masters
kube_basic_auth: false
kube_token_auth: false
kube_network_plugin: flannel
kubeconfig_localhost: true
kube_version: v1.18.8
kube_proxy_mode: iptables
download_run_once: true
local_release_dir: "/tmp/releases"
helm_enabled: false
local_volumes_enabled: true
local_volume_provisioner_enabled: true
download_localhost: true
kube_network_plugin_multus: true
kubectl_localhost: false
etcd_deployment_type: docker
kubelet_deployment_type: docker
container_manager: docker
kubelet_custom_flags:
  - "--cpu-manager-policy=static"  # Allows containers in Guaranteed pods with integer CPU requests access to exclusive CPUs on the node.
kubelet_flexvolumes_plugins_dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
dashboard_skip_login: true
cert_manager_enabled: true
ingress_nginx_enabled: true
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-578cd6d964-wwk2f 1/1 Running 0 71m
cert-manager cert-manager-cainjector-5ffff9dd7c-26mwk 1/1 Running 0 71m
cert-manager cert-manager-webhook-556b9d7dfd-g2nwd 1/1 Running 0 71m
ingress-nginx ingress-nginx-controller-9gs5k 1/1 Running 0 71m
kube-system coredns-dff8fc7d-qb928 1/1 Running 0 70m
kube-system coredns-dff8fc7d-s75cb 0/1 Pending 0 70m
kube-system dns-autoscaler-66498f5c5f-6rrkn 1/1 Running 0 70m
kube-system kube-apiserver-aio 1/1 Running 0 72m
kube-system kube-controller-manager-aio 1/1 Running 0 72m
kube-system kube-flannel-mbqfs 1/1 Running 0 71m
kube-system kube-multus-ds-amd64-bmt99 1/1 Running 0 71m
kube-system kube-proxy-mnfdw 1/1 Running 0 71m
kube-system kube-scheduler-aio 1/1 Running 0 72m
kube-system kubernetes-dashboard-5697dbd455-7fb2c 1/1 Running 0 70m
kube-system kubernetes-metrics-scraper-54fbb4d595-6jfk7 1/1 Running 0 70m
kube-system local-volume-provisioner-p7cvs 1/1 Running 0 70m
kube-system nodelocaldns-54s5c 1/1 Running 0 70m
But my Multus test worked:
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multus-deployment
  labels:
    app: multus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multus
  template:
    metadata:
      labels:
        app: multus
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
          { "name": "bridge-conf", "interfaceRequest": "eth1" },
          { "name": "bridge-conf", "interfaceRequest": "eth2" }
        ]'
    spec:
      containers:
        - name: multus-deployment
          image: "busybox"
          command: ["top"]
          stdin: true
          tty: true
EOF
$ kubectl exec -ti multus-deployment-858f78c94d-r5sdl -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 12:d5:47:0a:e2:e5 brd ff:ff:ff:ff:ff:ff
    inet 10.233.64.12/24 brd 10.233.64.255 scope global eth0
       valid_lft forever preferred_lft forever
5: eth1@if21: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 0a:ce:a5:59:2c:3f brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.4/16 brd 10.10.255.255 scope global eth1
       valid_lft forever preferred_lft forever
7: eth2@if22: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 0e:f0:3c:88:aa:4d brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.5/16 brd 10.10.255.255 scope global eth2
       valid_lft forever preferred_lft forever
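For context, the bridge-conf network referenced in the pod annotation is a NetworkAttachmentDefinition. A minimal sketch of what it could look like is below; the bridge name, subnet, and cniVersion are placeholders inferred from the 10.10.0.0/16 addresses above, not the exact definition used in this test:

# Sketch only: bridge name, subnet and cniVersion are assumptions.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-conf
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "bridge",
    "bridge": "mybridge",
    "ipam": {
      "type": "host-local",
      "subnet": "10.10.0.0/16"
    }
  }'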
Hmm, that's interesting @electrocucaracha (and thanks for also checking up on this).
Did you check the status of coredns-dff8fc7d-s75cb? It looks like it is still Pending in the above snapshot.
Yeah, I'm not sure why that happens, but I have seen that behavior in All-in-One setups.
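One way to narrow that down is to look at the scheduler events for the Pending replica (pod name taken from the listing above):

$ kubectl -n kube-system describe pod coredns-dff8fc7d-s75cb

A common cause on single-node clusters is the anti-affinity rule between coredns replicas, which cannot be satisfied with only one node; the Events section would confirm whether that is the case here.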
Workaround being applied through https://github.com/crosscloudci/k8s-infra/pull/20