kubernetes-ovn-heterogeneous-cluster
Error when running install_ovn.ps1 (windows-init.exe)
I'm getting the following error when running the install_ovn.ps1 script on the Windows host. I'm fairly sure my settings are correct. Does K8S_CLUSTER_ROUTER need to be defined somewhere?
Traceback (most recent call last):
File "windows-init.py", line 151, in <module>
File "windows-init.py", line 147, in minion_init
File "windows-init.py", line 43, in create_management_port
File "windows-init.py", line 26, in get_k8s_cluster_router
Exception: K8S_CLUSTER_ROUTER not found
@aserdean, are there any changes that have not been propagated to the zip file hosted by Cloudbase?
Could this error be a symptom of a connection issue? It doesn't seem like it, though: the error is returned immediately, not after a timeout.
My setup is as follows:
The machines can ping each other.
On the master node (set up as in the example):
export HOSTNAME=`hostname`
export K8S_VERSION=1.5.3
export K8S_POD_SUBNET=10.244.0.0/16
export K8S_NODE_POD_SUBNET=10.244.2.0/24
export K8S_DNS_SERVICE_IP=10.100.0.10
export K8S_DNS_DOMAIN=cluster.local
On the Windows worker node:
$SUBNET="10.244.2.0/24" # The minion subnet used to spawn pods on
$GATEWAY_IP="10.244.2.1" # first ip of the subnet
$CLUSTER_IP_SUBNET="10.244.0.0/16" # The big subnet which includes the minions subnets
$INTERFACE_ALIAS="Ethernet" # Interface used for creating the overlay tunnels (must have connectivity with other hosts)
$KUBERNETES_API_SERVER="10.142.0.2" # API kubernetes server IP
$PUBLIC_IP="10.142.0.3" # IP of $INTERFACE_ALIAS (must be able to reach other hosts)
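The comments above imply two relationships between the Windows values: $SUBNET should sit inside $CLUSTER_IP_SUBNET, and $GATEWAY_IP should be the first address of $SUBNET. A minimal POSIX shell sketch for checking that before re-running the installer, assuming a shell with 64-bit arithmetic (ip_to_int, prefix_mask, subnet_contains and first_host_ip are hypothetical helpers, not part of the repo's scripts):

```sh
#!/bin/sh
# Hypothetical helpers (not part of kubernetes-ovn-heterogeneous-cluster)
# for sanity-checking subnet values before running install_ovn.ps1.

# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# prefix_mask LEN -> prints the netmask of a /LEN prefix as an integer
prefix_mask() {
  echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

# subnet_contains OUTER_CIDR INNER_CIDR -> exit status 0 if INNER lies inside OUTER
subnet_contains() {
  outer_len=${1#*/}
  inner_len=${2#*/}
  [ "$inner_len" -ge "$outer_len" ] || return 1
  mask=$(prefix_mask "$outer_len")
  [ $(( $(ip_to_int "${1%/*}") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
}

# first_host_ip CIDR -> prints the network address + 1
first_host_ip() {
  mask=$(prefix_mask "${1#*/}")
  n=$(( ($(ip_to_int "${1%/*}") & mask) + 1 ))
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}
```

With the values above, `subnet_contains 10.244.0.0/16 10.244.2.0/24` succeeds and `first_host_ip 10.244.2.0/24` prints 10.244.2.1, matching $GATEWAY_IP. The same helper would also report an overlap between this $SUBNET and the master's K8S_NODE_POD_SUBNET, since both are 10.244.2.0/24 in the values posted here; if each node is meant to get its own pod range, that may be worth double-checking.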
My guess is that you don't have a gateway node, and maybe some last-minute changes before the GCP Next 2017 demo made it a requirement. @alinbalutoiu and @aserdean are better placed than I am to identify the root cause at this point.
The gateway node is listed as the last thing to set up in the docs. Does it perhaps need to be created before the worker nodes?
All binaries are unchanged at the moment.
We have another build in the works which allows more logging, but we haven't properly tested it yet.
Execution halts here: https://github.com/alinbalutoiu/ovn_alpha/blob/d60a50e440d9d17da320a8acba1766e3cff31b86/bin/ovn-k8s-overlay#L70-L77.
Either it cannot find https://github.com/alinbalutoiu/ovn_alpha/blob/d60a50e440d9d17da320a8acba1766e3cff31b86/bin/ovn-k8s-overlay#L271, or you are trying to run the gw init on the Windows node.
I suspect you ran the init scripts a couple of times, which might have messed up the config. Can you please show us the output of ovn-nbctl show and ovn-sbctl show on the master node?
I've only run the init scripts once on the master node. On the Windows node I have run the script multiple times, but I have also deleted and recreated that node multiple times, trying different network values by trial and error.
Output:
root@sig-windows-master:~# ovn-nbctl show
root@sig-windows-master:~# ovn-sbctl show
Chassis "3dfb3758-14a6-4b89-ab55-48d1eb391f84"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Chassis "919d7288-6885-4102-801b-db98cb3fcaf2"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Chassis "90f4881c-8740-4aa3-88fd-f140f67535de"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Chassis "63914194-9611-425d-b316-454463d3c6fd"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Chassis "49f459b9-a35e-44d9-aef8-82953b187e8b"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Chassis "049316d1-f697-4cfd-8753-6d6701b2a34e"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Chassis "e6e9f92c-5141-4b0c-b1df-a018d55a6aaa"
hostname: "sig-windows-master.xxxxxxxxxxxx.internal"
Encap geneve
ip: "10.142.0.2"
options: {csum="true"}
Chassis "240a4793-f40d-43eb-af6c-1bc6d0993cb3"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Chassis "4aaf794b-c3ed-45d3-8196-32d488f570f0"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Chassis "2f45c22d-8dbd-4e0a-9a53-e4ec0da9855e"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
Thanks for the output :).
The output of ovn-nbctl show on the master should not have been empty; it lists the logical switches and routers in the northbound database.
Is the service running?
Can you post the logs from ovn-northd by any chance?
Also you can join #sig-windows if you wish, so we can go step by step when you recreate the env.
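An empty ovn-nbctl show means the northbound database holds no logical routers, which would be consistent with the K8S_CLUSTER_ROUTER lookup failing on the Windows node. A rough pre-flight check to run on the master before initialising the Windows node, assuming this ovn-nbctl build provides the lr-list command (the function name and the NBCTL override variable are hypothetical):

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of the repo's scripts).
# NBCTL can be overridden, e.g. to point at a wrapper or a test stub.
NBCTL=${NBCTL:-ovn-nbctl}

# check_nb_has_router: prints the logical routers and exits 0 if the OVN
# northbound DB contains at least one; exits 1 if it has none and 2 if
# the lr-list call itself failed.
check_nb_has_router() {
  if ! routers=$($NBCTL lr-list); then
    echo "lr-list failed: is the northbound ovsdb-server reachable?" >&2
    return 2
  fi
  if [ -z "$routers" ]; then
    echo "No logical routers found; master-init has probably not run yet." >&2
    return 1
  fi
  printf '%s\n' "$routers"
}
```

A non-zero exit from check_nb_has_router on the master suggests revisiting the pod-networking part of the master setup before touching the Windows node again.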
root@sig-windows-master:~# kubectl get nodes
NAME STATUS AGE
sig-windows-master Ready,SchedulingDisabled 22h
root@sig-windows-master:~# kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
kube-apiserver-sig-windows-master 1/1 Running 0 22h
kube-controller-manager-sig-windows-master 1/1 Running 0 22h
kube-dns-1216797708-9vl76 0/3 Pending 0 22h
kube-scheduler-sig-windows-master 1/1 Running 0 22h
root@sig-windows-master:~# ovn-northd
2017-03-17T12:52:43Z|00001|reconnect|INFO|unix:/var/run/openvswitch/ovnnb_db.sock: connecting...
2017-03-17T12:52:43Z|00002|reconnect|INFO|unix:/var/run/openvswitch/ovnsb_db.sock: connecting...
2017-03-17T12:52:43Z|00003|reconnect|INFO|unix:/var/run/openvswitch/ovnnb_db.sock: connected
2017-03-17T12:52:43Z|00004|reconnect|INFO|unix:/var/run/openvswitch/ovnsb_db.sock: connected
I've included my full bash history: bash.txt
Hello @greigs! I think you missed the master-init step on the master node; I don't see it in your bash history. At the end of the tutorial (https://github.com/apprenda/kubernetes-ovn-heterogeneous-cluster/blob/master/master/README.md) there is a part saying "After making sure the API server is up & running, you need to configure pod networking for this node". Could you please try executing that part too?
Make sure you clean up the southbound DB first by running ovn-sbctl chassis-del <chassis_id>.
As an example, for this entry:
Chassis "2f45c22d-8dbd-4e0a-9a53-e4ec0da9855e"
hostname: "sig-windows-worker-windows-1"
Encap geneve
ip: "10.142.0.3"
options: {csum="true"}
You should run ovn-sbctl chassis-del 2f45c22d-8dbd-4e0a-9a53-e4ec0da9855e. Repeat this for every chassis, then run the last part of the tutorial.
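With as many stale entries as in the ovn-sbctl show output earlier in this thread, the deletions can be scripted. A minimal sketch, assuming the ovn-sbctl show layout seen above (a `Chassis "<name>"` line followed by a `hostname: "<host>"` line); the function name is hypothetical, and it only prints names, so nothing is deleted until you pipe its output into ovn-sbctl chassis-del yourself:

```sh
#!/bin/sh
# Hypothetical helper: read `ovn-sbctl show` output on stdin and print the
# chassis name (the UUID-like string after "Chassis") of every entry whose
# hostname equals $1. It only prints; it deletes nothing.
list_chassis_for_host() {
  awk -v host="$1" '
    $1 == "Chassis"   { name = $2; gsub(/"/, "", name) }
    $1 == "hostname:" { h = $2; gsub(/"/, "", h); if (h == host) print name }
  '
}

# Example (destructive, run on the master):
#   ovn-sbctl show | list_chassis_for_host sig-windows-worker-windows-1 |
#     while read -r c; do ovn-sbctl chassis-del "$c"; done
```

Filtering by hostname leaves the master's own chassis untouched, so only the Windows worker's duplicate registrations are removed.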
Ah! Thanks @alinbalutoiu
It looks as though my copy-and-paste of that script stopped at the apt install -y python-pip command. An easy mistake to make, but I should pay more attention.
I've finished running it now, and install_ovn.ps1 no longer shows an error on the Windows node.
This is promising. I'll continue the setup.
The node is now being seen. I tried deploying the dashboard, but it failed because it did not match "linux", which makes sense. So I tried adding a Linux worker node as described, and it is shown correctly in the node list.
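Before the Linux worker was added, the master was reported as Ready,SchedulingDisabled, which would explain why nothing matching "linux" could take the dashboard pod. A quick way to see which nodes would qualify, assuming the dashboard's node selector uses the beta.kubernetes.io/os label that kubelets set automatically (the helper name is hypothetical):

```sh
#!/bin/sh
# Hypothetical helper: read
#   kubectl get nodes -L beta.kubernetes.io/os --no-headers
# on stdin and print the nodes whose last column (the OS label added by -L)
# equals $1 and whose STATUS is not marked SchedulingDisabled.
nodes_with_os() {
  awk -v os="$1" '$NF == os && $2 !~ /SchedulingDisabled/ { print $1 }'
}

# Example:
#   kubectl get nodes -L beta.kubernetes.io/os --no-headers | nodes_with_os linux
```

An empty result would mean no schedulable node carries the label, in which case the dashboard pod stays Pending.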
I then removed and re-added the dashboard deployment.
My error is now:
Failed to setup network for pod \"kubernetes-dashboard-3203962772-t2382_kube-system(203e8324-0b43-11e7-879b-42010a8e0002)\" using network plugins \"cni\": ; Skipping pod"
Full output attached: describe.txt
Linux node setup (I copied the .pem files from the master node before running it): linuxnode.txt