kubernetes-vagrant-coreos-cluster: Nodes become NotReady after vagrant reload

Hi, thanks for your work. I'm using the latest version of this repo, and the cluster works after vagrant up. After vagrant reload, however, node-01 and node-02 become NotReady. I found the following in the log of the kubelet container on node-02:

E0510 11:47:24.151857    1236 event.go:209] Unable to write event: 'Post https://__MASTER_IP__:443/api/v1/namespaces/default/events: dial tcp: lookup __MASTER_IP__ on 10.0.2.3:53: server misbehaving' (may retry after sleeping)

E0510 11:47:24.363225    1236 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://__MASTER_IP__:443/api/v1/services?limit=500&resourceVersion=0: dial tcp: lookup __MASTER_IP__ on 10.0.2.3:53: server misbehaving

It seems that the __MASTER_IP__ placeholder is not being replaced with the real value.

May 10 '18 11:05 liubin

@liubin I just tried with the following instructions and everything seems OK:

$ NODES=2 vagrant halt
$ NODES=2 vagrant up

This is equivalent to NODES=2 vagrant reload. Can you please provide the exact instructions you followed since creating the cluster?

May 10 '18 12:05 bmcustodio

I only ran vagrant reload, or vagrant halt followed by vagrant up, a few times.

May 14 '18 06:05 liubin

After some observation, I think the problem may be that the kubelet container starts before the __MASTER_IP__ placeholder is replaced.

I can see that the file /etc/kubernetes/node-kubeconfig.yaml has the correct IP of the master, but the kubelet's log shows that it is still using __MASTER_IP__. After restarting the kubelet with docker restart kubelet, the node becomes Ready.
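
The workaround above could be sketched as a small shell helper that waits until the placeholder has disappeared from the kubeconfig before restarting the kubelet. This is only a sketch under assumptions: the function names kubeconfig_ready and wait_for_kubeconfig are hypothetical and not part of this repo, and the retry count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): succeeds once the file exists
# and no longer contains the literal __MASTER_IP__ placeholder.
kubeconfig_ready() {
    # $1: path to the kubeconfig file
    [ -f "$1" ] && ! grep -q '__MASTER_IP__' "$1"
}

# Hypothetical helper: poll the file until it is ready or attempts run out.
wait_for_kubeconfig() {
    # $1: path, $2: max attempts (default 30), $3: delay in seconds (default 2)
    file=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if kubeconfig_ready "$file"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

On an affected node it might be used as: wait_for_kubeconfig /etc/kubernetes/node-kubeconfig.yaml && docker restart kubelet. This only works around the suspected ordering problem; it does not fix the startup order itself.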

May 14 '18 09:05 liubin