coreos-kubernetes
Controller node periodically fails to start because flannel doesn't start.
After kube-aws up, the controller node fails to start properly because of a flannel error. Attached: cluster.yaml.txt, journalctl.txt
Hi Jelis,
Can you also tell us which version of CoreOS this is happening on? Looking at the logs, I'm not sure what is causing this, but it could be from a recent OS change.
@pbx0
$ kube-aws version
kube-aws version v0.8.2

From cluster.yaml:
releaseChannel: stable
kubernetesVersion: v1.4.0_coreos.1

core@ip-10-0-0-50 ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1122.2.0
VERSION_ID=1122.2.0
BUILD_ID=2016-09-06-1449
PRETTY_NAME="CoreOS 1122.2.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
For the hell of it, I tried rebooting the controller node, and now the docker containers are up.
@pbx0 I created another cluster today, and this is 100% reproducible: start the cluster, log into the controller node, no containers running. Run sudo shutdown -r now, log in again, and it's all good.
Sorry, I have not had time to try reproducing this, but it could be the same issue as https://github.com/coreos/bugs/issues/1393
This means that if flannel fails to start even once (even if it then crash-loops into a successful start), it could cause docker to stop running until it is manually restarted.
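If that is what is happening here, a full reboot may not be needed; restarting docker once flannel is up might be enough. Below is a minimal sketch of a check for that state. The function name needs_docker_restart is hypothetical (it is not part of kube-aws or CoreOS), and it only compares two state strings in the form printed by systemctl is-active. It has not been verified against this cluster.

```shell
#!/bin/sh
# Hypothetical helper, not part of kube-aws or CoreOS.
# Arguments: unit states as printed by `systemctl is-active`
#   $1 = state of the flannel unit
#   $2 = state of the docker unit
# Succeeds (exit status 0) only when flannel is running but docker is not,
# which is the situation described above.
needs_docker_restart() {
  flannel_state="$1"
  docker_state="$2"
  [ "$flannel_state" = "active" ] && [ "$docker_state" != "active" ]
}
```

On the controller node it could be used roughly like this, assuming the units are named flanneld and docker: needs_docker_restart "$(systemctl is-active flanneld)" "$(systemctl is-active docker)" && sudo systemctl restart docker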