coreos-kubernetes

controller node fails to start periodically because flannel doesn't start.

Open gamykla opened this issue 8 years ago • 5 comments

After kube-aws up, the controller node fails to start properly because of a flannel error. Attachments: cluster.yaml.txt, journalctl.txt

gamykla avatar Oct 04 '16 19:10 gamykla

Hi Jelis,

Can you also tell us which version of CoreOS this is happening on? Looking at the logs, I'm not sure what is causing this, but it could be from a recent OS change.

peebs avatar Oct 04 '16 21:10 peebs

@pbx0

$ kube-aws version
kube-aws version v0.8.2

From cluster.yaml:

releaseChannel: stable
kubernetesVersion: v1.4.0_coreos.1

core@ip-10-0-0-50 ~ $ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1122.2.0
VERSION_ID=1122.2.0
BUILD_ID=2016-09-06-1449
PRETTY_NAME="CoreOS 1122.2.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

gamykla avatar Oct 05 '16 12:10 gamykla

For the hell of it, I rebooted the controller node, and now the docker containers are up.

gamykla avatar Oct 05 '16 12:10 gamykla

@pbx0 I created another cluster today. This is 100% reproducible: started the cluster, logged into the controller node, no containers running. Ran sudo shutdown -r now, logged in again, and it's all good.

gamykla avatar Oct 06 '16 13:10 gamykla

Sorry, I have not had time to look into reproducing this, but it could be the same issue as https://github.com/coreos/bugs/issues/1393

This means that if flannel fails to start even once (even if it then crashloops into a successful start), docker could stop running until it is manually restarted.
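
A rough sketch of how one might check for that state on the controller and restart docker by hand, instead of rebooting the node. It assumes the units are named flanneld.service and docker.service; the function names are hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: returns success (0) when flannel is running but docker is not.
# $1 = flannel unit state, $2 = docker unit state, as printed by `systemctl is-active`.
needs_docker_restart() {
    [ "$1" = "active" ] && [ "$2" != "active" ]
}

# Assumption: the units are named flanneld.service and docker.service.
check_and_restart_docker() {
    flannel_state=$(systemctl is-active flanneld.service)
    docker_state=$(systemctl is-active docker.service)
    echo "flanneld: $flannel_state, docker: $docker_state"
    if needs_docker_restart "$flannel_state" "$docker_state"; then
        # Manual restart, in place of a full reboot of the node.
        sudo systemctl restart docker.service
    fi
}
```

Calling check_and_restart_docker on the controller after boot would print both states and restart docker only when flannel is up and docker is down. journalctl -u flanneld.service should show whether flannel failed before it eventually started.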

peebs avatar Oct 13 '16 23:10 peebs