kubernetes-ansible

Load minion definition into masters failure

Open vnugent opened this issue 10 years ago • 9 comments

Master: Fedora 21 Minion: Fedora 21 Atomic

TASK: [master | Enable scheduler] *********************************************
ok: [172.18.17.3]

TASK: [master | Copy v1beta3 style minion definitions to master] **************
ok: [172.18.17.3] => (item=172.18.17.18)

TASK: [master | Copy old v1beta1 style minion definitions to master] **********
skipping: [172.18.17.3] => (item=172.18.17.18)

TASK: [master | Load minion definition into masters] **************************
failed: [172.18.17.3] => (item=172.18.17.18) => {"changed": false, "cmd": ["/usr/bin/kubectl", "create", "-f", "/tmp/node-172.18.17.18.json"], "delta": "0:00:12.262144", "end": "2015-05-21 19:27:51.591692", "failed": true, "failed_when_result": true, "item": "172.18.17.18", "rc": 1, "start": "2015-05-21 19:27:39.329548", "stdout_lines": [], "warnings": []}
stderr: Error: 501: All the given peers are not reachable (failed to propose on members [http://172.18.17.3:4001] twice [last error: Unexpected HTTP status code]) [0]

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/setup.retry

172.18.17.18               : ok=9    changed=0    unreachable=0    failed=0
172.18.17.3                : ok=26   changed=0    unreachable=0    failed=1

retried manually

[root@kmaster kubernetes-ansible]# /usr/bin/kubectl create -f /tmp/node-172.18.17.18.json
Error: 501: All the given peers are not reachable (failed to propose on members [http://172.18.17.3:4001] twice [last error: Unexpected HTTP status code]) [0]
[root@kmaster kubernetes-ansible]# curl http://172.18.17.3:4001
404 page not found
[root@kmaster kubernetes-ansible]# etcd -version
etcd version 2.0.9

vnugent avatar May 21 '15 19:05 vnugent

I used the same setup (master on Fedora 21, minion on Fedora 21 Atomic) and could not reproduce this problem. I suspect it is specific to your environment.

gouyang avatar May 27 '15 03:05 gouyang

Host is Fedora 21, minion is Fedora 21. I'm hitting the same issue on an OpenStack install. The script that installs flannel also hangs when I run it.

TASK: [master | Copy v1beta3 style minion definitions to master] **************
ok: [173.39.214.135] => (item=173.39.214.146)
ok: [173.39.214.135] => (item=173.39.214.150)

TASK: [master | Copy old v1beta1 style minion definitions to master] **********
skipping: [173.39.214.135] => (item=173.39.214.146)
skipping: [173.39.214.135] => (item=173.39.214.150)

TASK: [master | Load minion definition into masters] **************************
failed: [173.39.214.135] => (item=173.39.214.146) => {"changed": false, "cmd": ["/usr/bin/kubectl", "create", "-f", "/tmp/node-173.39.214.146.json"], "delta": "0:00:12.243381", "end": "2015-05-29 23:42:00.461505", "failed": true, "failed_when_result": true, "item": "173.39.214.146", "rc": 1, "start": "2015-05-29 23:41:48.218124", "stdout_lines": [], "warnings": []}
stderr: Error: 501: All the given peers are not reachable (failed to propose on members [http://173.39.214.135:4001] twice [last error: Unexpected HTTP status code]) [0]
failed: [173.39.214.135] => (item=173.39.214.150) => {"changed": false, "cmd": ["/usr/bin/kubectl", "create", "-f", "/tmp/node-173.39.214.150.json"], "delta": "0:00:12.243127", "end": "2015-05-29 23:42:13.004583", "failed": true, "failed_when_result": true, "item": "173.39.214.150", "rc": 1, "start": "2015-05-29 23:42:00.761456", "stdout_lines": [], "warnings": []}
stderr: Error: 501: All the given peers are not reachable (failed to propose on members [http://173.39.214.135:4001] twice [last error: Unexpected HTTP status code]) [0]

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/root/setup.retry

173.39.214.135 : ok=26 changed=0 unreachable=0 failed=1
173.39.214.146 : ok=8 changed=0 unreachable=0 failed=0
173.39.214.150 : ok=8 changed=0 unreachable=0 failed=0

peterlamar avatar May 29 '15 23:05 peterlamar

Is etcd running? I just pushed an update to fix the problem where etcd 2.0.11 refused to start with the config we were supplying. Hopefully you can just update your git repo and rerun the setup.

eparis avatar May 30 '15 14:05 eparis

It's not. It's also not running with the new changes. What would you suggest? Should I start over and install everything manually? It could be an odd issue with the OpenStack I am using, and I am open to any advice on tracking it down.

peterlamar avatar May 30 '15 22:05 peterlamar

Try to start etcd:
systemctl start etcd
See if it is running:
ps -ef | grep etcd
Collect the log:
journalctl -b -u etcd

Or try running etcd by hand and see what it says:
/usr/bin/etcd

eparis avatar May 30 '15 22:05 eparis
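
For anyone repeating the checks above, they can be wrapped in one function so the start attempt, the status, and the log are captured together. This is only a sketch: it assumes the systemd unit is named etcd (as in the commands above), and the function name check_etcd and the 50-line log limit are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the three checks suggested above, run in order.
# Assumption: the systemd unit is called "etcd"; check_etcd is a hypothetical name.

check_etcd() {
    local unit="${1:-etcd}"

    # 1. Try to start the service; keep going on failure so the log is still collected.
    systemctl start "$unit" || echo "start of $unit failed"

    # 2. Ask systemd whether the unit is running.
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
    else
        echo "$unit is NOT active"
    fi

    # 3. Show the end of the log for the current boot (50 lines is an arbitrary choice).
    journalctl -b -u "$unit" --no-pager | tail -n 50
}
```

Run it as root on the master with `check_etcd` and paste the output into the issue.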

I get a bunch of these. Something odd is going on.

May 31 02:26:23 kmaster.novalocal etcd[600]: 2015/05/31 02:26:23 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:28 kmaster.novalocal etcd[600]: 2015/05/31 02:26:28 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:33 kmaster.novalocal etcd[600]: 2015/05/31 02:26:33 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:38 kmaster.novalocal etcd[600]: 2015/05/31 02:26:38 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:43 kmaster.novalocal etcd[600]: 2015/05/31 02:26:43 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:49 kmaster.novalocal etcd[600]: 2015/05/31 02:26:48 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:53 kmaster.novalocal etcd[600]: 2015/05/31 02:26:53 etcdserver: publish error: etcdserver: request timed out
May 31 02:26:58 kmaster.novalocal etcd[600]: 2015/05/31 02:26:58 etcdserver: publish error: etcdserver: request timed out
May 31 02:27:03 kmaster.novalocal etcd[600]: 2015/05/31 02:27:03 etcdserver: publish error: etcdserver: request timed out
May 31 02:27:08 kmaster.novalocal etcd[600]: 2015/05/31 02:27:08 etcdserver: publish error: etcdserver: request timed out

peterlamar avatar May 31 '15 02:05 peterlamar

It might be worth asking what those mean over in the http://github.com/coreos/etcd project. I've never seen them...

eparis avatar May 31 '15 03:05 eparis

@PeterLamar I suspect etcd was already initialized as a member earlier. If so, please run sudo rm -fr /var/lib/etcd/default.etcd and restart your etcd service. I think that should solve your problem.

gouyang avatar Jun 02 '15 08:06 gouyang
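
A slightly more cautious variant of the reset suggested above could look like the sketch below. Note the substitution: it moves the data directory aside instead of deleting it, so the old state can be inspected or restored later. The function name reset_etcd_data and the .bak suffix are made up for illustration, and the unit name etcd is taken from earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch: reset etcd's local state by moving the data directory aside.
# Assumption: the default path is the one quoted above; reset_etcd_data is a hypothetical name.

reset_etcd_data() {
    local data_dir="${1:-/var/lib/etcd/default.etcd}"
    local backup
    backup="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"

    # Stop etcd first so nothing writes to the directory while it is moved.
    systemctl stop etcd

    if [ -d "$data_dir" ]; then
        mv "$data_dir" "$backup" && echo "moved $data_dir to $backup"
    else
        echo "no data directory at $data_dir"
    fi

    # Start etcd again; it should come up with empty state.
    systemctl start etcd
}
```

Run `reset_etcd_data` as root on the master, then check the etcd log again to see whether the publish errors are gone.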

OP here. I'm on OpenStack as well. @gouyang rm -fr /var/lib/etcd/default.etcd didn't help

# curl http://172.18.17.3:4001/version
etcd 2.0.9

vnugent avatar Jun 02 '15 15:06 vnugent
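
Since /version answers while kubectl create still fails, it may help to check separately whether etcd accepts a write. The sketch below is only a suggestion, not something tried in this thread: it reads /version and then attempts a PUT through the etcd v2 keys API. The function name probe_etcd, the probe key name, and the 15-second timeout are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare a read-only request with a write against the etcd v2 HTTP API.
# Assumptions: probe_etcd and the key name are hypothetical; the endpoint default is a guess.

probe_etcd() {
    local endpoint="${1:-http://127.0.0.1:4001}"
    local key="${2:-kubernetes-ansible-probe}"

    # Read-only check, same as the curl in the comment above.
    echo "version: $(curl -s "$endpoint/version")"

    # Write check: -f makes curl return non-zero on HTTP errors, -m caps the wait.
    if curl -sf -m 15 -X PUT "$endpoint/v2/keys/$key" -d value=ok >/dev/null; then
        echo "write: ok"
        # Clean up the probe key again.
        curl -s -X DELETE "$endpoint/v2/keys/$key" >/dev/null
    else
        echo "write: FAILED"
    fi
}
```

For the setup in this issue that would be `probe_etcd http://172.18.17.3:4001`. If the version line prints but the write fails, that would match the symptom here and is worth including when asking in the etcd project.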