installcentos
3.11 deployment issue
TASK [openshift_control_plane : Wait for all control plane pods to become ready] *********************************************************
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
    (… countdown repeats …)
FAILED - RETRYING: Wait for all control plane pods to become ready (50 retries left).
ok: [10.0.1.31] => (item=etcd)
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
ok: [10.0.1.31] => (item=api)
FAILED - RETRYING: Wait for all control plane pods to become ready (60 retries left).
    (… countdown repeats …)
FAILED - RETRYING: Wait for all control plane pods to become ready (52 retries left).
TASK [openshift_node_group : Wait for the sync daemonset to become ready and available] **************************************************
FAILED - RETRYING: Wait for the sync daemonset to become ready and available (60 retries left).
    (… countdown repeats …)
FAILED - RETRYING: Wait for the sync daemonset to become ready and available (47 retries left).
Any chance you have Ansible 2.7?
@gshipley, the dreaded error has appeared for me... the "wait for control plane pods to become ready" one. When I run journalctl -flu docker.service in another ssh session I get:
Oct 21 08:39:44 optung.vm.local oci-umount[59912]: umounthook
It keeps repeating the block above; the only difference is that the ID in level=warning msg="xxx" cleanup changes (where "xxx" is the ID). Also, when it gets to the last retry it shows the following message before starting all 60 retries again:
failed: [10.84.51.10] (item=etcd) => {"attempts": 60, "changed": false, "item": "etcd", "msg": {"cmd": "/usr/bin/oc get pod master-etcd-optung.vm.local -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "The connection to the server optung.vm.local:8443 was refused - did you specify the right host or port?\n", "stdout": ""}}
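That "connection to the server ... was refused" usually means the API server container never came up on port 8443, rather than a wrong host or port. A minimal sketch for checking whether anything is listening before digging further (port_open is a made-up helper, not part of the installer; the host and port come from the error above):

```shell
# port_open HOST PORT -> succeeds if a TCP connection can be opened.
# Uses bash's /dev/tcp pseudo-device with a 3-second timeout.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example: probe the master API port from the error message.
# port_open optung.vm.local 8443 && echo "API up" || echo "API down - check docker ps"
```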
The VM was created with 8 cores (Core i7), 16 GB RAM, and a 300 GB SSD. The Ansible version is the one from the script; I touched nothing in the scripts. Are you able to help?
Can you check the logs whether the system complains about not being able to create certificates?
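For that certificate check, a simple filter over the journal output is enough (cert_errors is a hypothetical helper name; the patterns are just common TLS failure keywords):

```shell
# cert_errors: reads log text on stdin and keeps lines that look like
# TLS/certificate complaints.
cert_errors() {
    grep -iE 'certificat|x509|tls handshake'
}

# Typical use on the master:
#   journalctl -u docker.service --no-pager | cert_errors
```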
Looks like it's the correct version, 2.6.5.
Installing : ansible-2.6.5-1.el7.ans.noarch 6/6
Hey guys, I found out my problem... for some reason, during the installation Ansible was being updated to version 2.7, which doesn't make any sense given these two lines in the script:
curl -o ansible.rpm https://releases.ansible.com/ansible/rpm/release/epel-7-x86_64/ansible-2.6.5-1.el7.ans.noarch.rpm
yum -y --enablerepo=epel install ansible.rpm
At first I thought I had installed Ansible on the system before running the script, so I went drastic and installed a CentOS 7.5 minimal from scratch... it happened again. What I did to solve it was add the line yum remove ansible before those two install lines, and it is now working as intended. Weird stuff, though. Do any of you by any chance know if OpenContrail/Tungsten Fabric support is officially added to Origin/OKD?
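If anyone wants to guard against that silent upgrade, here is a sketch of a pre-flight check (check_ansible_version is a hypothetical helper; 2.6.x is the series the script pins):

```shell
# check_ansible_version LINE -> prints "ok" for the pinned 2.6 series,
# otherwise reports the mismatching version. LINE is the first line of
# `ansible --version`, e.g. "ansible 2.6.5".
check_ansible_version() {
    ver=${1#ansible }
    case "$ver" in
        2.6.*) echo "ok" ;;
        *)     echo "mismatch: $ver" ;;
    esac
}

# Typical use before running the installer:
#   check_ansible_version "$(ansible --version | head -n1)"
```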
Post-install, mine is still 2.6.5.
ansible --version
ansible 2.6.5
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 13 2018, 13:06:57) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
I'd happily send the logs, but it seems like the logging location changes with each version of OpenShift, so I'm not sure where to look and Google isn't helping.
@ryannix123 Why do lumberjacks get frustrated with OpenShift?
Answer: Because they can never find the logs.
Okay, okay - a Dad joke for sure. We are working on the logging situation and much improvement will happen in the 4.0 release.
@fclaudiopalmeira so far the only cause I have encountered for the control plane failing with these messages is incorrect certificates caused by Ansible 2.7
@marekjelen My certificates were OK; the Ansible version, however, was not. I am inclined to believe that whenever you have Ansible 2.7 installed, weird stuff will happen! But luckily I got past that error, and now I'm dealing with another one, related to git. When I try to create an app I'm getting:
error: fatal: unable to access 'https://github.com/gshipley/simplephp/': The requested URL returned error: 503
That started happening after I set the GIT_SSL_NO_VERIFY=true env var (if I don't, it gives me "the Peer's certificate issuer has been marked as not trusted by the user"). But so far I have had no luck finding a solution!
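Two hedged notes on this: a 503 is the server or an intercepting proxy refusing the request, so GIT_SSL_NO_VERIFY neither causes nor cures it; and for the trust error itself, it is usually cleaner to teach git (or the host) to trust the signing CA than to disable verification. A sketch, assuming the CA lives at /etc/origin/master/ca.crt (the usual openshift-ansible location on 3.x; adjust for your setup):

```shell
# Option 1: point git at the CA bundle directly.
# git config --global http.sslCAInfo /etc/origin/master/ca.crt

# Option 2: trust the CA system-wide on CentOS/RHEL.
# cp /etc/origin/master/ca.crt /etc/pki/ca-trust/source/anchors/
# update-ca-trust extract
```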
well...no luck at all with this certificate stuff...anyone could help?
@ryannix123 rerunning the setup script and all the control pods come up just fine. Can you go to the docker level (docker ps, docker logs) and check what containers are failing? And extract some logs?
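That docker-level triage can be scripted; failing_ids below is a hypothetical helper that filters docker ps -a style output (column layout assumed from Docker 1.13) for exited containers:

```shell
# failing_ids: reads `docker ps -a` output on stdin and prints the IDs
# of containers whose STATUS column shows "Exited". Skips the header row.
failing_ids() {
    awk 'NR > 1 && /Exited/ {print $1}'
}

# Typical use on the node:
#   docker ps -a | failing_ids | while read -r id; do docker logs --tail 50 "$id"; done
```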
@fclaudiopalmeira can you provide more info how are you trying to deploy the app?
Have tried to clone the repo on the machine as well as deploy the app on OpenShift, and both seem to work ...
Hey @marekjelen, I was trying to deploy it by following the YouTube video exactly (from the OpenShift dashboard)
hmm, that is the 2nd picture @fclaudiopalmeira and it worked fine on a cluster I have just provisioned.
you can alter the ansible version in the installation script from 2.6.x to 2.7.1.1 as a temporary workaround.
Please attach the inventory and output with ansible-playbook -vvv.
Sync daemonset might fail if some nodes haven't applied their configuration, so oc describe nodes output would be handy too
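To spot the nodes that haven't applied their configuration, a small filter over oc get nodes output can help before running oc describe node on each (not_ready_nodes is a made-up name; the NAME/STATUS column positions are the usual oc output layout):

```shell
# not_ready_nodes: reads `oc get nodes` output on stdin and prints the
# names of nodes whose STATUS column is anything but "Ready".
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" {print $1}'
}

# Typical use, then drill in with `oc describe node <name>`:
#   oc get nodes | not_ready_nodes
```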
I have fixed this by doing the following steps.
- yum remove atomic-openshift* (on all nodes)
- yum install atomic-openshift* (on all nodes)
- mv /etc/origin /etc/origin.old
- mv /etc/kubernetes /etc/kubernetes.old
- mv ~/.kube/config /tmp/kube_config_backup
- ansible-playbook -i /tmp/test /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
Please let me know if that works for you.
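The steps above, collected into one script for convenience. This is a sketch, not a tested procedure: it is destructive (it moves the node/master config aside), and /tmp/test is the inventory path from the original post:

```shell
#!/bin/sh
# Recovery steps from the message above. Run on all nodes unless noted.
set -e

yum -y remove 'atomic-openshift*'     # on all nodes
yum -y install 'atomic-openshift*'    # on all nodes

# Move the old configuration out of the way.
mv /etc/origin /etc/origin.old
mv /etc/kubernetes /etc/kubernetes.old
mv ~/.kube/config /tmp/kube_config_backup

# Redeploy; /tmp/test is the inventory file from the original post.
ansible-playbook -i /tmp/test \
    /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
```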
If the above steps don't work, edit /usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/main.yml and replace this line:
- "{{ 'etcd' if inventory_hostname in groups['oo_etcd_to_config'] else omit }}"
with this one:
- "{{ 'etcd' if (inventory_hostname in groups['oo_etcd_to_config'] and inventory_hostname in groups['oo_masters_to_config']) else '' }}"
Still no luck, same issue
Can you paste the exact error? And have you tried both ways?
Looks like these deployments are going to radically change in OpenShift 4: https://www.youtube.com/watch?v=-xJIvBpvEeE
@fclaudiopalmeira - have you found a solution to the certificate issue?