openshift-on-openstack
Heat stuck at bastion
Hi Everybody,
I am trying to deploy OCP 3.5 (and also 3.7) on OSP 11 from Red Hat. When I run the Heat template, it creates the stack and all the necessary networks, creates the bastion, and runs the usual cloud-init provisioning steps (adding repos, updating, installing basic packages). cloud-init then sends the finished signal and gets an HTTP 200.
After that, it gets stuck at:
$> openstack stack resource list -n 2 ocp2 | grep -i progress
| bastion_host | 98bd1fee-87c3-4360-bd4b-549e39d1345e | file:///Users/myself/projects/openshift-on-openstack/bastion.yaml | CREATE_IN_PROGRESS | 2017-12-21T16:00:41Z | ocp2 |
| deployment_write_templates | c8be1435-3125-4e06-8234-b620dd556fa8 | OS::Heat::SoftwareDeployment | CREATE_IN_PROGRESS | 2017-12-21T16:01:12Z | ocp2-bastion_host-n4vsl5fz4maw |
| deployment_update_node_count | 79327e5c-579d-4a95-a0b4-e93c52385afd | OS::Heat::SoftwareDeployment | CREATE_IN_PROGRESS | 2017-12-21T16:01:12Z | ocp2-bastion_host-n4vsl5fz4maw |
| deployment_tune_ansible | a705f997-3cf0-44aa-90f1-af21e3a23ca1 | OS::Heat::SoftwareDeployment
If I force the signal with openstack stack resource signal ... it goes to the next step, but I see that the ansible template isn't created and the usual pushed files aren't present.
The /etc/os-collect-config.conf points to the correct endpoint:
$> cat /etc/os-collect-config.conf
[DEFAULT]
command = os-refresh-config
collectors = ec2
collectors = cfn
collectors = local
[cfn]
metadata_url = https://10.1.3.11:13005/v1/
stack_name = ocp2-bastion_host-n4vsl5fz4maw
secret_access_key = 7e7214750d1a48c9a4cad81010fe2173
access_key_id = 494ab1ed83b441168423aec7d868267c
path = host.Metadata
$> openstack endpoint list | grep heat
| 1b24a4cf65a74e38992c4d8230a6e7da | regionOne | heat-cfn | cloudformation | True | internal | http://172.17.1.16:8000/v1 |
| 2f666c5f3f25445682d8cc6ca51f9488 | regionOne | heat | orchestration | True | admin | http://172.17.1.16:8004/v1/%(tenant_id)s |
| 557a1fc9ff2549a8bc142bd305ac26bb | regionOne | heat-cfn | cloudformation | True | public | https://10.1.3.11:13005/v1 |
| 622df692e35b424b93cd24f54c577df4 | regionOne | heat | orchestration | True | public | https://10.1.3.11:13004/v1/%(tenant_id)s |
| da4ed879390b4b6c9d97e114aa011f49 | regionOne | heat | orchestration | True | internal | http://172.17.1.16:8004/v1/%(tenant_id)s |
| fba19a090ed6437f86513a91e9cdc0ba | regionOne | heat-cfn | cloudformation | True | admin | http://172.17.1.16:8000/v1
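The comparison above (the metadata_url in os-collect-config.conf against the public heat-cfn endpoint) can be repeated on any instance with a small helper. This is only a sketch: the function name get_metadata_url is hypothetical, and the default path is the one shown in the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of openshift-on-openstack.
# Prints the metadata_url value from an os-collect-config file
# (default path taken from the post above).
get_metadata_url() {
    local conf="${1:-/etc/os-collect-config.conf}"
    # Strip the "metadata_url = " prefix and print only matching lines.
    sed -n 's/^metadata_url[[:space:]]*=[[:space:]]*//p' "$conf"
}
```

Running get_metadata_url on the bastion and comparing the result by eye with the heat-cfn rows of openstack endpoint list shows whether the instance polls the endpoint you expect.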
After a few hours, it times out and the stack fails.
Does anyone have a clue why?
Thanks a lot for your support, P.
parameters.yaml
parameters:
  ssh_key_name: myself
  bastion_image: rhel-guest-image-7.2-20160302.0.x86_64
  bastion_flavor: m1.medium
  master_image: rhel-guest-image-7.2-20160302.0.x86_64
  master_flavor: m1.medium
  infra_image: rhel-atomic-cloud-7.2-10.x86_64
  infra_flavor: m1.medium
  node_image: rhel-atomic-cloud-7.2-10.x86_64
  node_flavor: m1.medium
  loadbalancer_image: rhel-atomic-cloud-7.2-10.x86_64
  loadbalancer_flavor: m1.medium
  ocp_version: 3.5
  osp_version: 11
  external_network: internet_access
  container_subnet: 192.168.1.0/24
  loadbalancer_type: neutron
  dns_nameserver: 8.8.4.4,8.8.8.8
  node_count: 2
  rhn_username: ""
  rhn_password: "."
  rhn_pool: ""
  extra_rhn_pools: ""
  deployment_type: openshift-enterprise
  domain_name: "example.com"
  master_hostname: "openshift-master"
  node_hostname: "openshift-node"
  ssh_user: cloud-user
  master_docker_volume_size_gb: 25
  infra_docker_volume_size_gb: 25
  node_docker_volume_size_gb: 25
  system_update: false
resource_registry:
  #OOShift::LoadBalancer: ../openshift-on-openstack/loadbalancer_dedicated.yaml
  OOShift::LoadBalancer: ../openshift-on-openstack/loadbalancer_neutron.yaml
  OOShift::ContainerPort: ../openshift-on-openstack/sdn_openshift_sdn.yaml
  OOShift::IPFailover: ../openshift-on-openstack/ipfailover_keepalived.yaml
  OOShift::DockerVolume: ../openshift-on-openstack/volume_docker.yaml
  OOShift::DockerVolumeAttachment: ../openshift-on-openstack/volume_attachment_docker.yaml
  OOShift::RegistryVolume: ../openshift-on-openstack/registry_ephemeral.yaml
@pburgisser - Did you ever figure out this problem? I seem to be stuck in exactly the same place...
-Andy
I have the same issue.
What I've noticed is that the wait_handle in bastion.yaml is not set up until after the success signal is sent by fragments/bastion-boot.sh. I can see this in /var/log/containers/heat/heat-engine.log on the controller node(s). Moving the wait_condition resource to the top helps, but I haven't worked out the exact dependencies to make it work properly yet.
@daleking - Thanks for the info. What I have done is gone over to openshift-on-openstack-123 and have made it a bunch further. Of course I had to flail about wildly. I may come back to this problem once I get over the hump.
Hey folks, I'm really sorry but none of the past maintainers of this repo are able to dedicate much time to it (including myself).
The good news is that the openshift-ansible project (the main OpenShift installer -- this repo uses it under the hood, too) now includes playbooks for various cloud providers including OpenStack:
https://github.com/openshift/openshift-ansible/tree/master/playbooks/openstack
If it helps any, this is what most Red Hat engineers involved with running OpenShift on OpenStack these days are working on.
I'll update the readme to reflect this, but in the meantime, this project is not really maintained anymore.
Hi Doc,
Maybe I can help you.
My setup is RH OCP 3.7 on OSP 12 with RHEL 7.5. It looks like the ready signal is not getting back to your stack engine.
- Which VMs are deployed already? Bastion, master, infra?
- If the bastion is deployed and you can log in via the VIP IP of the OSP console, check /var/log/cloud-init-output.log and search that file for ‘part-0’. If you see e.g. part-012, it means the cloud-init user-data script part-012 had some trouble and was not executed fully. You can find those files in /var/lib/cloud/instance/scripts/; they are Linux commands, so check that each command executes well.
- Does the OSP API use TLS? If so, do you have the server certificate on your bastion host? Send a curl command to the heat-cfn endpoint and you will know.
- Check if all packages are installed
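The log and TLS checks in the list above can be sketched as two small shell functions to run on the bastion. The function names are hypothetical, the log path is the one mentioned in the list, and the URL passed to the probe is assumed to be the metadata_url from /etc/os-collect-config.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of openshift-on-openstack.

# Print the distinct cloud-init user-data parts (part-0NN) mentioned in a log,
# so each one can be inspected under /var/lib/cloud/instance/scripts/.
list_cloud_init_parts() {
    local log="${1:-/var/log/cloud-init-output.log}"
    grep -o 'part-0[0-9]*' "$log" | sort -u
}

# Print the HTTP status code an endpoint returns. curl reports 000 when the
# connection or the TLS handshake fails, e.g. because the CA certificate
# is not trusted on this host.
probe_endpoint() {
    local url="$1"
    curl --silent --output /dev/null --max-time 10 --write-out '%{http_code}' "$url"
}
```

For example, list_cloud_init_parts with no argument reads the default log, and probe_endpoint https://10.1.3.11:13005/v1/ probes the public heat-cfn endpoint from the original post. Any HTTP status code other than 000 means the TLS handshake itself succeeded.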
OK, solved my issue: the WaitCondition signals were OK, but the heat agents were not installed in my cloud image (official Red Hat 7.5), so the SoftwareDeployment steps were not being run.
The following workaround ensures that openstack-heat-agents is installed so that the OS::Heat::SoftwareDeployment tasks do not time out:
https://github.com/daleking/openshift-on-openstack/commit/475e997628fe8af047ddda1fb57e051f747099a1
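Before applying a fix like this, it may help to confirm that the agents really are missing from the image. The sketch below is not taken from the linked commit; missing_packages is a hypothetical name, and it assumes an RPM-based image such as the RHEL guest images used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the linked commit.
# Prints the name of every given RPM package that is NOT installed.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed.
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

For example, missing_packages openstack-heat-agents os-collect-config prints nothing when both are present. The second package name is an assumption based on the /etc/os-collect-config.conf file shown earlier in the thread.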