Unable to create awx-service

Open kaushik4r opened this issue 2 years ago • 9 comments

ISSUE TYPE
  • Bug Report
SUMMARY

The ansible task-runner fails while trying to create awx-service.

ENVIRONMENT
  • AWX version: 20.0.0
  • Operator version: 0.20.0
  • Kubernetes version: 1.21.0
  • AWX install method: Kubernetes
STEPS TO REPRODUCE

Ansible task runner fails at "TASK [installer : Apply Resources]" after creating configmap, awx-app-credentials(secret), awx(service account) and persistentVolumeClaim. Therefore the awx-service doesn't get created.

EXPECTED RESULTS

awx-service and ingress should be created.

ACTUAL RESULTS

awx-service and ingress not created.

ADDITIONAL INFORMATION

Also, when I tried setting no_log to false, the Ansible task runner still reported true as the value, so I'm not able to retrieve the actual error.

AWX-OPERATOR LOGS

TASK [installer : Apply Resources] *********************************************\r\ntask path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:20\n ok: [localhost] => (item=None) => {\"censored\": \"the output has been hidden due to the fact that 'no_log: true' was specified for this result\", \"changed\": false}\n ok: [localhost] => (item=None) => {\"censored\": \"the output has been hidden due to the fact that 'no_log: true' was specified for this result\", \"changed\": false}\n ok: [localhost] => (item=None) => {\"censored\": \"the output has been hidden due to the fact that 'no_log: true' was specified for this result\", \"changed\": false}\n ok: [localhost] => (item=None) => {\"censored\": \"the output has been hidden due to the fact that 'no_log: true' was specified for this result\", \"changed\": false}\n failed: [localhost] (item=None) => {\"censored\": \"the output has been hidden due to the fact that 'no_log: true' was specified for this result\", \"changed\": false}\n failed: [localhost] (item=None) => {\"censored\": \"the output has been hidden due to the fact that 'no_log: true' was specified for this result\", \"changed\": false}\n fatal: [localhost]: FAILED! => {\"censored\": \"the output has been hidden due to the fact that 'no_log: true' was specified for this result\", \"changed\": false}\n localhost : ok=35 changed=0 unreachable=0 failed=1 skipped=31 rescued=0 ignored=0 \r\n\r\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost : ok=35 changed=0 unreachable=0 failed=1 skipped=31 rescued=0 ignored=0 \r\n\n","job":"1360943380165754576","name":"awx-operator-awx","namespace":"awx","error":"exit status 2"}

kaushik4r avatar May 10 '22 21:05 kaushik4r

I've been running into the same issue since yesterday. Before that, everything just worked out of the box.

dm-sumup avatar May 11 '22 17:05 dm-sumup

@kaushik4r would you mind trying this on our latest awx operator release? https://github.com/ansible/awx-operator/releases/tag/0.21.0

VERSION=latest make deploy

fosterseth avatar May 11 '22 17:05 fosterseth

Tried it, but it didn't help. Is there a way to set no_log to false on awx-manager, so that I can see where the problem is?

dm-sumup avatar May 11 '22 18:05 dm-sumup

Looks like I found the issue, at least on my side. I was setting this up on AWS EKS. It looks like if you add an ALB to the cluster before awx-operator and awx, the operator is not able to launch the awx pods.

But documentation on how to set no_log to false on awx-manager would be helpful anyhow.

dm-sumup avatar May 11 '22 20:05 dm-sumup

I'm still facing this issue. Can someone please advise as to how no_log can be set to false for the awx-manager?

kaushik4r avatar May 12 '22 17:05 kaushik4r

Hi @kaushik4r, you can download the source code from GitHub, make local modifications such as setting no_log to false, and deploy the operator with ansible-playbook. More info about debugging is here: https://github.com/ansible/awx-operator/blob/devel/docs/debugging.md

PaulVerhoeven1 avatar May 13 '22 06:05 PaulVerhoeven1

This issue may be well suited for the AWX mailing list, so you might create a post there as well: https://groups.google.com/g/awx-project

fosterseth avatar May 13 '22 18:05 fosterseth

Can someone please advise as to how I can set no_log to false for the operator pod? https://github.com/ansible/awx-operator/blob/devel/docs/debugging.md isn't very helpful.

kaushik4r avatar May 17 '22 01:05 kaushik4r

Takes a few steps, but I was able to figure out debugging with the following process:

  1. clone the awx-operator repo
  2. create run.yml playbook per debugging doc
  3. create vars.yml file per the debugging doc, with vars mapped from your manifest file (e.g. awx-demo.yaml) and no_log set to false
  4. run the playbook from the cloned awx-operator repo (e.g. ansible-playbook run.yml -e @vars.yml -v)

NOTE: you will need your local kubeconfig and context set up to connect to your k8s cluster before running the playbook

# run.yml
---
- hosts: localhost
  roles:
    - installer

# vars.yml
---
ansible_operator_meta:
  name: awx-test
  namespace: awx-test
service_type: LoadBalancer
ingress_type: ingress
hostname: awx-test.jrbeilke.com
ingress_annotations: |
  kubernetes.io/ingress.class: alb
  alb.ingress.kubernetes.io/manage-backend-security-group-rules: true
  alb.ingress.kubernetes.io/security-groups: common-internal-us-east-1
no_log: 'false'

The playbook failed with the following error: failed: [localhost] (item=ingress) => {"ansible_loop_var": "item", "changed": false, "error": 400, "item": "ingress", "msg": "Ingress awx-test-ingress: Failed to apply object: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Ingress in version \\\\\"v1\\\\\" cannot be handled as a Ingress: v1.Ingress.ObjectMeta: v1.ObjectMeta.Annotations: ReadString: expects \\\\\" or n, but found t, error found in #10 byte of ...|-rules\\\\\": true, \\\\\"alb.|..., bigger context ...|ernetes.io/manage-backend-security-group-rules\\\\\": true, \\\\\"alb.ingress.kubernetes.io/security-groups\\\\\": |...\",\"reason\":\"BadRequest\",\"code\":400}\\n'", "reason": "Bad Request", "status": 400}

In my case, my manifest had annotation values that were being parsed as something other than a string (e.g. booleans, integers). Kubernetes annotation values must be strings, so the API server rejected the Ingress with the ReadString error above. By quoting the values (e.g. alb.ingress.kubernetes.io/manage-backend-security-group-rules: "true"), I was able to resolve the issue and deploy AWX to AWS with a load balancer.
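
For reference, a minimal sketch of the corrected annotations block, assuming the same vars.yml as above with only the boolean-looking value quoted. The other two values already parse as strings; whether further annotations need quoting depends on your own manifest.

```yaml
# vars.yml fragment (sketch): annotation values must reach Kubernetes as strings
ingress_annotations: |
  kubernetes.io/ingress.class: alb
  # quoted so YAML does not turn it into a boolean
  alb.ingress.kubernetes.io/manage-backend-security-group-rules: "true"
  alb.ingress.kubernetes.io/security-groups: common-internal-us-east-1
```

The same quoting applies to any annotation whose value looks like a number or a boolean (for example "443" or "false").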

It would be really nice if there were a way to expose the failure message for things like ingress setup while still no_log'ing other things like database secrets and whatnot. Or maybe an option to override no_log via the manifest when spinning up in a test cluster.

jrbeilke avatar Jul 20 '22 21:07 jrbeilke