Defining a custom postgres image for the backup role to pull image from

Open rdancel001 opened this issue 2 years ago • 3 comments

ISSUE TYPE
  • Bug Report
SUMMARY

While trying to deploy a backup to the k8s cluster, I noticed that the deployment tries to pull the postgres image from docker.io. Is there a way to make it pull the image from a different registry?

ENVIRONMENT
  • AWX version: 19.5.0
  • Operator version: 0.15.0
  • Kubernetes version: 1.21.9 (Azure)
  • AWX install method: kubectl deploy onto Azure Kube Services
STEPS TO REPRODUCE

Deploying the resource by following the backup role's README.md: https://github.com/ansible/awx-operator/tree/devel/roles/backup

EXPECTED RESULTS

The backup pod is deployed and takes a backup of the existing AWX cluster.

ACTUAL RESULTS

Fails to pull the postgres image from docker.io, as the AKS cluster doesn't have access to the public internet.

ADDITIONAL INFORMATION

Events:
  Type     Reason                  Age                From                     Message
  Normal   Scheduled               99s                default-scheduler        Successfully assigned default/awxbackup-20220610-101500-db-management to
  Normal   SuccessfulAttachVolume  70s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-40679be5-4eec-46b3-8957-f3da80d5bdc7"
  Normal   Pulling                 26s (x3 over 66s)  kubelet                  Pulling image "postgres:12"
  Warning  Failed                  26s (x3 over 66s)  kubelet                  Failed to pull image "postgres:12": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/postgres:12": failed to resolve reference "docker.io/library/postgres:12": failed to do request: Head "https://registry-1.docker.io/v2/library/postgres/manifests/12": EOF
  Warning  Failed                  26s (x3 over 66s)  kubelet                  Error: ErrImagePull
  Normal   BackOff                 4s (x4 over 66s)   kubelet                  Back-off pulling image "postgres:12"
  Warning  Failed                  4s (x4 over 66s)   kubelet                  Error: ImagePullBackOff

We cannot open the AKS cluster to the public internet to let it download an image. We have a working Artifactory instance that we use to proxy public repositories, and we want to be able to point to it to pull the image.

AWX-OPERATOR LOGS

rdancel001 avatar Jun 10 '22 16:06 rdancel001

Hi @rdancel001,

You can override the default postgres image by deploying a custom manifest file that sets those values:

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: my-awx
  namespace: awx
spec:
  postgres_image: <custom_postgres_image_location>
  postgres_image_version: <custom_postgres_image_version>

PaulVerhoeven1 avatar Jun 14 '22 07:06 PaulVerhoeven1

Sorry if it was unclear, but this is for the backup role, not the actual install: https://github.com/ansible/awx-operator/tree/devel/roles/backup

Actual install/deploy works fine and we have our instance running. Based on the guide in the backup role, I created a backup-awx.yml as such:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWXBackup
metadata:
  name: awxbackup-20220610-101500
  namespace: default
spec:
  deployment_name: awx
  postgres_image: <in-house-container-repo-url>/postgres
  postgres_image_version: '12' 

And from here it's halted, as it's trying to pull from docker.io instead of our in-house repo.

rdancel001 avatar Jun 15 '22 12:06 rdancel001

Ah, sorry, I didn't read it carefully enough. This seems to be a new feature/option that has to be created for the backup role. I am also interested in this option because I also can't download images directly from the internet in my cluster.

PaulVerhoeven1 avatar Jun 24 '22 14:06 PaulVerhoeven1

Hello,

We are facing the same issue with AWX Operator installed in an isolated environment without an Internet connection. We tried the backup procedure once, and the created pod entered a looping ImagePullBackOff state while trying to pull the postgres image from docker.io. We cannot even kill this pod because it starts again and again. How can we stop this failed backup process once and for all?

Thank you.

Shagrat2006 avatar May 23 '23 17:05 Shagrat2006

Same here (still :disappointed:). We are trying to set up a production environment with awx-operator, where backup/restore is necessary. We have an isolated environment as well, where we use Artifactory to proxy the official container registries.

Please please please :heart: add a variable for the postgres image to use, or use the existing one (but take care of the mechanism that decides whether to use a managed postgres DB or an external one, since we have an external DB and don't want awx-operator to start a managed one). It seems the decision is made via if postgres_configuration_secret.exists, which means it should be no problem to specify both postgres_configuration_secret and postgres_image + postgres_image_version (but as I am not a developer, no guarantee).

When backup and restore works, awx-operator can be used in production within an isolated environment as well! :dancers:

Many thanks in advance

samweisgamdschie avatar Jun 12 '23 08:06 samweisgamdschie

Hello @samweisgamdschie,

We have managed it in AWX v22.2.0 (Operator v2.1.0) with the following spec configuration in the manifest:

spec:
  backup_pvc: awx-backup-claim
  clean_backup_on_delete: True
  deployment_name: awx
  postgres_image: <in-house-container-repo-url>/library/postgres
  postgres_image_version: '13'

The relevant tasks are in https://github.com/ansible/awx-operator/blob/2.1.0/roles/backup/tasks/init.yml (lines 69-78):

- name: Set user provided postgres image
  set_fact:
    _custom_postgres_image: "{{ postgres_image }}:{{ postgres_image_version }}"
  when:
    - postgres_image | default([]) | length
    - postgres_image_version is defined and postgres_image_version != ''
    
- name: Set Postgres image URL
  set_fact:
    _postgres_image: "{{ _custom_postgres_image | default(lookup('env', 'RELATED_IMAGE_AWX_POSTGRES')) | default(_default_postgres_image, true) }}"

So, when you specify the values of postgres_image and postgres_image_version in the spec of the manifest as shown in the example above, the variable _postgres_image will take the value of the custom URL <in-house-container-repo-url>/library/postgres:13.
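To make the precedence in those two tasks easier to follow, here is a rough Python sketch of the same selection logic. It is only an illustration, not the operator's code: the function name resolve_postgres_image, the registry.example.com host, and the "postgres:13" fallback are placeholders I made up (the real fallback is whatever _default_postgres_image is set to in the role). It assumes that an unset RELATED_IMAGE_AWX_POSTGRES environment variable looks up as an empty string, which is why the second default() in the task uses true.

```python
import os
from typing import Mapping, Optional


def resolve_postgres_image(
    postgres_image: Optional[str] = None,
    postgres_image_version: Optional[str] = None,
    default_image: str = "postgres:13",  # placeholder for _default_postgres_image
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Mimic the order in which the backup role picks the postgres image."""
    env = os.environ if env is None else env

    # Task 1: the custom image is only set when BOTH values are non-empty.
    custom = None
    if postgres_image and postgres_image_version not in (None, ""):
        custom = f"{postgres_image}:{postgres_image_version}"

    # Task 2, first default(): fall back to the env var only when no
    # custom image was set at all.
    if custom is not None:
        candidate = custom
    else:
        candidate = env.get("RELATED_IMAGE_AWX_POSTGRES", "")

    # Task 2, second default(..., true): an empty result falls through
    # to the built-in default image.
    return candidate or default_image


if __name__ == "__main__":
    # Hypothetical in-house registry path, for illustration only.
    print(resolve_postgres_image("registry.example.com/library/postgres", "13", env={}))
```

One thing the sketch makes visible: if postgres_image is set but postgres_image_version is left empty, the custom image is skipped and the pull falls back to the environment variable or the default image.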

For us, the backup works both when run manually and when run from an AWX scheduled job. We have not tested the restore yet.

Shagrat2006 avatar Jun 12 '23 15:06 Shagrat2006

Appreciate this update. We will put this on our to-do list to upgrade and test it out.

rdancel001 avatar Jun 13 '23 04:06 rdancel001

Thank you for commenting that here, @Shagrat2006!

I see that this has also been added for the restore role:

  • https://github.com/ansible/awx-operator/blob/devel/config/crd/bases/awx.ansible.com_awxrestores.yaml#L94

I believe we can close this issue now.

rooftopcellist avatar Dec 01 '23 22:12 rooftopcellist