AWXBackup: mkdir: cannot create directory '/backups/tower-openshift-backup-2024-04-17-141257': Permission denied

Open PWeverton opened this issue 10 months ago • 10 comments

Please confirm the following

  • [X] I agree to follow this project's code of conduct.
  • [X] I have checked the current issues for duplicates.
  • [X] I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

Upgraded the AWX Operator from 1.1.4 to 2.13.1 and started getting errors when trying to take backups. Here's an example of the AWXBackup I have:

  apiVersion: awx.ansible.com/v1beta1
  kind: AWXBackup
  metadata:
    name: awx-demo
    namespace: awx-test
  spec:
    deployment_name: awx-demo
    backup_pvc: 'backup-pvc'
    no_log: false

Once applied, the operator tries to create a folder for the backup on the db-management pod. However, it fails with a permission denied error:

  [backup : Set backup directory name] **************************************
  task path: /opt/ansible/roles/backup/tasks/postgres.yml:55
  ok: [localhost] => {"ansible_facts": {"backup_dir": "/backups/tower-openshift-backup-2024-04-17-141257"}, "changed": false}

  TASK [backup : Create directory for backup] ************************************
  task path: /opt/ansible/roles/backup/tasks/postgres.yml:59
  ansible.cfg.
  fatal: [localhost]: FAILED! => {"changed": true, "rc": 1, "return_code": 1, "stderr": "mkdir: cannot create directory '/backups/tower-openshift-backup-2024-04-17-141257': Permission denied\n", "stderr_lines": ["mkdir: cannot create directory '/backups/tower-openshift-backup-2024-04-17-141257': Permission denied"]

AWX Operator version

2.13.1

AWX version

24.0.0

Kubernetes platform

kubernetes

Kubernetes/Platform version

microk8s v1.28.8

Modifications

no

Steps to reproduce

Fresh installation and trying to create a backup using AWXBackup CR.

Expected results

Take the backup successfully

Actual results

Failed backup

Additional information

No response

Operator Logs

No response

PWeverton avatar Apr 17 '24 14:04 PWeverton

Hello @PWeverton, can you read through this issue and see if it applies to your case: https://github.com/ansible/awx-operator/issues/1775? The new postgres image expects to write to your directory as UID 26. There are some workarounds discussed there to address the change.

jessicamack avatar Apr 17 '24 17:04 jessicamack

Hello @jessicamack, thanks for replying. The operator fails when trying to create the directory on the db-management pod, so I don't think the issue you linked is related. However, I tried the action items suggested there anyway. Here's what I did:

  • Used operator 2.15.0, as that feature was added in this tag.
  • Then adjusted my AWX CR:

  apiVersion: awx.ansible.com/v1beta1
  kind: AWX
  metadata:
    name: awx-app
  spec:
    no_log: false
    service_type: nodeport
    postgres_data_volume_init: true
    postgres_init_container_commands: |
      chown 26:0 /var/lib/pgsql/data
      chmod 700 /var/lib/pgsql/data

Even after this change, the permissions issue is still there.

PWeverton avatar Apr 17 '24 20:04 PWeverton

This issue is not addressed by #1805 (postgres_data_volume_init and postgres_init_container_commands), because the situation here is different:

  • It occurs in the ephemeral *-db-management pod instead of the main PSQL pod
  • It occurs in the backup PVC instead of the main PSQL PVC
  • No init container for the *-db-management pod is implemented in the current AWX Operator
  • From 2.13.0, the image for the *-db-management pod has also been changed to the sclorg one

So we should either implement an init container for the *-db-management pod, with a flag to modify the owner and permissions of the backup PVC, or add a flag to run the *-db-management pod as UID 0.

@rooftopcellist F.Y.I.

kurokobo avatar Apr 18 '24 12:04 kurokobo

hi @kurokobo, any movement here? Thanks

PWeverton avatar Apr 25 '24 12:04 PWeverton

I just made PR #1854. I'm able to take successful backups now if I run that init container once per PVC.

ranvit avatar May 10 '24 22:05 ranvit

Please add this change to the next release, since AWXBackup also cannot create the directory in my deployment because of permission issues. Changing the permissions on the NFS server to user ID 26 solved it, but this is a manual configuration step used as a workaround.

pombaer avatar May 13 '24 09:05 pombaer

In case it helps someone: I worked around this problem by creating a CronJob that creates my backup, with an init container that sets the permissions on the backup folder to 26:26.
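
A rough sketch of the permission-fixing part of such a workaround, written here as a one-off Job rather than a CronJob init container. The Job name, container name, and busybox image are assumptions; backup-pvc and awx-test come from the AWXBackup example at the top of this issue. It also assumes the cluster allows running as root and that the storage backend permits chown (an NFS export with root_squash may not).

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: fix-backup-pvc-perms        # hypothetical name
  namespace: awx-test               # namespace from the AWXBackup example
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: chown-backups       # hypothetical name
          image: busybox            # assumption: any image with chown/chmod works
          # The db-management pod runs as UID 26, so hand the volume to 26:26.
          command: ["sh", "-c", "chown -R 26:26 /backups && chmod -R 770 /backups"]
          securityContext:
            runAsUser: 0            # root is needed to change ownership
          volumeMounts:
            - name: backup
              mountPath: /backups
      volumes:
        - name: backup
          persistentVolumeClaim:
            claimName: backup-pvc   # PVC from the AWXBackup example
```

Once the Job has completed, the ephemeral db-management pod should be able to create its backup directory under /backups.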

pombaer avatar May 14 '24 08:05 pombaer

I hit this issue after upgrading to 2.15.0. As per @pombaer's first suggestion, I added another NFS mount and set the owner UID and GID to 26, then created a new PV/PVC and pushed the backup to that. For anyone using AWS EFS, you need to create an access point with the correct UID and GID and mount with that for it to work properly.
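
For EFS, one possible way to get an access point with the right owner is dynamic provisioning through the AWS EFS CSI driver. This is a minimal sketch, assuming the driver is installed in the cluster; the StorageClass name and the file system ID are placeholders, not values from this thread.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-awx-backup                  # hypothetical name
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap              # one access point per provisioned volume
  fileSystemId: fs-0123456789abcdef0    # placeholder, use your own file system ID
  directoryPerms: "770"
  uid: "26"                             # UID the db-management pod runs as
  gid: "26"
```

A PVC created from this StorageClass can then be referenced as backup_pvc in the AWXBackup spec.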

bar0n36 avatar May 15 '24 04:05 bar0n36

Did this issue get resolved? I have the same issue running 2.19.1

Interestingly, if I change the Postgres image in my awxbackup.yml to

  _postgres_image: docker.io/postgres
  _postgres_image_version: 15-alpine

the "mkdir: cannot create directory '/backups': Permission denied" error goes away and I can take a successful backup. However, this just shifts my issue to a restore problem:

"pg_restore: error: unsupported version (1.15) in file header"

So I went back and modified the permissions to 26 on /backups, and it works. My hack was dirty, though, so I'm wondering what the correct way to do this will be.

 # _postgres_image: docker.io/postgres
 # _postgres_image_version: 15-alpine

running k3s. Client Version: v1.29.4+k3s1 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.29.4+k3s1

morley461 avatar Sep 06 '24 11:09 morley461

The PR is still open, and it seems it will take a while to be merged. As a workaround, you can clone the repo and modify the way this is handled. You can use one of these two options, both in this file:

roles/backup/templates/management-pod.yml.j2

1 - Add an init container

  initContainers:
    - name: init-pvc-chown
      image: busybox
      # _postgres_image runs as uid 26
      command: ["sh", "-c", "chown -R :26 /backups && chmod -R 770 /backups"]
      volumeMounts:
        - name: {{ ansible_operator_meta.name }}-backup
          mountPath: /backups
          readOnly: false

2 - Run the container as a privileged user

  containers:
    - name: {{ ansible_operator_meta.name }}-db-management
      image: "{{ _postgres_image }}"
      imagePullPolicy: "{{ image_pull_policy }}"
      command: ["sleep", "infinity"]
      securityContext:
        runAsUser: 0
        privileged: true

Once you have it in place, just build the image and set the image URL on your operator deployment.
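
If the operator was installed with Kustomize, pointing the deployment at a custom build can look roughly like this. The registry path and tag are hypothetical, and the ref is only an example that should match the version you cloned and patched.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: awx-test
resources:
  # Upstream operator manifests; pick the ref you built your patch on.
  - github.com/ansible/awx-operator/config/default?ref=2.19.1
images:
  # Swap the upstream operator image for the locally built one.
  - name: quay.io/ansible/awx-operator
    newName: registry.example.com/my-team/awx-operator   # hypothetical registry path
    newTag: 2.19.1-backup-fix                            # hypothetical tag
```

Applying this with kubectl apply -k should roll the operator deployment over to the patched image.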

PWeverton avatar Sep 06 '24 14:09 PWeverton