awx-operator
Managed postgres - /var/lib/postgresql/data: permission denied
I am trying to install AWX using the awx-operator running on k3s, and the awx-postgres pod fails with the message:
mkdir: cannot create directory ‘/var/lib/postgresql/data’: Permission denied
Here is my awx.yml:
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  ingress_type: Ingress
  route_tls_termination_mechanism: edge
  hostname: localhost
  postgres_storage_requirements:
    requests:
      storage: 3Gi
  projects_persistence: true
  projects_existing_claim: awx-projects-claim
  web_resource_requirements:
    requests:
      cpu: 250m
      memory: 2Gi
    limits:
      cpu: 750m
      memory: 4Gi
  task_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
  ee_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
What am I doing wrong here?
Have a look at this. https://github.com/kurokobo/awx-on-k3s
I am following the exact instructions @marwel linked above, but I'm getting the exact same error (mkdir: cannot create directory ‘/var/lib/postgresql/data’: Permission denied). I've been banging my head for 4 days with no success installing AWX.
I'll give it a try this afternoon using k3s, since I cannot reproduce it on my current lab.
Hello guys, I've deployed k3s with a single node on my testing machine as described at https://rancher.com/docs/k3s/latest/en/quick-start/#install-script
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
storm.tatu.home Ready control-plane,master 3m39s v1.21.3+k3s1
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-5ff76fc89d-4d7bn 1/1 Running 0 9m51s
kube-system metrics-server-86cbb8457f-9fkt2 1/1 Running 0 9m51s
kube-system coredns-7448499f4d-9t87w 1/1 Running 0 9m51s
kube-system helm-install-traefik-crd-mlrtg 0/1 Completed 0 9m51s
kube-system helm-install-traefik-v5n5s 0/1 Completed 1 9m51s
kube-system svclb-traefik-c9cgh 2/2 Running 0 9m28s
kube-system traefik-97b44b794-6dz4g 1/1 Running 0 9m28s
Then I built the latest devel operator image and deployed it:
$ kubectl apply -f deploy/awx-operator.yaml
customresourcedefinition.apiextensions.k8s.io/awxs.awx.ansible.com created
customresourcedefinition.apiextensions.k8s.io/awxbackups.awx.ansible.com created
customresourcedefinition.apiextensions.k8s.io/awxrestores.awx.ansible.com created
clusterrole.rbac.authorization.k8s.io/awx-operator created
clusterrolebinding.rbac.authorization.k8s.io/awx-operator created
serviceaccount/awx-operator created
deployment.apps/awx-operator created
The operator started as expected:
$ kubectl get pods -A -w
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-5ff76fc89d-4d7bn 1/1 Running 0 11m
kube-system metrics-server-86cbb8457f-9fkt2 1/1 Running 0 11m
kube-system coredns-7448499f4d-9t87w 1/1 Running 0 11m
kube-system helm-install-traefik-crd-mlrtg 0/1 Completed 0 11m
kube-system helm-install-traefik-v5n5s 0/1 Completed 1 11m
kube-system svclb-traefik-c9cgh 2/2 Running 0 10m
kube-system traefik-97b44b794-6dz4g 1/1 Running 0 10m
default awx-operator-88b886454-9pq7w 0/1 ContainerCreating 0 15s
default awx-operator-88b886454-9pq7w 1/1 Running 0 16s
Now to the troubleshooting. I'm using an AWX spec similar to the one provided earlier, shown below. I had to extend it to create the PVC awx-projects-claim, which the AWX spec expects to exist.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: awx-projects-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  ingress_type: Ingress
  route_tls_termination_mechanism: edge
  hostname: localhost
  postgres_storage_requirements:
    requests:
      storage: 3Gi
  projects_persistence: true
  projects_existing_claim: awx-projects-claim
  web_resource_requirements:
    requests:
      cpu: 250m
      memory: 2Gi
    limits:
      cpu: 750m
      memory: 4Gi
  task_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
  ee_resource_requirements:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 2Gi
$ kubectl apply -f pg-k3s.yml
persistentvolumeclaim/awx-projects-claim created
awx.awx.ansible.com/awx created
# still pending because POD has not started yet
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
awx-projects-claim Pending local-path 23s
postgres-awx-postgres-0 Bound pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c 3Gi RWO local-path 4s
Then, watching the pods, I saw awx-postgres-0 crash:
$ kubectl get pods -A -w
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system local-path-provisioner-5ff76fc89d-4d7bn 1/1 Running 0 11m
kube-system metrics-server-86cbb8457f-9fkt2 1/1 Running 0 11m
kube-system coredns-7448499f4d-9t87w 1/1 Running 0 11m
kube-system helm-install-traefik-crd-mlrtg 0/1 Completed 0 11m
kube-system helm-install-traefik-v5n5s 0/1 Completed 1 11m
kube-system svclb-traefik-c9cgh 2/2 Running 0 10m
kube-system traefik-97b44b794-6dz4g 1/1 Running 0 10m
default awx-operator-88b886454-9pq7w 0/1 ContainerCreating 0 15s
default awx-operator-88b886454-9pq7w 1/1 Running 0 16s
default awx-postgres-0 0/1 Pending 0 0s
kube-system helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c 0/1 Pending 0 0s
kube-system helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c 0/1 ContainerCreating 0 0s
kube-system helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c 0/1 Completed 0 3s
kube-system helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c 0/1 Terminating 0 3s
kube-system helper-pod-create-pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c 0/1 Terminating 0 3s
default awx-postgres-0 0/1 Pending 0 4s
default awx-postgres-0 0/1 ContainerCreating 0 4s
default awx-76bdfc954c-jxvll 0/4 Pending 0 0s
kube-system helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5 0/1 Pending 0 0s
kube-system helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5 0/1 ContainerCreating 0 0s
default awx-postgres-0 1/1 Running 0 15s
kube-system helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5 0/1 Completed 0 6s
kube-system helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5 0/1 Terminating 0 7s
kube-system helper-pod-create-pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5 0/1 Terminating 0 7s
default awx-postgres-0 0/1 Error 0 16s
default awx-76bdfc954c-jxvll 0/4 Pending 0 7s
default awx-76bdfc954c-jxvll 0/4 Init:0/1 0 8s
default awx-postgres-0 0/1 Error 1 18s
default awx-postgres-0 0/1 CrashLoopBackOff 1 18s
default awx-76bdfc954c-jxvll 0/4 PodInitializing 0 18s
default awx-postgres-0 1/1 Running 2 35s
default awx-postgres-0 0/1 Error 2 35s
default awx-postgres-0 0/1 CrashLoopBackOff 2 48s
default awx-postgres-0 0/1 Error 3 64s
default awx-postgres-0 0/1 CrashLoopBackOff 3 77s
default awx-76bdfc954c-jxvll 4/4 Running 0 111s
default awx-postgres-0 0/1 CrashLoopBackOff 4 2m11s
So basically the postgres statefulset did not start; however, the awx deployment came up fine (though of course it is not functional, due to the missing database).
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
postgres-awx-postgres-0 Bound pvc-3b9e6563-9085-4d79-90ba-fa6c88431c6c 3Gi RWO local-path 2m37s
awx-projects-claim Bound pvc-85b1b705-43b3-42a6-a96b-1e79943e99d5 2Gi RWO local-path 2m56s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
awx-operator-88b886454-9pq7w 1/1 Running 0 8m47s
awx-76bdfc954c-jxvll 4/4 Running 0 3m21s
awx-postgres-0 0/1 Error 5 3m30s
Looking at the container logs, I got the same error on k3s using the local-path-provisioner. It looks similar to https://github.com/ansible/awx-operator/pull/413; however, we need to address it for the postgresql statefulset.
$ kubectl logs awx-postgres-0
mkdir: cannot create directory ‘/var/lib/postgresql/data’: Permission denied
I'm working on it.
So basically we will need to use an initContainer to fix the permissions so the database can be created. This snippet will do the job:
diff --git a/roles/installer/tasks/database_configuration.yml b/roles/installer/tasks/database_configuration.yml
index 2e99be5..470530a 100644
--- a/roles/installer/tasks/database_configuration.yml
+++ b/roles/installer/tasks/database_configuration.yml
@@ -80,8 +80,9 @@
 - block:
     - name: Create Database if no database is specified
       k8s:
-        apply: true
+        apply: yes
         definition: "{{ lookup('template', 'postgres.yaml.j2') }}"
+        wait: yes
       register: create_statefulset_result
   rescue:
diff --git a/roles/installer/templates/postgres.yaml.j2 b/roles/installer/templates/postgres.yaml.j2
index d17ee12..f87c842 100644
--- a/roles/installer/templates/postgres.yaml.j2
+++ b/roles/installer/templates/postgres.yaml.j2
@@ -37,10 +37,27 @@ spec:
       imagePullSecrets:
         - name: {{ image_pull_secret }}
 {% endif %}
+      initContainers:
+        - name: init-chmod-data
+          image: '{{ postgres_image }}:{{ postgres_image_version }}'
+          imagePullPolicy: '{{ image_pull_policy }}'
+          command:
+            - /bin/sh
+            - -c
+            - |
+              if [ ! -f {{ postgres_data_path }}/PG_VERSION ]; then
+                chown postgres:root {{ postgres_data_path | dirname }}
+              fi
+          volumeMounts:
+            - name: postgres
+              mountPath: '{{ postgres_data_path | dirname }}'
+              subPath: '{{ postgres_data_path | dirname | basename }}'
       containers:
         - image: '{{ postgres_image }}:{{ postgres_image_version }}'
           imagePullPolicy: '{{ image_pull_policy }}'
           name: postgres
+          securityContext:
+            fsGroup: 999
           env:
             # For postgres_image based on rhel8/postgresql-12
             - name: POSTGRESQL_DATABASE
It does result in a working state once the patch is applied:
$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
awx-operator-5bc776b4d4-d9ww2 1/1 Running 0 4m41s
awx-postgres-0 1/1 Running 0 4m3s
awx-d67898cd9-k6jrc 4/4 Running 0 3m48s
$ kubectl iexec awx-postgres-0 /bin/bash
root@awx-postgres-0:/# namei -xmolv /var/lib/postgresql/data/pgdata/
f: /var/lib/postgresql/data/pgdata/
Drwxr-xr-x root root /
drwxr-xr-x root root var
drwxr-xr-x root root lib
drwxr-xr-x postgres postgres postgresql
Drwx------ postgres root data
drwx------ postgres root pgdata
I'll create a PR for it. Thanks for reporting the issue @flisak-robert and @scott-vick
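For readers who want to see what the init container's shell guard in the patch above does, here is a minimal Python sketch of the same logic: only change ownership of the parent directory when no PG_VERSION file exists yet. The function names are hypothetical and not part of awx-operator, and the actual chown call assumes the process runs as root on a system that has a postgres user.

```python
import shutil
import tempfile
from pathlib import Path


def needs_ownership_fix(data_path: str) -> bool:
    """True when data_path holds no PG_VERSION file (database not initialised yet)."""
    return not (Path(data_path) / "PG_VERSION").is_file()


def fix_parent_ownership(data_path: str, user: str = "postgres", group: str = "root") -> bool:
    """Mirror of the shell guard: chown the parent of data_path only on first start.

    Returns True if ownership was changed, False if the step was skipped.
    Assumption: needs root and an existing `user` account, as in the init container.
    """
    if not needs_ownership_fix(data_path):
        return False
    shutil.chown(Path(data_path).parent, user=user, group=group)
    return True


def demo() -> tuple:
    """Exercise the guard in a temporary directory without calling chown."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        before = needs_ownership_fix(str(data))   # empty directory
        (data / "PG_VERSION").write_text("12\n")  # simulate an initialised cluster
        after = needs_ownership_fix(str(data))
        skipped = fix_parent_ownership(str(data))  # guard short-circuits, no chown
    return before, after, skipped
```

The guard makes the step idempotent: once the database has been initialised, later pod restarts leave ownership untouched.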
Is there any temporary solution before the update?
Don't know if it suits your needs, but I just ran postgres in a docker container and pointed awx to use that postgres instance instead. Here is my config:
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: <postgres address>
  port: "5432"
  database: awx
  username: postgres
  password: <postgres password>
  type: unmanaged
type: Opaque
Don't forget to include postgres_configuration_secret: awx-postgres-configuration in your awx config. If you don't, AWX won't be able to decrypt stuff in your postgres database when you restart your awx node, for example. Been there, done that :(
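If you would rather generate that Secret than hand-edit it, a small Python sketch along these lines could work. The helper name and the example host and password are made up for illustration; the keys and defaults are taken from the config above. It prints JSON, which kubectl apply -f accepts as well as YAML.

```python
import json


def build_postgres_secret(host: str, password: str, *, name: str = "awx-postgres-configuration",
                          namespace: str = "awx", port: int = 5432,
                          database: str = "awx", username: str = "postgres") -> dict:
    """Build the Secret for an external (unmanaged) postgres as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "stringData": {
            "host": host,
            "port": str(port),  # stringData values must be strings, hence the quotes in the YAML
            "database": database,
            "username": username,
            "password": password,
            "type": "unmanaged",
        },
        "type": "Opaque",
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own address and password.
    secret = build_postgres_secret("db.example.internal", "change-me")
    print(json.dumps(secret, indent=2))
```

Keeping port as a string matters: an unquoted 5432 under stringData is rejected by the API server.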
A workaround that I have found is to create a PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: <className>
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "<path>"
The host path needs to be given 777 permissions with chmod.
You need to add the attribute postgres_storage_class with the same value as the storageClassName.
Let's say one declares
storageClassName: pgdata
and places this declaration right underneath:
postgres_storage_class: pgdata
That would be handy for maintainability across future versions of awx-operator.
On top of that, working with a PV for stateful pgdata would be a good idea anyway, right? :)
I created a PV with a storageClassName and used that SC name as above, but I'm getting the following error in the Postgres pod log.
chmod: changing permissions of '/var/run/postgresql': Operation not permitted
Is there a way to fix this without giving 777 permissions to the pg data directory when the pod is running as a non-root user, in OpenShift clusters?
Funny, I see deltas in behaviour on the 0.13 vs the 0.12; DB working, but other pods not starting. Got this kinda working with a lot of tinkering on k3s with the help of Rancher 2.6, leveraging some volume swap magic. Have done and will do some more work on reproducing and working around issues.

Did anyone manage to solve this problem?
I eventually deployed with the dev branch of awx operator.
I solved this with mountOptions.
Maybe the postgres container requires the options below on the PV/PVC/StorageClass.
mountOptions:
  - dir_mode=0750
  - file_mode=0750
  - uid=999
  - gid=999
First, I tried with only uid=999 and gid=999, but the container failed to start and output the logs below.
(It seems that the postgres container runs as 999:999 in this case.)
fixing permissions on existing directory /var/lib/postgresql/data/pgdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
2022-05-11 03:03:31.207 UTC [83] FATAL: data directory "/var/lib/postgresql/data/pgdata" has invalid permissions
2022-05-11 03:03:31.207 UTC [83] DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
child process exited with exit code 1
But I succeeded after adding dir_mode=0750 and file_mode=0750, as pointed out in the logs.
Just for the record, I used csi-smb-driver for PV/PVC/StorageClass on RKE2 single node.
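To check a mounted data directory before starting postgres, a rough Python sketch of the rule stated in the FATAL message above (permissions must be 0700 or 0750) might look like this. It only mirrors the wording of that log line, not the actual postgres source, and the function names are hypothetical.

```python
import os
import stat
import tempfile

# Modes named in the log: u=rwx (0700) or u=rwx,g=rx (0750).
ALLOWED_MODES = (0o700, 0o750)


def data_dir_mode_ok(path: str) -> bool:
    """True when the permission bits of path match one of the modes from the log."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode in ALLOWED_MODES


def demo_modes() -> dict:
    """Try a few modes on a temporary directory and record the verdict for each."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for mode in (0o700, 0o750, 0o755, 0o777):
            os.chmod(tmp, mode)
            results[oct(mode)] = data_dir_mode_ok(tmp)
        os.chmod(tmp, 0o700)  # restore a sane mode before cleanup
    return results


if __name__ == "__main__":
    for mode, ok in demo_modes().items():
        print(mode, "ok" if ok else "rejected")
```

On an SMB mount the mode reported by stat comes from dir_mode, which would explain why setting only uid and gid was not enough.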