Operator v0.75.0 deploys broken TektonResults database "cannot create directory ‘/bitnami/postgresql/data’: Permission denied"

Open Sir-Jacques opened this issue 8 months ago • 19 comments

Expected Behavior

Upgrading from v0.74.0 to v0.75.0 should work when using default configuration.

Actual Behavior

A new feature "TektonResult" is automatically deployed by the v0.75.0 operator. It tries to deploy a postgres db, which does not become healthy due to filesystem permissions. The postgres pod logs this:

postgresql 09:11:36.57
postgresql 09:11:36.58 Welcome to the Bitnami postgresql container
postgresql 09:11:36.58 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 09:11:36.58 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 09:11:36.58
postgresql 09:11:36.59 INFO  ==> ** Starting PostgreSQL setup **
postgresql 09:11:36.61 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 09:11:36.61 INFO  ==> Loading custom pre-init scripts...
postgresql 09:11:36.62 INFO  ==> Initializing PostgreSQL database...
mkdir: cannot create directory ‘/bitnami/postgresql/data’: Permission denied

Steps to Reproduce the Problem

  1. Deploy the v0.75.0 Helm chart using the values.yaml below (on AWS EKS v1.31.6-eks-bc803b4)

Additional Info

Kubernetes version: Kubernetes: Server Version: version.Info{Major:"1", Minor:"31", GitVersion:"v1.31.6-eks-bc803b4", GitCommit:"7555883c9fd5b1ff4a68ad9feb15f9727bfa4b4a", GitTreeState:"clean", BuildDate:"2025-02-17T20:40:26Z", GoVersion:"go1.22.12", Compiler:"gc", Platform:"linux/amd64"}
Tekton operator version: v0.75.0
Tekton pipeline version: v0.68.0

helm values.yaml:

installCRDs: true
tolerations:
  - key: node.kubernetes.io/infra
    operator: Exists
nodeSelector:
  type: infra

PV storageclass:

allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: topology.ebs.csi.aws.com/zone
    values:
    - eu-west-1a
    - eu-west-1b
    - eu-west-1c
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: gp3
parameters:
  csi.storage.k8s.io/fstype: xfs
  encrypted: "true"
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Sir-Jacques avatar Apr 02 '25 09:04 Sir-Jacques

@pratap0007 Would you have a look please

jkhelil avatar Apr 07 '25 06:04 jkhelil

I hit this on a brand new install, which was perplexing. And not via helm, but via the normal k8s manifests.

I'm also unable to downgrade, as it now gives

Warning  UpdateFailed     111s (x31 over 25m)  tektonconfig-controller  Failed to update status for "config": admission webhook "webhook.operator.tekton.dev" denied the request: mutation failed: cannot decode incoming old object: json: unknown field "result

That's probably a different issue, but it seems that the bad release may not install properly, and thus perhaps you can't downgrade because the operator gets into a funky state.

btrepp avatar Apr 07 '25 09:04 btrepp

Thanks @btrepp @Sir-Jacques — @pratap0007 is currently working on it and will share a fix shortly

jkhelil avatar Apr 08 '25 10:04 jkhelil

I did manage to get 0.74 working after a few rounds of trying to nuke everything so I at least have a working instance now :)

0.74 didn't seem to include postgres at all, or my install is in a very odd state :)

btrepp avatar Apr 08 '25 12:04 btrepp

Hi @Sir-Jacques , I was trying to reproduce it on a kind cluster. I installed version v0.75.0 and also upgraded from v0.74.0 to v0.75.0, but I wasn’t able to reproduce the issue. On which cluster are you encountering this problem?

pratap0007 avatar Apr 14 '25 05:04 pratap0007

This issue might be specific to the k8s version. I think we can reproduce it on GCP.

khrm avatar Apr 14 '25 05:04 khrm

We need to update the manifest for the DB. Let me share one to try.

khrm avatar Apr 14 '25 05:04 khrm

We're running EKS v1.31.6-eks-bc803b4. Happy to test an updated DB manifest, let me know if I can help

Sir-Jacques avatar Apr 14 '25 09:04 Sir-Jacques

I have the same problem using 0.75.0 and Longhorn. The bitnami/postgresql Helm chart allows adding an init container to fix the permissions, because PostgreSQL runs with dropped privileges.
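
For readers unfamiliar with the init-container approach mentioned here, a minimal sketch of a pod-spec fragment follows. It is not taken from the operator's manifests: the container name, the image, the volume name, and the 1001 UID/GID (the usual non-root user in Bitnami images) are all assumptions to verify against your own deployment.

```yaml
# Hypothetical pod-spec fragment; every name and ID here is an assumption.
spec:
  initContainers:
    - name: fix-volume-permissions      # hypothetical name
      image: busybox:1.36               # any small image that ships chown
      # Hand the mounted volume to the non-root user PostgreSQL runs as.
      command: ["sh", "-c", "chown -R 1001:1001 /bitnami/postgresql"]
      securityContext:
        runAsUser: 0                    # root is needed to change ownership
      volumeMounts:
        - name: postgredb               # assumed; must match the pod's data volume name
          mountPath: /bitnami/postgresql
```

The init container runs once before the database container starts, so the later mkdir of /bitnami/postgresql/data is attempted on a directory the database user already owns.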

kilimnik avatar May 15 '25 09:05 kilimnik

I proposed a fix for this issue. Instructions to apply it manually are present in the pr template.

sventenraa avatar Jul 14 '25 22:07 sventenraa

I have the same issue using 0.76 on GKE. How should I fix it temporarily? I can't try to use it if the installation failed. Thx.

ryan-alexander-zhang avatar Jul 23 '25 01:07 ryan-alexander-zhang

Hi All

I had the same issue on AKS with the latest version installed, v0.76.0. Has anyone fixed this issue?

tppalani avatar Aug 08 '25 07:08 tppalani

Sorry for the late reaction. We fixed this by adding mutating admission logic targeting the postgresql pod injected by the tekton operator to apply the fsGroup fix mentioned in my PR. Since the PR is now merged, I believe the next release will also ship this fix.
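
As a rough illustration of what an fsGroup fix looks like (the PR itself is not reproduced in this thread, so this is a sketch and not the merged change), the fragment below sets a pod-level security context on a StatefulSet template. The value 1001 is an assumption based on the usual non-root user in Bitnami images.

```yaml
# Hypothetical StatefulSet fragment; the fsGroup value is an assumption.
spec:
  template:
    spec:
      securityContext:
        # Kubernetes makes supported volumes group-owned by this GID
        # at mount time, so the non-root process can write to them.
        fsGroup: 1001
```

Unlike an init container, this needs no extra image or root process, but it only takes effect for volume types and CSI drivers that support fsGroup ownership changes.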

Otherwise you could disable the setup of results via tektonconfig and deploy the v0.15.3 release manually
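
A sketch of what disabling Results in TektonConfig might look like is shown below. The resource name `config` matches the one in the webhook error earlier in this thread; the `spec.result.disabled` field is an assumption here and should be checked against the CRD shipped with your operator version before applying.

```yaml
# Hypothetical TektonConfig fragment; verify the field against your CRD.
apiVersion: operator.tekton.dev/v1alpha1
kind: TektonConfig
metadata:
  name: config
spec:
  result:
    disabled: true   # assumed field name for turning off the Results install
```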

sventenraa avatar Aug 15 '25 08:08 sventenraa

@pratap0007 @khrm can you propose a fix for this please ?

jkhelil avatar Aug 18 '25 08:08 jkhelil

@jkhelil The fix is already merged. We also updated the release version of results with the fix.

khrm avatar Aug 18 '25 08:08 khrm

Thanks @khrm. We should release the operator this week, and the fix will ship with that version.

jkhelil avatar Aug 18 '25 08:08 jkhelil

Can confirm that the 0.77 release no longer requires the mutation workaround to get postgresql to start up initially. I think this issue can now be closed.

sventenraa avatar Aug 22 '25 13:08 sventenraa

Still receiving this on 0.77:

$ kubectl logs tekton-results-postgres-0
postgresql 11:45:47.72 INFO  ==>
postgresql 11:45:47.72 INFO  ==> Welcome to the Bitnami postgresql container
postgresql 11:45:47.73 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 11:45:47.73 INFO  ==> Did you know there are enterprise versions of the Bitnami catalog? For enhanced secure software supply chain features, unlimited pulls from Docker, LTS support, or application customization, see Bitnami Premium or Tanzu Application Catalog. See https://www.arrow.com/globalecs/na/vendors/bitnami/ for more information.
postgresql 11:45:47.73 INFO  ==>
postgresql 11:45:47.76 INFO  ==> ** Starting PostgreSQL setup **
postgresql 11:45:47.79 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 11:45:47.80 INFO  ==> Loading custom pre-init scripts...
postgresql 11:45:47.81 INFO  ==> Initializing PostgreSQL database...
mkdir: cannot create directory ‘/bitnami/postgresql/data’: Permission denied
$ kubectl get pod -o json -n tekton-operator tekton-operator-6cb8b8cdc4-8w6pn | jq -r '.spec.containers[].image'
ghcr.io/tektoncd/operator/operator-303303c315a48490ba6517859ef65b77:v0.77.0@sha256:821eb72cdcdf31c9413b71f853cebe63fe70fd1bb8640fe2db498a8e676970fa
ghcr.io/tektoncd/operator/operator-303303c315a48490ba6517859ef65b77:v0.77.0@sha256:821eb72cdcdf31c9413b71f853cebe63fe70fd1bb8640fe2db498a8e676970fa

PeterGrace avatar Sep 03 '25 11:09 PeterGrace

@pratap0007 can you have a look please ?

jkhelil avatar Sep 07 '25 14:09 jkhelil