postgres-operator icon indicating copy to clipboard operation
postgres-operator copied to clipboard

databaseInitSQL: SQL not being applied, perhaps logging can be improved

Open jmartin127 opened this issue 2 years ago • 4 comments

Overview

When using the databaseInitSQL (as documented here) feature, it is not obvious to the user whether the SQL actually executed successfully. There can be errors that are sent to stdout, but not available in any form to the user. UPDATE: I have submitted a PR for review, which I believe addresses this issue sufficiently.

Environment

  • Platform: Kubernetes
  • Platform Version: 1.21.7 (via k3d)
  • PGO Image Tag: (e.g. ubi8-5.0.4-0)
  • Postgres Version (e.g. 13)
  • Storage: hostpath

Steps to Reproduce

  1. Create a new namespace: kubectl create namespace pgo
  2. Fork the postgres-operator-examples repo
  3. Install the PGO crd from postgres-operator-examples: helm install pgo -n pgo helm/install
  4. Update the helm/postrgres/values.yaml file in postgres-operator-examples to use instanceReplicas: 2 and the following databaseInitSQL:
databaseInitSQL:
  name: pgcmain-bootstrap-sql
  key: bootstrap.sql
  1. Create a new config map with the following contents in the pgo namespace:
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgcmain-bootstrap-sql
data:
  bootstrap.sql: |
    CREATE TABLE public.COMPANY(
       ID INT PRIMARY KEY     NOT NULL,
       NAME           TEXT    NOT NULL,
       AGE            INT     NOT NULL,
       ADDRESS        CHAR(50),
       SALARY         REAL
    );

    CREATE TABLE public.DEPARTMENT(
       ID INT PRIMARY KEY      NOT NULL,
       DEPT           CHAR(50) NOT NULL,
       EMP_ID         INT      NOT NULL
    );
  1. Create a new cluster from postgres-operator-examples: helm install pgcmain -n pgo helm/postgres
  2. Describe the operator and see that it appears to have applied the Init SQL:
kubectl -n pgo describe postgresclusters.postgres-operator.crunchydata.com pgcmain
...
  Database Init SQL:       pgcmain-bootstrap-sql
  1. Connect to the instance and observe that no tables were created. Also, the logs of the PGO pod show a potential issue (see below).

EXPECTED

  1. The tables within the bootstrap.sql file to have been created.

ACTUAL

  1. No tables were created (see log below)

Logs

kubectl -n pgo logs pgo-68db564fb5-7rrs9 | grep "Init"
...
time="2022-02-09T21:34:53Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:717" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=pgcmain namespace=pgo reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.4-0
time="2022-02-09T21:34:53Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:717" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=pgcmain namespace=pgo reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.4-0
time="2022-02-09T21:34:56Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:717" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=pgcmain namespace=pgo reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.4-0
time="2022-02-09T21:34:56Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:717" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=pgcmain namespace=pgo reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.4-0

jmartin127 avatar Feb 09 '22 22:02 jmartin127

In looking over the code... I see that the "Could not find a pod with a writable database container." message is likely due to timing of when the role annotations get added to the pods.

I see the code checks for a role of master as an annotation. For example:

    postgres-operator.crunchydata.com/role: master

However, when the reconcileDatabaseInitSQL function first runs, this label does not exist at first. I noticed this by checking for the annotation as the pods were starting up.

During start-up:

% kubectl -n pgo get pods pgcmain-instance1-hstp-0 -o yaml | grep master
<no matches>

After a few minutes, the annotation does exist:

% kubectl -n pgo get pods pgcmain-instance1-hstp-0 -o yaml | grep master
    status: '{"conn_url":"postgres://pgcmain-instance1-hstp-0.pgcmain-pods:5432/postgres","api_url":"https://pgcmain-instance1-hstp-0.pgcmain-pods:8008/patroni","state":"running","role":"master","version":"2.1.1","xlog_location":83900632,"timeline":1}'
    postgres-operator.crunchydata.com/role: master

Once the annotation exists, I no longer see the log statement about "Could not find a pod with a writable database container".

However, I am still not seeing that the SQL had actually taken effect. Is there a success message or other log message I can check for that would indicate that the initSQL ran successfully? I see that the reconcileDatabaseInitSQL function does this at the end:

	if err == nil {
		status := cluster.Spec.DatabaseInitSQL.Name
		cluster.Status.DatabaseInitSQL = &status
	}

But given what I posted originally, I believe I am not getting any errors here.

If the SQL from the init script fails, where would I see that?

jmartin127 avatar Feb 10 '22 00:02 jmartin127

Looks like stderr is getting ignored here: https://github.com/CrunchyData/postgres-operator/blob/5286720acb1235cce6d2ed691b6a91de650b014e/internal/controller/postgrescluster/postgres.go#L742

Perhaps this code:

	// A writable pod executor has been found and we have the sql provided by
	// the user. Setup a write function to execute the sql using the podExecutor
	write := func(ctx context.Context, exec postgres.Executor) error {
		_, _, err := exec.Exec(ctx, strings.NewReader(data), map[string]string{})
		return err
	}

Should be modified to:

	write := func(ctx context.Context, exec postgres.Executor) error {
		stdout, stderr, err := exec.Exec(ctx, strings.NewReader(data), map[string]string{})
		log.V(1).Info("applied init SQL", "stdout", stdout, "stderr", stderr)
		return err
	}

jmartin127 avatar Feb 10 '22 04:02 jmartin127

I was able to get this working after adding \c pgcmain to my bootstrap script. Adding the additional logging may help to see issues with init more easily... and seems to be the standard in other places in the code.

jmartin127 avatar Feb 10 '22 05:02 jmartin127

Is there a maintainer who would be willing to review the associated PR here? Thank you in advance.

jmartin127 avatar Feb 28 '22 16:02 jmartin127

Merged and fixed, thx!

benjaminjb avatar Oct 18 '22 14:10 benjaminjb