postgres-operator
postgres-operator copied to clipboard
databaseInitSQL: SQL not being applied, perhaps logging can be improved
Overview
When using the databaseInitSQL (as documented here) feature, it is not obvious to the user whether the SQL actually executed successfully. There can be errors that are sent to stdout, but not available in any form to the user. UPDATE: I have submitted a PR for review, which I believe addresses this issue sufficiently.
Environment
- Platform: Kubernetes
- Platform Version:
1.21.7
(via k3d) - PGO Image Tag: (e.g.
ubi8-5.0.4-0
) - Postgres Version (e.g.
13
) - Storage:
hostpath
Steps to Reproduce
- Create a new namespace:
kubectl create namespace pgo
- Fork the postgres-operator-examples repo
- Install the PGO crd from postgres-operator-examples:
helm install pgo -n pgo helm/install
- Update the helm/postrgres/values.yaml file in postgres-operator-examples to use
instanceReplicas: 2
and the following databaseInitSQL:
databaseInitSQL:
name: pgcmain-bootstrap-sql
key: bootstrap.sql
- Create a new config map with the following contents in the
pgo
namespace:
apiVersion: v1
kind: ConfigMap
metadata:
name: pgcmain-bootstrap-sql
data:
bootstrap.sql: |
CREATE TABLE public.COMPANY(
ID INT PRIMARY KEY NOT NULL,
NAME TEXT NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR(50),
SALARY REAL
);
CREATE TABLE public.DEPARTMENT(
ID INT PRIMARY KEY NOT NULL,
DEPT CHAR(50) NOT NULL,
EMP_ID INT NOT NULL
);
- Create a new cluster from postgres-operator-examples:
helm install pgcmain -n pgo helm/postgres
- Describe the operator and see that it appears to have applied the Init SQL:
kubectl -n pgo describe postgresclusters.postgres-operator.crunchydata.com pgcmain
...
Database Init SQL: pgcmain-bootstrap-sql
- Connect to the instance and observe that no tables were created. Also, the logs of the PGO pod show a potential issue (see below).
EXPECTED
- The tables within the bootstrap.sql file to have been created.
ACTUAL
- No tables were created (see log below)
Logs
kubectl -n pgo logs pgo-68db564fb5-7rrs9 | grep "Init"
...
time="2022-02-09T21:34:53Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:717" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=pgcmain namespace=pgo reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.4-0
time="2022-02-09T21:34:53Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:717" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=pgcmain namespace=pgo reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.4-0
time="2022-02-09T21:34:56Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:717" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=pgcmain namespace=pgo reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.4-0
time="2022-02-09T21:34:56Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:717" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=pgcmain namespace=pgo reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.0.4-0
In looking over the code... I see that the "Could not find a pod with a writable database container." message is likely due to timing of when the role annotations get added to the pods.
I see the code checks for a role of master
as an annotation. For example:
postgres-operator.crunchydata.com/role: master
However, when the reconcileDatabaseInitSQL function first runs, this label does not exist at first. I noticed this by checking for the annotation as the pods were starting up.
During start-up:
% kubectl -n pgo get pods pgcmain-instance1-hstp-0 -o yaml | grep master
<no matches>
After a few minutes, the annotation does exist:
% kubectl -n pgo get pods pgcmain-instance1-hstp-0 -o yaml | grep master
status: '{"conn_url":"postgres://pgcmain-instance1-hstp-0.pgcmain-pods:5432/postgres","api_url":"https://pgcmain-instance1-hstp-0.pgcmain-pods:8008/patroni","state":"running","role":"master","version":"2.1.1","xlog_location":83900632,"timeline":1}'
postgres-operator.crunchydata.com/role: master
Once the annotation exists, I no longer see the log statement about "Could not find a pod with a writable database container".
However, I am still not seeing that the SQL had actually taken effect. Is there a success message or other log message I can check for that would indicate that the initSQL ran successfully? I see that the reconcileDatabaseInitSQL
function does this at the end:
if err == nil {
status := cluster.Spec.DatabaseInitSQL.Name
cluster.Status.DatabaseInitSQL = &status
}
But given what I posted originally, I believe I am not getting any errors here.
If the SQL from the init script fails, where would I see that?
Looks like stderr is getting ignored here: https://github.com/CrunchyData/postgres-operator/blob/5286720acb1235cce6d2ed691b6a91de650b014e/internal/controller/postgrescluster/postgres.go#L742
Perhaps this code:
// A writable pod executor has been found and we have the sql provided by
// the user. Setup a write function to execute the sql using the podExecutor
write := func(ctx context.Context, exec postgres.Executor) error {
_, _, err := exec.Exec(ctx, strings.NewReader(data), map[string]string{})
return err
}
Should be modified to:
write := func(ctx context.Context, exec postgres.Executor) error {
stdout, stderr, err := exec.Exec(ctx, strings.NewReader(data), map[string]string{})
log.V(1).Info("applied init SQL", "stdout", stdout, "stderr", stderr)
return err
}
I was able to get this working after adding \c pgcmain
to my bootstrap script. Adding the additional logging may help to see issues with init more easily... and seems to be the standard in other places in the code.
Is there a maintainer who would be willing to review the associated PR here? Thank you in advance.
Merged and fixed, thx!