postgres-operator
Any advice about running pgo in a service mesh like istio?
We currently run pgo v4.5.0 and are looking at adopting the Istio service mesh. Any advice or recommendations on using pgo with a service mesh? Do you know if anyone has successfully run pgo in a service mesh environment? Wondering how pgo might interact with Istio during a primary -> replica failover, for example.
@alrooney Have you found any way to run pgo in the Istio service mesh?
@jkatz can you help me? Any hints on how we can enable Istio for pgo?
@jkatz how can we add an annotation to the stanza-create job to disable the sidecar injector?
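For v4, a minimal sketch, assuming the Annotations section that pgo.yaml gained around PGO 4.4; the key names below are assumptions and should be verified against your version's docs:

# pgo.yaml (PGO v4.x); the Backrest key is assumed from the v4 custom-annotations docs
Annotations:
  Backrest:
    sidecar.istio.io/inject: "false"   # tell Istio not to inject a sidecar into pgBackRest pods, including the stanza-create job

Annotations under Backrest should land on the pgBackRest job pods, which is where the injector would otherwise add the sidecar.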
@jkatz @alrooney I figured out how to disable the Istio sidecar injection, but now it fails because it is not able to connect to the primary. Any hints?
time="2021-05-01T05:03:56Z" level=info msg="pgo-backrest starts"
time="2021-05-01T05:03:56Z" level=info msg="debug flag set to false"
time="2021-05-01T05:03:56Z" level=info msg="backrest stanza-create command requested"
time="2021-05-01T05:03:56Z" level=info msg="command to execute is [pgbackrest stanza-create --db-host=10.244.9.208 --db-path=/pgdata/abcdtest]"
time="2021-05-01T05:03:56Z" level=info msg="command is pgbackrest stanza-create --db-host=10.244.9.208 --db-path=/pgdata/abcdtest "
time="2021-05-01T05:03:56Z" level=error msg="command terminated with exit code 56"
time="2021-05-01T05:03:56Z" level=info msg="output=[]"
time="2021-05-01T05:03:56Z" level=info msg="stderr=[WARN: unable to check pg-1: [UnknownError] remote-0 process on '10.244.9.208' terminated unexpectedly [255]: ssh_exchange_identification: Connection closed by remote host\nERROR: [056]: unable to find primary cluster - cannot proceed\n]"
time="2021-05-01T05:03:56Z" level=error msg="command terminated with exit code 56"
@gowrisankar22 we paused our Istio rollout, but we are planning to look at running Postgres as an external service (from the mesh's point of view) because of the issues with pgo and Istio.
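For what it's worth, the usual Istio mechanism for reaching a database outside the mesh is a ServiceEntry; a minimal sketch, where the name, namespace, and host are placeholders:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-postgres   # hypothetical name
  namespace: pgo            # hypothetical namespace
spec:
  hosts:
  - postgres.example.com    # placeholder host for the external Postgres server
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 5432
    name: tcp-postgres      # "tcp-" prefix follows Istio's port-naming convention
    protocol: TCP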
Is anyone successfully running PGO with Istio? The first issue I ran into was the backup pod erroring out:
% kubectl get pods -n pgo
NAME                        READY   STATUS    RESTARTS   AGE
pgo-68db564fb5-5n2fb        2/2     Running   1          146m
pgcmain-repo-host-0         2/2     Running   0          146m
pgcmain-instance1-chg5-0    4/4     Running   0          146m
pgcmain-instance1-dgsj-0    4/4     Running   0          146m
pgcmain-backup-2sz6-rj7kn   1/2     Error     0          42m
kubectl -n pgo logs pgcmain-backup-2sz6-rj7kn
time="2022-02-03T23:43:12Z" level=info msg="crunchy-pgbackrest starts"
time="2022-02-03T23:43:12Z" level=info msg="debug flag set to false"
time="2022-02-03T23:43:12Z" level=fatal msg="Get \"https://10.43.0.1:443/api/v1/namespaces/pgo/pods?labelSelector=postgres-operator.crunchydata.com%2Fcluster%3Dpgcmain%2Cpostgres-operator.crunchydata.com%2Fpgbackrest%3D%2Cpostgres-operator.crunchydata.com%2Fpgbackrest-dedicated%3D\": dial tcp 10.43.0.1:443: connect: connection refused"
I believe this is due to the backup pod running an Istio sidecar, and the k8s API server not running Istio, so it cannot connect via TLS. Thinking of disabling the Istio sidecar for just the backup pod and seeing how that goes. Would be curious to hear how this has or has not worked for others.
Any update here? Sounds like apps running TLS themselves will clash with Istio.
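One way to keep such apps inside the mesh is to exclude their TLS port from mTLS interception. A minimal sketch with a port-level PeerAuthentication override; the cluster name hippo and namespace pgo are placeholders:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: postgres-no-mtls   # hypothetical name
  namespace: pgo           # hypothetical namespace
spec:
  selector:
    matchLabels:
      postgres-operator.crunchydata.com/cluster: hippo   # placeholder cluster name
  mtls:
    mode: PERMISSIVE
  portLevelMtls:
    5432:
      mode: DISABLE   # leave TLS on 5432 to Postgres itself

Note that port-level overrides only take effect when the resource has a workload selector, as above.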
Let me share what we did to run PGO inside the Istio service mesh with mTLS communication enabled. We did the following:
- Deploy an Istio ServiceEntry for ports 5432, 2022, 8008, and 8432 to enable communication between the pods deployed by PGO. Istio's traffic-routing documentation has more details.
- Add the annotation proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }' to PostgresCluster.spec.backups.pgbackrest.metadata.annotations to avoid an error caused by the backup process starting before the Istio sidecar proxy has finished initializing (see the sketch after this list). The Istio documentation describes this annotation.
- Use a custom pgBackRest container image that calls the Envoy proxy API to shut down the sidecar once the original pgBackRest execution finishes. As discussed in this thread, the sidecar daemon currently has to be stopped explicitly for a Kubernetes Job to complete. We wrote a patch for PGO that changes the entry point of the backup job's container image to use the custom image. We'll make a PR to share the patch with the PGO project.
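For the second item above, a minimal sketch of where the annotation lands on the PostgresCluster resource (the cluster name is a placeholder):

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo   # placeholder cluster name
spec:
  backups:
    pgbackrest:
      metadata:
        annotations:
          # hold the backup container until the Envoy sidecar is ready
          proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'

For the third item, the sidecar is commonly stopped by POSTing to the pilot-agent endpoint http://localhost:15020/quitquitquit after the main process exits; the port assumes Istio's default agent configuration.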
With the latest Istio, just add a label to the pgbackrest metadata:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
spec:
  backups:
    pgbackrest:
      metadata:
        labels:
          sidecar.istio.io/inject: 'false'
Hope it helps
Now your postgres traffic is outside the service mesh, which just creates more problems. Ideally this code would actually work with Istio instead of requiring it to be disabled.
On our side, the issue comes from the port name postgres, which is not the convention that Istio is expecting (tcp-postgres would work).
We found that by running:
> istioctl analyze -n postgres-operator
Info [IST0118] (Service postgres-operator/hippo-ha) Port name postgres (port: 5432, targetPort: postgres) doesn't follow the naming convention of Istio port.
Info [IST0118] (Service postgres-operator/hippo-primary) Port name postgres (port: 5432, targetPort: postgres) doesn't follow the naming convention of Istio port.
Info [IST0118] (Service postgres-operator/hippo-replicas) Port name postgres (port: 5432, targetPort: postgres) doesn't follow the naming convention of Istio port.
@tony-landreth, do you think we can change the constant PortPostgreSQL = "postgres" to PortPostgreSQL = "tcp-postgres" without introducing a breaking change?
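For context, a sketch of a Service port that would satisfy Istio's protocol-selection rules; the Service name mirrors the hippo-primary output above:

apiVersion: v1
kind: Service
metadata:
  name: hippo-primary
  namespace: postgres-operator
spec:
  ports:
  - name: tcp-postgres    # the "tcp-" prefix tells Istio to treat the port as plain TCP
    port: 5432
    targetPort: postgres

On Kubernetes 1.18+ there is also the appProtocol field on Service ports, which Istio honors for protocol selection and which might allow leaving the port name untouched.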