postgres-operator

Any advice about running pgo in a service mesh like istio?

Open alrooney opened this issue 3 years ago • 10 comments

We currently run pgo version 4.5.0 and are looking at adopting the Istio service mesh. Any advice or recommendations on using pgo with a service mesh? Do you know if anyone has successfully run pgo in a service mesh environment? We are wondering how pgo might interact with Istio during a primary -> replica failover, for example.

alrooney avatar Mar 19 '21 01:03 alrooney

@alrooney Have you found any way to run pgo in an Istio service mesh?

@jkatz can you give me some hints on how we can enable Istio for pgo?

gowrisankar22 avatar Apr 26 '21 02:04 gowrisankar22

@jkatz how can I add an annotation to the stanza-create job to disable the sidecar injector?

gowrisankar22 avatar Apr 26 '21 03:04 gowrisankar22

@jkatz @alrooney I figured out how to disable the Istio sidecar injection, but now the stanza-create job fails because it cannot connect to the primary. Any hints?

time="2021-05-01T05:03:56Z" level=info msg="pgo-backrest starts"
time="2021-05-01T05:03:56Z" level=info msg="debug flag set to false"
time="2021-05-01T05:03:56Z" level=info msg="backrest stanza-create command requested"
time="2021-05-01T05:03:56Z" level=info msg="command to execute is [pgbackrest stanza-create  --db-host=10.244.9.208 --db-path=/pgdata/abcdtest]"
time="2021-05-01T05:03:56Z" level=info msg="command is pgbackrest stanza-create  --db-host=10.244.9.208 --db-path=/pgdata/abcdtest "
time="2021-05-01T05:03:56Z" level=error msg="command terminated with exit code 56"
time="2021-05-01T05:03:56Z" level=info msg="output=[]"
time="2021-05-01T05:03:56Z" level=info msg="stderr=[WARN: unable to check pg-1: [UnknownError] remote-0 process on '10.244.9.208' terminated unexpectedly [255]: ssh_exchange_identification: Connection closed by remote host\nERROR: [056]: unable to find primary cluster - cannot proceed\n]"
time="2021-05-01T05:03:56Z" level=error msg="command terminated with exit code 56"

gowrisankar22 avatar May 01 '21 05:05 gowrisankar22

@gowrisankar22 we paused our Istio rollout, but because of the issues with pgo and Istio we are planning to look at running Postgres as an external service.

alrooney avatar May 01 '21 12:05 alrooney

Is anyone successfully running PGO with Istio? The first issue I ran into was the backup pod erroring out:

% kubectl get pods -n pgo
NAME                        READY   STATUS    RESTARTS   AGE
pgo-68db564fb5-5n2fb        2/2     Running   1          146m
pgcmain-repo-host-0         2/2     Running   0          146m
pgcmain-instance1-chg5-0    4/4     Running   0          146m
pgcmain-instance1-dgsj-0    4/4     Running   0          146m
pgcmain-backup-2sz6-rj7kn   1/2     Error     0          42m
% kubectl -n pgo logs pgcmain-backup-2sz6-rj7kn
time="2022-02-03T23:43:12Z" level=info msg="crunchy-pgbackrest starts"
time="2022-02-03T23:43:12Z" level=info msg="debug flag set to false"
time="2022-02-03T23:43:12Z" level=fatal msg="Get \"https://10.43.0.1:443/api/v1/namespaces/pgo/pods?labelSelector=postgres-operator.crunchydata.com%2Fcluster%3Dpgcmain%2Cpostgres-operator.crunchydata.com%2Fpgbackrest%3D%2Cpostgres-operator.crunchydata.com%2Fpgbackrest-dedicated%3D\": dial tcp 10.43.0.1:443: connect: connection refused"

I believe this is due to the backup pod running an Istio sidecar, and the k8s API server not running Istio, so it cannot connect via TLS. Thinking of disabling the Istio sidecar for just the backup pod and seeing how that goes. Would be curious to hear how this has or has not worked for others.

jmartin127 avatar Feb 04 '22 00:02 jmartin127

Any update here? Sounds like apps running TLS themselves will clash with Istio.

howels avatar Sep 01 '22 16:09 howels

Let me share what we did to run PGO inside the Istio service mesh with mTLS communication enabled. We took the following steps:

  • Deploy an Istio ServiceEntry for ports 5432, 2022, 8008, and 8432 to enable communication between the pods deployed by PGO. More details about Istio's traffic routing are available here.
  • Add the annotation proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }' to PostgresCluster.spec.backups.pgbackrest.metadata.annotations. This avoids an error caused by the backup process starting before the Istio sidecar proxy has finished initializing. Here is the Istio document describing the annotation.
  • Use a custom pgBackRest container image that calls the Envoy proxy API to stop the sidecar proxy once the original pgBackRest execution has finished. As discussed in this thread, the sidecar currently has to be stopped explicitly for a Kubernetes job to complete. We wrote a patch for PGO that changes the entry point of the backup job's container so that it uses the custom image. We'll open a PR to share the patch with the PGO project.
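
For anyone who wants to try the annotation from the second bullet, here is a minimal sketch of where it goes. The cluster name hippo is a placeholder and every other required field of the PostgresCluster spec is omitted, so treat this as a fragment to merge into an existing manifest rather than a complete one.

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo  # placeholder cluster name, replace with your own
spec:
  backups:
    pgbackrest:
      metadata:
        annotations:
          # Ask Istio to hold the application containers until the
          # sidecar proxy has started, so pgBackRest does not run
          # before the proxy is ready.
          proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
```

This only covers the startup ordering. The ServiceEntry and the custom image that stops the sidecar when the job ends are separate steps and are not shown here.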

polikeiji avatar Dec 07 '22 04:12 polikeiji

With the latest Istio, just add a label to the pgbackrest metadata:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
spec:
  backups:
    pgbackrest:
      metadata:
        labels:
          sidecar.istio.io/inject: 'false'

Hope it helps

nctam avatar Jul 19 '23 17:07 nctam

Now your postgres traffic is outside the service mesh, which just creates more problems. Ideally this code would actually work with Istio instead of requiring it to be disabled.

howels avatar Jul 19 '23 18:07 howels

On our side, the issue comes from the port name postgres, which does not follow the port naming convention that Istio expects (tcp-postgres would work).

We found that by running:

> istioctl analyze -n postgres-operator

Info [IST0118] (Service postgres-operator/hippo-ha) Port name postgres (port: 5432, targetPort: postgres) doesn't follow the naming convention of Istio port.
Info [IST0118] (Service postgres-operator/hippo-primary) Port name postgres (port: 5432, targetPort: postgres) doesn't follow the naming convention of Istio port.
Info [IST0118] (Service postgres-operator/hippo-replicas) Port name postgres (port: 5432, targetPort: postgres) doesn't follow the naming convention of Istio port.

@tony-landreth, do you think we can change the constant PortPostgreSQL = "postgres" to PortPostgreSQL = "tcp-postgres" without introducing a breaking change?
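
To illustrate the convention, here is a sketch of a Service port named the way Istio expects, i.e. a protocol prefix followed by an optional suffix. This is not what PGO generates today. The Service name and namespace are taken from the analyzer output above, and the rest of the Service spec is left out, so this is only an assumption about what a renamed port could look like.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hippo-primary
  namespace: postgres-operator
spec:
  ports:
    # Istio reads the protocol from the port name prefix
    # (<protocol>[-<suffix>]), so "tcp-postgres" is treated as plain TCP.
    - name: tcp-postgres
      port: 5432
      targetPort: postgres
      protocol: TCP
      # Alternative that keeps the port name unchanged: declare the
      # protocol explicitly through the appProtocol field instead.
      # appProtocol: tcp
```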

dulacp avatar May 13 '24 00:05 dulacp