
[question] pg_wal eats up disk because of inactive replication slots

Open tydra-wang opened this issue 3 years ago • 0 comments

Please, answer some short questions which should help us to understand your problem / question better?

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.5.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s
  • Are you running Postgres Operator in production? no
  • Type of issue? question

The first time my PostgreSQL became unavailable because the PVC in a pod was 100% full, I simply expanded the PVC. However, it failed again a few days later.

  • archive_mode is set to off in my PostgreSQL.
  • The cluster has three PostgreSQL instances.

Eventually I found that this may be caused by inactive replication slots. Running select * from pg_replication_slots in the master pod showed two inactive replication slots. I fixed it by manually recreating the pods and PVCs of the two replicas (kubectl delete pod and pvc), after which everything went back to normal. The master cleaned up the WAL once all replication slots were active again.
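
To make the diagnosis concrete, here is a minimal sketch of how one could estimate the amount of WAL each inactive slot forces the master to keep. It assumes rows shaped like the slot_name, active and restart_lsn columns of pg_replication_slots, plus the master's current WAL position; the slot names, LSN values and helper function names below are hypothetical sample data, not output from my cluster.

```python
# Sketch only: estimates WAL retained by inactive replication slots.
# Input rows mimic pg_replication_slots (slot_name, active, restart_lsn).
# All slot names and LSN values here are made-up sample data.

def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current position."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def inactive_slots(rows, current_lsn):
    """Return (slot_name, retained_bytes) for inactive slots, largest first."""
    report = []
    for row in rows:
        # Active slots keep advancing; slots without restart_lsn hold no WAL.
        if row["active"] or row["restart_lsn"] is None:
            continue
        report.append(
            (row["slot_name"], retained_wal_bytes(current_lsn, row["restart_lsn"]))
        )
    return sorted(report, key=lambda item: item[1], reverse=True)


# Hypothetical snapshot: two inactive slots and one active slot.
sample_rows = [
    {"slot_name": "replica_1", "active": False, "restart_lsn": "0/10000000"},
    {"slot_name": "replica_2", "active": False, "restart_lsn": "1/10000000"},
    {"slot_name": "replica_3", "active": True, "restart_lsn": "2/0FFFFF00"},
]
current = "2/10000000"

for name, size in inactive_slots(sample_rows, current):
    print(f"{name}: {size / 1024 ** 3:.1f} GiB retained")
```

The retained amount keeps growing for as long as a slot stays inactive, which would match the PVC filling up again a few days after I expanded it.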

I have a few questions about this problem:

  • What could cause inactive replication slots?
  • How can I avoid it? Is the operator responsible for reconciling replication slots when they become inactive?
  • Can I fix it just by upgrading to the latest version?

Thanks!

Related issues:

  • https://github.com/zalando/postgres-operator/issues/1743
  • https://github.com/zalando/postgres-operator/issues/1664

tydra-wang, Aug 25 '22 13:08