Redeployment unable to start up again
Updated the resource limits for a postgresql-persistent 9.5 deployment
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".
done
server started
ERROR: tuple already updated by self
It seems the first pod did not shut down cleanly and left its PID in the /var/lib/pgsql/data/userdata/postmaster.pid file on the volume, thus preventing the container from starting automatically without manual intervention.
Perhaps an edge case, as this is the first time I have seen this across many other postgresql deployments.
Hello, facing the same problem here: the container is unable to start, with the same output detailed above. Is there any known solution or workaround for this problem? I quickly tried removing the /var/lib/pgsql/data/userdata/postmaster.pid file, but when starting the container I get the same issue.
EDIT: I double checked, and in my case the output is:
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".
... done
server started
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
ERROR: tuple already updated by self
Thanks for the report. Interesting, @pedro-dlfa: so you manually dropped the pid file, and immediately after that the container again refused to start (because of the pid file)? It smells like pg_ctl stop isn't really doing what we expect.
Hello, I am facing the same issue as @pedro-dlfa. I tried to delete the file manually and redeploy the pod, but with no success. My workaround is to recreate the pod. I am using PostgreSQL 9.5.
If you are affected by this, can you confirm that your deployment strategy is Recreate?
Hi, my strategy is Rolling, and I was affected again last week.
Hi @martin123218
The problem with the Rolling strategy is that it tells OpenShift to first create a new pod that mounts the same data volume as the old one, and to shut down the original pod only once the new pod is up and running. Since, for a while, two pods are accessing (and presumably writing to) the same data volume, you can run into this issue.
Please use the Recreate strategy instead. There will be some downtime, since the new pod is started only after the old pod has shut down, but you should not run into this issue anymore.
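For reference, switching to Recreate is a one-field change in the deployment's strategy section. This is only an illustrative excerpt following the standard OpenShift DeploymentConfig schema; the surrounding object (name, template, volumes) is whatever your existing deployment already has:

```yaml
# DeploymentConfig excerpt: Recreate guarantees the old pod is fully
# stopped before the new pod starts, so only one postgresql server
# ever writes to the shared data volume at a time.
spec:
  replicas: 1
  strategy:
    type: Recreate   # instead of Rolling
```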
I've also just run into this issue. Is there a way to make this work with the Rolling strategy, to get zero-downtime upgrades?
Not with this trivial layout. The problem is equivalent to the non-container scenario where you run dnf update postgresql-server: you have to shut down the old server and start a new one. That is, you cannot let two servers write to the same data directory.
Btw., the PostgreSQL server has a guard against the "multiple servers writing to the same data directory" situation, but unfortunately, in the container scenario, the server has a deterministic PID (PID=1). So a concurrent PostgreSQL server (in a different container) checks the pid/lock file, compares the recorded PID with its own PID, and assumes "I am PID 1, so the PID file must be a leftover from a previous run." It therefore removes the PID file and continues modifying the data directory. This has disaster potential.
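The way the guard gets defeated can be modeled in a few lines. This is an illustrative sketch of the decision logic described above, not the actual server code; the function name and return strings are made up:

```python
def pidfile_check(pidfile_pid: int, my_pid: int, pid_alive) -> str:
    # Simplified model of the postmaster.pid guard (illustrative only).
    if pidfile_pid == my_pid:
        # The server assumes the file is a leftover from a previous run of
        # "itself" and removes it. Safe on a normal host, where a freshly
        # started server practically never finds its own PID in a stale file.
        return "remove stale pidfile and start"
    if pid_alive(pidfile_pid):
        return "refuse to start: another server is running"
    return "remove stale pidfile and start"

# Normal host: old server ran as PID 4242, new one starts as PID 5713,
# and PID 4242 is still alive -> the new server refuses to start.
print(pidfile_check(4242, 5713, lambda pid: True))
# -> refuse to start: another server is running

# Containers: both the old and the new server are PID 1, so the guard is
# defeated and the new server clobbers the live data directory.
print(pidfile_check(1, 1, lambda pid: True))
# -> remove stale pidfile and start
```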
Our templates only support the Recreate strategy. The fact that Rolling "mostly" works is a matter of luck: the old server usually isn't under heavy load.
That said, the zero-downtime problem needs to be solved at a higher logical layer.
Ok, that makes sense, thanks. If I wanted to solve this at a higher logical layer, how would I go about it? Do you have any good pointers?
At this point, you'd have to start thinking about Pgpool or something similar (I'd prefer to have a separate issue for such RFEs, to avoid going off-topic in this bug report).
This issue seems to be caused by concurrent runs of multiple postgresql containers against the same data directory (persistent volume), e.g. caused by the Rolling strategy in OpenShift.
I've heard it could also happen if OpenShift happens to be moving the container to the idle state (because HAProxy decided so) while, during that time, some traffic wakes the container up (i.e. a new container is started even before the old one was successfully idled). Can anyone confirm that this could happen?
Anyway, I'd like opinions on how to handle this situation properly, i.e. how to protect against over-mounting the same storage, since detecting this reliably from within the container seems to be close to a hard problem. The only way that comes to my mind is implementing a "best effort" guard via some daemon implementing a "leader election" mechanism. Any links to how others do this?
We might delegate this to OpenShift operators, but I suspect the templates will have to stay supported anyway, or at least that postgresql-container should also be usable from (some) templates; thus the problem won't disappear for non-operator use cases, or for plain "docker" and "podman" use cases.
Hi, I'm facing the same issue using the Recreate strategy. Deleting postmaster.pid also did not help, as I got the same error at the next pod startup. Any idea how to fix or work around this?
Had this problem after an issue with the underlying node caused it to terminate very ungracefully. A new pod got spun up (as it is supposed to) on a new node, but the container got stuck in a CrashLoopBackOff with this exact error message. Surely there needs to be an automated way to get around this problem? Especially because only a single replica is supported, there's not a lot of wiggle room for high availability if the container can't start.
This is an old issue, but I just faced the same with the Recreate strategy. These articles explain how to reanimate the failing pod, and they helped me: https://pathfinder-faq-ocio-pathfinder-prod.pathfinder.gov.bc.ca/DB/PostgresqlCrashLoopTupleError.html https://serverfault.com/questions/942743/postgres-crash-loop-caused-by-a-tuple-concurrently-updated-error
We use only one database pod, so this may not solve the high-availability issue, but at least the database works with one pod. Maybe it will be useful for somebody.
We've also had an offline discussion with Daniel Messer from RH, whose team has hit this problem as well. After changing the strategy to Recreate, the problem seems to disappear, but a good point was raised: we should start testing the crash scenario in the CI tests (run the OpenShift template, then kill the pod or the postgres daemon directly). This seems like a good addition to our test coverage.
@drobus We changed the DeploymentConfig to a Deployment here: https://github.com/sclorg/postgresql-container/blob/master/examples/postgresql-persistent-template.json, and the strategy there is 'Recreate'. So I'm closing this issue.
In case it is not fixed yet, feel free to re-open it.