Redeployment unable to start up again
Updated the resource limits for a postgresql-persistent 9.5 deployment
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".
done
server started
ERROR: tuple already updated by self
It seems the first pod did not shut down cleanly and left its PID in the /var/lib/pgsql/data/userdata/postmaster.pid file on the volume, thus preventing the container from starting automatically without manual intervention.
Perhaps an edge case, as this is the first time I have seen this across many other postgresql deployments.
Hello, facing the same problem here: the container is unable to start, with the same output detailed above. Is there any known solution or workaround for this problem? I quickly tried removing the /var/lib/pgsql/data/userdata/postmaster.pid file, but when starting the container I get the same issue.
EDIT: I double checked, and in my case the output is:
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "pg_log".
... done
server started
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
ERROR: tuple already updated by self
Thanks for the report. Interesting, @pedro-dlfa: so you manually dropped the pid file, and immediately after that the container again refused to start (because of the pid file)? It smells like pg_ctl stop isn't really doing what we expect.
Hello, I am facing the same issue as @pedro-dlfa. I tried to delete the file manually and redeploy the pod, but with no success. My workaround is to recreate the pod. I am using PostgreSQL 9.5.
If you are affected by this, can you confirm that your deployment strategy is Recreate?
Hi, my strategy is Rolling, and I was affected again last week.
Hi @martin123218
The problem with the Rolling strategy is that it tells OpenShift to first create a new pod that mounts the same data volume as the old one, and to shut down the original pod only once the new pod is up and running. Since, for a while, two pods are accessing (and presumably writing to) the same data volume, you can run into this issue.
Please use the Recreate strategy instead. There will be some downtime, since the new pod is started only after the old pod has shut down, but you should not run into this issue anymore.
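For reference, switching to Recreate is a one-field change in the deployment's strategy section. This is only an illustrative excerpt following the standard OpenShift DeploymentConfig schema; the surrounding object (name, template, volumes) is whatever your existing deployment already has:

```yaml
# DeploymentConfig excerpt: Recreate guarantees the old pod is fully
# stopped before the new pod starts, so only one postgresql server
# ever writes to the shared data volume at a time.
spec:
  replicas: 1
  strategy:
    type: Recreate   # instead of Rolling
```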
I've also just run into this issue. Is there a way to make this work with the Rolling strategy, to get zero-downtime upgrades?
Not with this trivial layout. The problem is equivalent to the non-container scenario where you run dnf update postgresql-server: you have to shut down the old server and start a new one. That is, you cannot let two servers write to the same data directory.
Btw., the PostgreSQL server has a guard against the "multiple servers writing to the same data directory" situation, but unfortunately, in the container scenario, the server has a deterministic PID (PID=1). So a concurrent PostgreSQL server (in a different container) checks the pid/lock file, compares the recorded PID with its own PID, and assumes "I am PID 1, so the PID file must be a leftover from a previous run." It therefore removes the PID file and continues modifying the data directory. This has disaster potential.
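The way the guard gets defeated can be modeled in a few lines. This is an illustrative sketch of the decision logic described above, not the actual server code; the function name and return strings are made up:

```python
def pidfile_check(pidfile_pid: int, my_pid: int, pid_alive) -> str:
    # Simplified model of the postmaster.pid guard (illustrative only).
    if pidfile_pid == my_pid:
        # The server assumes the file is a leftover from a previous run of
        # "itself" and removes it. Safe on a normal host, where a freshly
        # started server practically never finds its own PID in a stale file.
        return "remove stale pidfile and start"
    if pid_alive(pidfile_pid):
        return "refuse to start: another server is running"
    return "remove stale pidfile and start"

# Normal host: old server ran as PID 4242, new one starts as PID 5713,
# and PID 4242 is still alive -> the new server refuses to start.
print(pidfile_check(4242, 5713, lambda pid: True))
# -> refuse to start: another server is running

# Containers: both the old and the new server are PID 1, so the guard is
# defeated and the new server clobbers the live data directory.
print(pidfile_check(1, 1, lambda pid: True))
# -> remove stale pidfile and start
```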
Our templates only support the Recreate strategy. The fact that Rolling "mostly" works is a matter of luck: the old server usually isn't under heavy load.
That said, the zero-downtime problem needs to be solved at a higher logical layer.
Ok, that makes sense, thanks. If I wanted to solve this at a higher logical layer, how would I go about it? Do you have any good pointers?
At this point, you'd have to start thinking about Pgpool or something similar (I'd prefer to have a separate issue for such RFEs, to avoid going off-topic in this bug report).
This issue seems to be caused by concurrent runs of multiple postgresql containers against the same data directory (persistent volume), e.g. caused by the Rolling strategy in OpenShift.
I've heard it could also happen if OpenShift happens to be moving the container to the idle state (because HAProxy decided so) while, during that time, some traffic wakes the container up (i.e. a new container is started even before the old one was successfully idled). Can anyone confirm that this could happen?
Anyway, I'd like opinions on how to handle this situation properly, i.e. how to protect against over-mounting the same storage, since detecting this reliably from within the container seems to be close to a hard problem. The only way that comes to my mind is implementing a "best effort" guard via some daemon implementing a "leader election" mechanism. Any links to how others do this?
We might delegate this to OpenShift operators, but I suspect the templates will have to stay supported anyway, or at least that postgresql-container should also be usable from (some) templates; thus the problem won't disappear for non-operator use cases, or for plain "docker" and "podman" use cases.
Hi, I'm facing the same issue using the Recreate strategy. Deleting postmaster.pid also did not help, as I got the same error at the next pod startup. Any idea how to fix or work around this?
Had this problem after an issue with the underlying node caused it to terminate very ungracefully. A new pod got spun up (as it is supposed to) on a new node, but the container got stuck in a CrashLoopBackOff with this exact error message. Surely there needs to be an automated way to get around this problem? Especially because only a single replica is supported, there's not a lot of wiggle room for high availability if the container can't start.
This is an old issue, but I just faced the same with the Recreate strategy. These articles explain how to reanimate the failing pod, and they helped me: https://pathfinder-faq-ocio-pathfinder-prod.pathfinder.gov.bc.ca/DB/PostgresqlCrashLoopTupleError.html https://serverfault.com/questions/942743/postgres-crash-loop-caused-by-a-tuple-concurrently-updated-error
We use only one database pod, so this may not solve the high-availability issue, but at least the database works with one pod. Maybe it will be useful for somebody.
We've also had an offline discussion with Daniel Messer from RH, whose team has hit this problem as well. After changing the strategy to Recreate, the problem seems to disappear, but a good point was raised: we should start testing the crash scenario in the CI tests (run the OpenShift template, then kill the pod or the postgres daemon directly). This seems like a good addition to our test coverage.
@drobus We changed the DeploymentConfig to a Deployment here: https://github.com/sclorg/postgresql-container/blob/master/examples/postgresql-persistent-template.json, and the strategy there is 'Recreate'. So I'm closing this issue.
In case it is not fixed yet, feel free to re-open it.