timescaledb
[Bug]: Postgres Db crashes after few days with workload | Kubernetes
What type of bug is this?
Crash
What subsystems and features are affected?
Data ingestion
What happened?
We have been running TimescaleDB for more than two weeks on Kubernetes using the timescaledb-single Helm chart, but after a few days it crashes.
TimescaleDB version affected
2.6.1
PostgreSQL version used
13.7
What operating system did you use?
Ubuntu 20.04, K8s 1.20.15
What installation method did you use?
Docker
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
Warning Unhealthy 19s (x24209 over 10d) kubelet Readiness probe failed: /var/run/postgresql:5432 - rejecting connections
When I exec into the container, I can't see the socket file in /var/run.
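For reference, this is roughly how the symptom can be checked from outside the pod (the pod name `timescaledb-0` is a placeholder for the actual StatefulSet pod and namespace):

```bash
# Pod name is a placeholder; substitute the actual StatefulSet pod and namespace.
kubectl exec -it timescaledb-0 -- ls -l /var/run/postgresql/

# Run the same check the readiness probe performs, against the Unix socket directory:
kubectl exec -it timescaledb-0 -- pg_isready -h /var/run/postgresql -p 5432

# See whether the container was restarted or OOM-killed:
kubectl describe pod timescaledb-0
```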
How can we reproduce the bug?
- Install TimescaleDB with backup disabled using the timescaledb-single Helm chart (see the sketch after this list). https://github.com/timescale/timescaledb-kubernetes/tree/master/charts/timescaledb-single
- Connect Promscale to it.
- Run a workload for a few days through Prometheus, continuously pushing large batches of records (say, every 30s).
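A minimal sketch of the installation step, assuming the chart repo URL and release name shown here and the chart's `backup.enabled` value:

```bash
# Repo URL and release name are assumptions; backup is disabled via the chart value.
helm repo add timescale https://charts.timescale.com/
helm repo update
helm install timescaledb timescale/timescaledb-single \
  --set backup.enabled=false
```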
@this-is-r-gaurav thanks for reaching out. We released TimescaleDB 2.7.0 a few weeks ago, which includes a number of bug fixes. Does the problem still occur with this version as well?
If the problem still can be reproduced, might it be possible for you to share your table structure and some sample data as well?
I might need to try the latest TimescaleDB. Let me try and get back to you.
Hi @jnidzwetzki, it's still occurring with the latest chart as well; sorry for taking so long.
I used the image timescale/timescaledb-ha:pg14.4-ts2.7.1-p0.
Hello @this-is-r-gaurav,
Thank you very much for your reply. Might it be possible for you to share the used table structure, some sample data, and the logfile of PostgreSQL that contains the error?
I would also recommend upgrading to TimescaleDB 2.7.2, as version 2.7.1 contains a memory leak that might cause OOM situations on long-running connections (see #4507), which could be the reason for the failed readiness probe after a few days.
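As a rough sketch of how to check for that and move to a newer image (pod and release names are placeholders, and the `image.*` values keys are assumptions based on common chart conventions):

```bash
# Pod/release names are placeholders; the image.* keys are assumed chart values.
# Check whether the PostgreSQL container was last terminated with reason OOMKilled:
kubectl get pod timescaledb-0 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Upgrade to an image that ships TimescaleDB >= 2.7.2 (pick an actual tag from Docker Hub):
helm upgrade timescaledb timescale/timescaledb-single \
  --set image.repository=timescale/timescaledb-ha \
  --set image.tag=<pg14-ts2.7.2-tag>
```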
Dear Author,
This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days. Thank you!
Closing the issue, as I no longer have access to those clusters.