[Bug]: Postgres Db crashes after few days with workload | Kubernetes

Open this-is-r-gaurav opened this issue 3 years ago • 4 comments

What type of bug is this?

Crash

What subsystems and features are affected?

Data ingestion

What happened?

We have been running TimescaleDB for more than two weeks on Kubernetes using the timescaledb-single Helm chart. After a few days under load, the database crashes.

TimescaleDB version affected

2.6.1

PostgreSQL version used

13.7

What operating system did you use?

Ubuntu 20.04, K8s 1.20.15

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

Warning  Unhealthy  19s (x24209 over 10d)  kubelet  Readiness probe failed: /var/run/postgresql:5432 - rejecting connections

When I exec into the container, I can't see the socket file in /var/run.
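To put the event counter above in perspective, here is a minimal Python sketch that turns the `(x24209 over 10d)` field into an average interval between probe failures. The regular expression and the helper name `mean_failure_interval` are illustrative assumptions, and the sketch only handles a single-unit duration such as `10d`, not compound ones such as `2d5h`.

```python
import re

# Matches the repeat field of a `kubectl describe pod` event,
# e.g. "(x24209 over 10d)". Assumption: the duration has a single unit.
EVENT_AGE = re.compile(r"\(x(?P<count>\d+) over (?P<span>\d+)(?P<unit>[smhd])\)")
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def mean_failure_interval(event_line: str) -> float:
    """Return the average number of seconds between repeated events."""
    match = EVENT_AGE.search(event_line)
    if match is None:
        raise ValueError("no '(xN over T)' field found")
    span_seconds = int(match["span"]) * SECONDS_PER_UNIT[match["unit"]]
    return span_seconds / int(match["count"])


line = (
    "Warning  Unhealthy  19s (x24209 over 10d)  kubelet  "
    "Readiness probe failed: /var/run/postgresql:5432 - rejecting connections"
)
print(round(mean_failure_interval(line), 1))  # 35.7
```

An average of about 36 seconds between failures over ten days suggests the probe was failing more or less continuously during that window, not only at the moment of a single crash.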

How can we reproduce the bug?

- Install TimescaleDB with backups disabled using the timescaledb-single Helm chart: https://github.com/timescale/timescaledb-kubernetes/tree/master/charts/timescaledb-single
- Connect it to Promscale.
- Run a workload for a few days through Prometheus, continuously pushing large records to Prometheus (say, every 30 s).

this-is-r-gaurav avatar Jul 07 '22 04:07 this-is-r-gaurav

@this-is-r-gaurav thanks for reaching out. We released TimescaleDB 2.7.0 a few weeks ago, which includes a number of bug fixes. Does the problem still occur with this version as well?

If the problem still can be reproduced, might it be possible for you to share your table structure and some sample data as well?

jnidzwetzki avatar Jul 07 '22 10:07 jnidzwetzki

I might need to try with the latest TimescaleDB. Let me try and get back to you.

this-is-r-gaurav avatar Jul 07 '22 11:07 this-is-r-gaurav

Hi @jnidzwetzki, it's still occurring with the latest chart as well. Sorry for taking so long.

I used this image: timescale/timescaledb-ha:pg14.4-ts2.7.1-p0

this-is-r-gaurav avatar Jul 27 '22 11:07 this-is-r-gaurav

Hello @this-is-r-gaurav,

Thank you very much for your reply. Would it be possible for you to share the table structure you used, some sample data, and the PostgreSQL log file that contains the error?

I would also recommend upgrading to TimescaleDB 2.7.2, as version 2.7.1 contains a memory leak that might cause OOM situations on long-running connections (see #4507), which could be the reason for the failed readiness probe after a few days.
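One way to check the OOM hypothesis is to look at the pod status for a container whose previous run ended with the reason `OOMKilled`. The following is a minimal Python sketch, assuming the pod object has already been loaded as a dictionary (for example from the JSON printed by `kubectl get pod <name> -o json`). The helper name `oom_killed_containers` and the sample data are hypothetical. It only catches the case where the container itself was terminated. A single PostgreSQL backend killed inside a container that keeps running would not show up here.

```python
def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose previous run ended with OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present if the container ran before
        # and was terminated; treat a missing or null value as "no data".
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Hypothetical sample data, shaped like a Kubernetes Pod object.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "timescaledb",
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "lastState": {}},
        ]
    }
}
print(oom_killed_containers(sample_pod))  # ['timescaledb']
```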

jnidzwetzki avatar Jul 28 '22 08:07 jnidzwetzki

Dear Author,

This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days. Thank you!

github-actions[bot] avatar Sep 27 '22 02:09 github-actions[bot]

Closing the issue, as I no longer have access to those clusters.

this-is-r-gaurav avatar Oct 09 '22 12:10 this-is-r-gaurav