
ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1

Open imranrazakhan opened this issue 2 years ago • 13 comments

We have the following environment:

  • Three-node cluster deployed using chart timescaledb-single 0.10.0 in a test environment
  • Kubernetes 1.22.3
  • Storage is configured using local persistent volumes.
  • Backup is not configured; I set backup=false in the chart values (see the snippet below).

Today I wanted to clean the disks, so I scaled down the timescaledb pods, wiped the disks on all three nodes, and then scaled the pods back up, but I am getting the following error. Am I missing something? Is there any way to start from a blank disk again?
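
For reference, backup is disabled in my chart values roughly like this (assuming the chart's backup.enabled key, per timescaledb-single's values.yaml):

backup:
  enabled: false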

# kubectl -n dev logs -f timescaledb-0 -c timescaledb
2021-12-04 21:55:53,775 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2021-12-04 21:55:53,775 ERROR: failed to bootstrap (without leader)
2021-12-04 21:56:04,206 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2021-12-04 21:56:04,207 ERROR: failed to bootstrap (without leader)
2021-12-04 21:56:14,207 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2021-12-04 21:56:14,207 ERROR: failed to bootstrap (without leader)
2021-12-04 21:56:24,207 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2021-12-04 21:56:24,207 ERROR: failed to bootstrap (without leader)
2021-12-04 21:56:34,207 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1

I logged into the timescaledb pod and checked the Patroni status. This is the first instance, so why is its role Replica rather than master?

$ patronictl list
+ Cluster: yq (uninitialized) +---------+---------+----+-----------+
| Member        | Host        | Role    | State   | TL | Lag in MB |
+---------------+-------------+---------+---------+----+-----------+
| timescaledb-0 | 10.244.0.78 | Replica | stopped |    |   unknown |
+---------------+-------------+---------+---------+----+-----------+

imranrazakhan avatar Dec 05 '21 00:12 imranrazakhan

@imranrazakhan Deleting the k8s services from this chart (the load balancer and node-IP-related ones, depending on your config) left over from the previous helm deployment should resolve this issue, as it did for me.
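
Roughly something like this (the namespace, label, and service names below are illustrative for a release called "timescaledb"; check what your cluster actually has first):

# List services left over from the previous deployment (label selector is an assumption)
$ kubectl -n dev get svc -l app=timescaledb
# Delete the leftovers; names here assume a release called "timescaledb"
$ kubectl -n dev delete svc timescaledb timescaledb-replica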

davidandreoletti avatar Dec 07 '21 12:12 davidandreoletti

@davidandreoletti Thanks for the update, I will check this. Can we have more insight into why we have to delete the services? Is it related to the endpoints? I checked the endpoint YAML but couldn't find any hint about what is stopping us from doing a clean start.
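
For reference, this is roughly what I inspected (release name "timescaledb" is an assumption):

$ kubectl -n dev get ep timescaledb-config -o yaml
# Patroni keeps its cluster state in metadata.annotations on this endpoint,
# so that section is the part worth checking, not the endpoint subsets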

imranrazakhan avatar Dec 07 '21 12:12 imranrazakhan

Having the same issue. Deleting the resources from the previous helm deployment did not solve the issue for me.

jholm117 avatar Jan 24 '22 22:01 jholm117

Same issue, and confirmed there are no resources left in the cluster from the previous install.

bleggett avatar Jan 27 '22 16:01 bleggett

> Having the same issue. Deleting the resources from the previous helm deployment did not solve the issue for me.

I was able to get this working eventually, it's possible I missed cleaning up an endpoint or something.

jholm117 avatar Jan 28 '22 14:01 jholm117

@jholm117 @davidandreoletti We can fix this issue by deleting just one ep (Endpoint) with a name like clustername-config, where clustername is the name provided during the helm installation.
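
For example (namespace and release name here are assumptions; substitute your own):

$ kubectl -n dev get ep
# The -config endpoint carries Patroni's "initialize" annotation with the old
# cluster's system identifier, so a wiped disk gets bootstrapped as a replica
# of the (now gone) old cluster instead of as a fresh primary
$ kubectl -n dev delete ep timescaledb-config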

imranrazakhan avatar Mar 24 '22 10:03 imranrazakhan

I am still seeing this issue after using a different release name and deleting the older endpoints. It just stops suddenly after some time. Any different solutions would be greatly appreciated. Thanks.

veereshhalagegowda avatar Aug 26 '22 14:08 veereshhalagegowda

Same here. It is happening in the latest release, 0.27.4. It resolves automatically after a few minutes.

jleni avatar Jan 01 '23 18:01 jleni

Same issue here. Happens on the latest 0.27.5 as well. It would be good to finally see this fixed.

jprecuch avatar Jan 23 '23 14:01 jprecuch

Same issue here; moving the deployment to a new namespace solved it temporarily for me.

jfaldanam avatar Jan 24 '23 07:01 jfaldanam

Removing endpoints from a previous helm deployment solved it for me.

JohnTzoumas avatar May 24 '23 16:05 JohnTzoumas

@JohnTzoumas thanks a lot! I have the same issue with the latest version.

I am testing disaster recovery right now and killed all PVCs + pods. The startup of the new timescale pod stops at:

timescaledb 2023-05-25 11:53:56,422 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1

When I delete the 4 endpoints, the recovery runs through.
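
That looked roughly like this (namespace and label are assumptions; list the endpoints first and delete the ones belonging to your release):

$ kubectl -n dev get ep
# Delete every endpoint belonging to the release; deleting the four
# endpoints one by one by name works just as well
$ kubectl -n dev delete ep -l app=timescaledb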

ayeks avatar May 25 '23 12:05 ayeks

I have the same issue, but in my case I have disabled persistent storage, because in our dev environment we would like to clean the DB by just restarting the container. I have also tried setting patroni.postgresql.pgbackrest.keep_data = false, but it had no effect.
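
For context, my values look roughly like this (the persistentVolumes.* and backup.enabled keys follow the chart's default values.yaml; treat the exact layout as an assumption):

backup:
  enabled: false
persistentVolumes:
  data:
    enabled: false
  wal:
    enabled: false
patroni:
  postgresql:
    pgbackrest:
      # the setting mentioned above; it had no effect for me
      keep_data: false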

MSandro avatar Dec 21 '23 08:12 MSandro