helm-charts
ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
We have the following environment:
- Three-node cluster deployed using the timescaledb-single chart, version 0.10.0, in a test environment
- Kubernetes 1.22.3
- Storage is configured using local persistent volumes.
- Backup is not configured; I set backup=false in the chart values.
Today I wanted to clean the disks, so I scaled the TimescaleDB pods down, wiped the disks on all three nodes, and then scaled the pods back up, but now I am getting the errors below. Am I missing something? Is there any way to start again from a blank disk?
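For reference, the scale-down / wipe / scale-up I ran was roughly the following (a sketch; the dev namespace and the timescaledb statefulset name are taken from the logs below, and the wipe step is whatever your local-PV setup requires):
$ kubectl -n dev scale statefulset timescaledb --replicas=0   # stop all three pods
$ # ...wipe the local persistent volumes on each of the three nodes...
$ kubectl -n dev scale statefulset timescaledb --replicas=3   # bring the pods back up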
# kubectl -n dev logs -f timescaledb-0 -c timescaledb
2021-12-04 21:55:53,775 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2021-12-04 21:55:53,775 ERROR: failed to bootstrap (without leader)
2021-12-04 21:56:04,206 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2021-12-04 21:56:04,207 ERROR: failed to bootstrap (without leader)
2021-12-04 21:56:14,207 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2021-12-04 21:56:14,207 ERROR: failed to bootstrap (without leader)
2021-12-04 21:56:24,207 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2021-12-04 21:56:24,207 ERROR: failed to bootstrap (without leader)
2021-12-04 21:56:34,207 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
I logged into the timescaledb pod and checked the Patroni status. This is the first instance, so why is its role Replica rather than master?
$ patronictl list
+ Cluster: yq (uninitialized) +---------+---------+----+-----------+
| Member        | Host        | Role    | State   | TL | Lag in MB |
+---------------+-------------+---------+---------+----+-----------+
| timescaledb-0 | 10.244.0.78 | Replica | stopped |    |   unknown |
+---------------+-------------+---------+---------+----+-----------+
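As far as I can tell, Patroni on Kubernetes stores its cluster-wide state as annotations on endpoints named after the Patroni scope, so the leftovers can be inspected with something like this (assuming the scope yq from the output above and the dev namespace):
$ kubectl -n dev get endpoints yq-config -o yaml   # Patroni's cluster-wide config/state annotations
$ kubectl -n dev get endpoints yq -o yaml          # the leader endpoint, if it still exists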
@imranrazakhan Deleting the k8s services from this chart (the load balancer / node-IP related ones, depending on your config) that were left over from the previous helm deployment should resolve this issue, as it did for me.
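In case it helps, a minimal sketch of that cleanup (the namespace and service names here are placeholders; list the services first and match them against your own install):
$ kubectl -n <namespace> get svc                                    # look for services left over from the old release
$ kubectl -n <namespace> delete svc my-release my-release-replica   # placeholder names; use whatever the previous deployment created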
@davidandreoletti Thanks for the update, I will check this. Can we have more insight into why we have to delete the services? Is it related to the endpoints? I checked the ep YAML but couldn't find any hint as to what is stopping us from doing a clean start.
Having the same issue. Deleting the resources from the previous helm deployment did not solve the issue for me.
Same issue - and confirmed no resources left in cluster from previous install.
I was able to get this working eventually, it's possible I missed cleaning up an endpoint or something.
@jholm117 @davidandreoletti We can fix the issue by deleting just one ep (Endpoints) object, named something like clustername-config, where clustername is the name provided during the helm installation.
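A sketch of that, with clustername and the namespace as placeholders:
$ kubectl -n <namespace> get endpoints                       # look for the clustername-config entry
$ kubectl -n <namespace> delete endpoints clustername-config
As far as I can tell, Patroni keeps its cluster-wide state (including the initialize/system-identifier marker) as annotations on that -config endpoint, so a leftover one from the old cluster makes the new pods try to restore a replica via pgbackrest instead of initializing a fresh database.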
I am still seeing this issue after using a different release name and deleting the older endpoints. It just stops suddenly after some time. Any other solutions would be greatly appreciated. Thanks.
Same here. It is happening in the latest release, 0.27.4. It resolves automatically after a few minutes.
Same issue here. It happens on the latest release, 0.27.5, as well. It would be good to see this fixed finally.
Same issue here; moving the deployment to a new namespace solved it temporarily for me.
Removing endpoints from a previous helm deployment solved it for me.
@JohnTzoumas thanks a lot! I have the same issue with the latest version.
I am testing disaster recovery right now and killed all PVCs + pods. The startup of the new timescale pod stops at:
timescaledb 2023-05-25 11:53:56,422 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
When I delete the 4 endpoints, the recovery runs through.
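A sketch of what that looked like for me (the label selector is an assumption; check the labels on your endpoints with --show-labels first):
$ kubectl -n <namespace> get endpoints --show-labels                      # identify the chart/Patroni endpoints
$ kubectl -n <namespace> delete endpoints -l app=<release>-timescaledb    # assumed label; adjust to what the previous command shows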
I have the same issue, but in my case I have disabled persistent storage, because in our dev environment we would like to clean the DB by just restarting the container. I have also tried setting patroni.postgresql.pgbackrest.keep_data = false, but it had no effect.
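For reference, roughly how I pass those values (a sketch; the release name and the timescale repo alias are placeholders, and the value keys should be double-checked against the chart's values.yaml for your version):
$ helm upgrade --install my-release timescale/timescaledb-single \
    --set persistentVolumes.data.enabled=false \
    --set persistentVolumes.wal.enabled=false \
    --set backup.enabled=false \
    --set patroni.postgresql.pgbackrest.keep_data=false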