helm-charts
`No space left on device` - All timescale instances are replicas
After some time, the timescale deployment stopped working with a `No space left on device`
error.
pgBackRest had been creating backups for about two months; then one day it apparently could no longer connect to timescale. I suspect that is why WAL segments were no longer being archived (and recycled), so the pg_wal volume filled up, but I'm not entirely sure of the root cause.
Conclusion: the last successful backup was on 2021-10-28.
Running patronictl topology, I can see that all postgres instances have the role Replica:
+ Cluster: timescale (uninitialized) ------+---------+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+---------------------------+--------------+---------+---------+-----+-----------+
| + timescale-timescaledb-0 | 10.244.1.170 | Replica | running | 113 | 0 |
| + timescale-timescaledb-1 | 10.244.2.31 | Replica | running | 96 | 0 |
| + timescale-timescaledb-2 | 10.244.0.104 | Replica | running | 96 | 0 |
+---------------------------+--------------+---------+---------+-----+-----------+
The timescale-timescaledb-0 pod has the label role=promoted, while the others have role=replica.
I'm unable to psql into the postgres pod:
psql: error: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
command terminated with exit code 2
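Before anything else it is worth confirming which volume is actually full. A minimal diagnostic sketch; the pod and container names are assumptions based on the chart defaults, so adjust them to your release:

```shell
# Pod/container names assumed from the timescaledb-single chart defaults.
POD=timescale-timescaledb-0

# How full is the dedicated WAL volume?
kubectl exec "$POD" -c timescaledb -- df -h /var/lib/postgresql/wal

# For scale: with PostgreSQL's default 16MB segments, a 1Gi WAL volume
# holds at most 64 segments, so archiving only has to stall briefly
# before the volume fills:
echo $((1024 / 16))
```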
Highlighted error logs
2021-12-07 18:41:03 UTC [31394]: [61afaabd.7aa2-9] @,app= [53100] FATAL: could not write to file "pg_wal/xlogtemp.31394": No space left on device
ERROR: [064]: unable to write '/var/lib/postgresql/data/pg_wal/RECOVERYHISTORY': [28] No space left on device
ERROR: [064]: unable to write '/var/lib/postgresql/data/pg_wal/RECOVERYXLOG'
2021-11-30 02:13:02,601 - WARNING - backup - WARN: unable to check pg-1: [DbConnectError] unable to connect to 'dbname='postgres' port=5432 host='/var/run/postgresql'': could not connect to server: No such file or directory
(Full logs available below)
Deployment: timescaledb-single v0.8.2
Chart values:
backup:
enabled: true
postInit:
- configMap:
name: timescale-post-init
optional: false
patroni:
bootstrap:
dcs:
postgresql:
parameters:
password_encryption: scram-sha-256
ssl: 'off' # disable all ssl connections to not require Patroni certificate
postgresql:
pg_hba:
- local all postgres peer # allow local connections
- local all all scram-sha-256
- hostnossl all,replication all all scram-sha-256 # allow TCP/IP connections (no ssl)
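With only 1Gi for WAL, even a short archiving outage can fill the volume. One mitigation is to provision larger volumes at install time; this is a sketch whose key names follow the timescaledb-single chart's `persistentVolumes` values and should be checked against your chart version (sizes here match what another commenter used below):

```yaml
persistentVolumes:
  data:
    size: 20Gi
  wal:
    size: 10Gi
```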
kubectl get all,secret,configmap,endpoints,pvc -L role -l release=$RELEASE
NAME READY STATUS RESTARTS AGE ROLE
pod/timescale-timescaledb-0 1/2 Running 0 64d promoted
pod/timescale-timescaledb-1 2/2 Running 0 64d replica
pod/timescale-timescaledb-2 2/2 Running 0 64d replica
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE ROLE
service/timescale LoadBalancer 10.245.213.171 144.126.246.94 5432:31638/TCP 64d master
service/timescale-config ClusterIP None <none> 8008/TCP 64d
service/timescale-replica ClusterIP 10.245.113.208 <none> 5432/TCP 64d replica
service/timescale-timescaledb-backup ClusterIP 10.245.46.103 <none> 8081/TCP 64d
NAME READY AGE ROLE
statefulset.apps/timescale-timescaledb 2/3 64d
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE ROLE
cronjob.batch/timescale-timescaledb-full-weekly 12 02 * * 0 False 0 2d16h 64d
cronjob.batch/timescale-timescaledb-incremental-daily 12 02 * * 1-6 False 0 16h 64d
NAME DATA AGE ROLE
configmap/timescale-timescaledb-patroni 1 64d
configmap/timescale-timescaledb-pgbackrest 1 64d
configmap/timescale-timescaledb-scripts 8 64d
NAME ENDPOINTS AGE ROLE
endpoints/timescale 10.244.1.170:5432 64d
endpoints/timescale-config 10.244.0.104:8008,10.244.2.31:8008 64d
endpoints/timescale-replica 10.244.0.104:5432,10.244.2.31:5432 64d replica
endpoints/timescale-timescaledb-backup <none> 64d
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE ROLE
persistentvolumeclaim/storage-volume-timescale-timescaledb-0 Bound pvc-d1dc9e4b-bbd1-4622-8707-43e147476fc2 2Gi RWO do-block-storage 64d
persistentvolumeclaim/storage-volume-timescale-timescaledb-1 Bound pvc-099d2b96-0df8-4842-b0a0-61b6b110601f 2Gi RWO do-block-storage 64d
persistentvolumeclaim/storage-volume-timescale-timescaledb-2 Bound pvc-16184868-34a6-415c-a5cb-06478056d378 2Gi RWO do-block-storage 64d
persistentvolumeclaim/wal-volume-timescale-timescaledb-0 Bound pvc-c60b1bea-76a6-4520-8db0-c46ccc7d724e 1Gi RWO do-block-storage 64d
persistentvolumeclaim/wal-volume-timescale-timescaledb-1 Bound pvc-f8c85906-cc4a-4d41-b739-4dbf073446d9 1Gi RWO do-block-storage 64d
persistentvolumeclaim/wal-volume-timescale-timescaledb-2 Bound pvc-567cb8a0-18be-48f8-83bb-ad947df0e87b 1Gi RWO do-block-storage 64d
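Since the WAL PVCs are only 1Gi, another option (if reinstalling is undesirable) may be to expand them in place. A sketch, assuming the storage class has allowVolumeExpansion enabled (recent do-block-storage versions do) and a hypothetical 10Gi target:

```shell
# Expand one WAL PVC in place; repeat for each replica's claim.
# Requires allowVolumeExpansion: true on the storage class.
kubectl patch pvc wal-volume-timescale-timescaledb-0 \
  --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
```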
Backup sidecar (pgBackRest) logs
2021-11-30 02:12:35,920 - ERROR - backup - Backup 20211130021235 failed with returncode 56
2021-11-30 02:12:35,920 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 02:12:35,935 - INFO - http - 10.244.1.218 - - "POST /backups/ HTTP/1.1" 500 -
2021-11-30 02:12:36,924 - DEBUG - backup - Waiting until backup triggered
2021-11-30 02:13:02,437 - INFO - backup - Starting backup
2021-11-30 02:13:02,601 - WARNING - backup - WARN: unable to check pg-1: [DbConnectError] unable to connect to 'dbname='postgres' port=5432 host='/var/run/postgresql'': could not connect to server: No such file or directory
2021-11-30 02:13:02,601 - INFO - backup - Is the server running locally and accepting
2021-11-30 02:13:02,602 - INFO - backup - connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
2021-11-30 02:13:02,602 - ERROR - backup - ERROR: [056]: unable to find primary cluster - cannot proceed
2021-11-30 02:13:02,608 - DEBUG - backup - Backup details
{
"age": 0.0,
"duration": 0.0,
"finished": "2021-11-30T02:13:02+00:00",
"label": "20211130021302",
"pgbackrest": {},
"pid": 1564,
"returncode": 56,
"started": "2021-11-30T02:13:02+00:00",
"status": "RUNNING"
}
2021-11-30 02:13:02,612 - ERROR - backup - Backup 20211130021302 failed with returncode 56
2021-11-30 02:13:02,612 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 02:13:02,645 - INFO - http - 10.244.1.218 - - "POST /backups/ HTTP/1.1" 500 -
2021-11-30 02:13:03,616 - DEBUG - backup - Waiting until backup triggered
2021-11-30 03:13:03,831 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 04:13:05,068 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 05:13:06,228 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 06:13:07,456 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 07:13:08,644 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 08:13:11,710 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 09:13:12,932 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 10:13:14,096 - INFO - history - Refreshing backup history using pgbackrest
2021-11-30 11:13:15,240 - INFO - history - Refreshing backup history using pgbackrest
Timescaledb logs
2021-12-07 18:41:24 UTC [31449]: [61afaad4.7ad9-1] @,app= [00000] LOG: starting PostgreSQL 12.6 (Debian 12.6-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-12-07 18:41:24 UTC [31449]: [61afaad4.7ad9-2] @,app= [00000] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-12-07 18:41:24 UTC [31449]: [61afaad4.7ad9-3] @,app= [00000] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-12-07 18:41:24 UTC [31452]: [61afaad4.7adc-1] [unknown]@[unknown],app=[unknown] [00000] LOG: connection received: host=[local]
2021-12-07 18:41:24 UTC [31451]: [61afaad4.7adb-1] @,app= [00000] LOG: database system was interrupted while in recovery at log time 2021-10-28 11:05:35 UTC
2021-12-07 18:41:24 UTC [31451]: [61afaad4.7adb-2] @,app= [00000] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2021-12-07 18:41:24 UTC [31452]: [61afaad4.7adc-2] postgres@postgres,app=[unknown] [57P03] FATAL: the database system is starting up
/var/run/postgresql:5432 - rejecting connections
2021-12-07 18:41:24 UTC [31454]: [61afaad4.7ade-1] [unknown]@[unknown],app=[unknown] [00000] LOG: connection received: host=[local]
2021-12-07 18:41:24 UTC [31454]: [61afaad4.7ade-2] postgres@postgres,app=[unknown] [57P03] FATAL: the database system is starting up
/var/run/postgresql:5432 - rejecting connections
2021-12-07 18:41:25 UTC [31451]: [61afaad4.7adb-3] @,app= [00000] LOG: entering standby mode
ERROR: [064]: unable to write '/var/lib/postgresql/data/pg_wal/RECOVERYHISTORY': [28] No space left on device
ERROR: [064]: unable to write '/var/lib/postgresql/data/pg_wal/RECOVERYXLOG'
2021-12-07 18:41:25 UTC [31451]: [61afaad4.7adb-4] @,app= [00000] LOG: redo starts at 2/2C01FF00
2021-12-07 18:41:25 UTC [31451]: [61afaad4.7adb-5] @,app= [00000] LOG: consistent recovery state reached at 2/2D000000
2021-12-07 18:41:25 UTC [31449]: [61afaad4.7ad9-4] @,app= [00000] LOG: database system is ready to accept read only connections
2021-12-07 18:41:25 UTC [31473]: [61afaad5.7af1-1] [unknown]@[unknown],app=[unknown] [00000] LOG: connection received: host=[local]
2021-12-07 18:41:25 UTC [31473]: [61afaad5.7af1-2] postgres@postgres,app=[unknown] [00000] LOG: connection authorized: user=postgres database=postgres application_name=pg_isready
/var/run/postgresql:5432 - accepting connections
2021-12-07 18:41:25 UTC [31473]: [61afaad5.7af1-3] postgres@postgres,app=pg_isready [00000] LOG: disconnection: session time: 0:00:00.028 user=postgres database=postgres host=[local]
2021-12-07 18:41:25 UTC [31475]: [61afaad5.7af3-1] [unknown]@[unknown],app=[unknown] [00000] LOG: connection received: host=[local]
2021-12-07 18:41:25 UTC [31475]: [61afaad5.7af3-2] postgres@postgres,app=[unknown] [00000] LOG: connection authorized: user=postgres database=postgres application_name=Patroni
2021-12-07 18:41:25 UTC [31477]: [61afaad5.7af5-1] [unknown]@[unknown],app=[unknown] [00000] LOG: connection received: host=127.0.0.1 port=52030
2021-12-07 18:41:25 UTC [31477]: [61afaad5.7af5-2] standby@[unknown],app=[unknown] [00000] LOG: replication connection authorized: user=standby
2021-12-07 18:41:25 UTC [31477]: [61afaad5.7af5-3] standby@[unknown],app=[unknown] [00000] LOG: disconnection: session time: 0:00:00.014 user=standby database= host=127.0.0.1 port=52030
server promoting
2021-12-07 18:41:25 UTC [31451]: [61afaad4.7adb-6] @,app= [00000] LOG: received promote request
2021-12-07 18:41:25 UTC [31451]: [61afaad4.7adb-7] @,app= [00000] LOG: redo done at 2/2C01FFE8
ERROR: [064]: unable to write '/var/lib/postgresql/data/pg_wal/RECOVERYXLOG'
2021-12-07 18:41:26 UTC [31451]: [61afaad4.7adb-8] @,app= [00000] LOG: selected new timeline ID: 114
2021-12-07 18:41:26 UTC [31451]: [61afaad4.7adb-9] @,app= [53100] FATAL: could not write to file "pg_wal/xlogtemp.31451": No space left on device
2021-12-07 18:41:26 UTC [31449]: [61afaad4.7ad9-5] @,app= [00000] LOG: startup process (PID 31451) exited with exit code 1
2021-12-07 18:41:26 UTC [31449]: [61afaad4.7ad9-6] @,app= [00000] LOG: terminating any other active server processes
2021-12-07 18:41:26 UTC [31475]: [61afaad5.7af3-3] postgres@postgres,app=Patroni [57P02] WARNING: terminating connection because of crash of another server process
2021-12-07 18:41:26 UTC [31475]: [61afaad5.7af3-4] postgres@postgres,app=Patroni [57P02] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-12-07 18:41:26 UTC [31475]: [61afaad5.7af3-5] postgres@postgres,app=Patroni [57P02] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-12-07 18:41:26 UTC [31449]: [61afaad4.7ad9-7] @,app= [00000] LOG: database system is shut down
Master timescaledb instance: pg_wal directory listing
/var/lib/postgresql/wal/pg_wal$ ls -al
total 927948
drwx------ 3 postgres postgres 12288 Dec 8 11:43 .
drwxrwsr-x 4 root postgres 4096 Oct 4 16:10 ..
-rw------- 1 postgres postgres 373 Oct 7 02:12 000000010000000000000068.00000028.backup
-rw-r----- 1 postgres postgres 42 Oct 8 18:34 00000002.history
-rw------- 1 postgres postgres 372 Oct 8 02:12 000000020000000000000099.00000028.backup
-rw-r----- 1 postgres postgres 85 Oct 8 22:18 00000003.history
-rw-r----- 1 postgres postgres 128 Oct 9 02:26 00000004.history
-rw------- 1 postgres postgres 373 Oct 9 02:12 0000000400000000000000C9.00000028.backup
-rw-r----- 1 postgres postgres 171 Oct 9 06:31 00000005.history
-rw-r----- 1 postgres postgres 214 Oct 9 14:55 00000006.history
-rw-r----- 1 postgres postgres 257 Oct 9 18:43 00000007.history
-rw-r----- 1 postgres postgres 300 Oct 9 22:44 00000008.history
-rw-r----- 1 postgres postgres 343 Oct 10 02:39 00000009.history
-rw------- 1 postgres postgres 373 Oct 10 02:12 0000000900000000000000FA.00000028.backup
-rw-r----- 1 postgres postgres 386 Oct 10 06:50 0000000A.history
-rw-r----- 1 postgres postgres 429 Oct 10 10:51 0000000B.history
-rw-r----- 1 postgres postgres 472 Oct 10 14:42 0000000C.history
-rw-r----- 1 postgres postgres 516 Oct 10 18:42 0000000D.history
-rw-r----- 1 postgres postgres 560 Oct 10 22:47 0000000E.history
-rw-r----- 1 postgres postgres 604 Oct 11 02:37 0000000F.history
-rw------- 1 postgres postgres 375 Oct 11 02:12 0000000F0000000100000028.00000028.backup
-rw-r----- 1 postgres postgres 648 Oct 11 06:52 00000010.history
-rw-r----- 1 postgres postgres 692 Oct 11 10:49 00000011.history
-rw-r----- 1 postgres postgres 736 Oct 11 15:01 00000012.history
-rw-r----- 1 postgres postgres 780 Oct 11 19:21 00000013.history
-rw-r----- 1 postgres postgres 824 Oct 11 23:01 00000014.history
-rw-r----- 1 postgres postgres 868 Oct 12 02:58 00000015.history
-rw------- 1 postgres postgres 375 Oct 12 02:12 000000150000000100000056.00000028.backup
-rw-r----- 1 postgres postgres 912 Oct 12 07:03 00000016.history
-rw-r----- 1 postgres postgres 956 Oct 12 11:07 00000017.history
-rw-r----- 1 postgres postgres 1000 Oct 12 15:02 00000018.history
-rw-r----- 1 postgres postgres 1044 Oct 12 18:58 00000019.history
-rw-r----- 1 postgres postgres 1088 Oct 12 22:53 0000001A.history
-rw-r----- 1 postgres postgres 1132 Oct 13 02:50 0000001B.history
-rw------- 1 postgres postgres 375 Oct 13 02:12 0000001B0000000100000067.00000060.backup
-rw-r----- 1 postgres postgres 1176 Oct 13 06:54 0000001C.history
-rw-r----- 1 postgres postgres 1220 Oct 13 10:56 0000001D.history
-rw-r----- 1 postgres postgres 1264 Oct 13 14:48 0000001E.history
-rw-r----- 1 postgres postgres 1308 Oct 13 18:45 0000001F.history
-rw-r----- 1 postgres postgres 1352 Oct 13 22:57 00000020.history
-rw-r----- 1 postgres postgres 1396 Oct 14 02:46 00000021.history
-rw------- 1 postgres postgres 375 Oct 14 02:12 000000210000000100000074.00000060.backup
-rw-r----- 1 postgres postgres 1440 Oct 14 10:49 00000022.history
-rw-r----- 1 postgres postgres 1484 Oct 14 14:45 00000023.history
-rw-r----- 1 postgres postgres 1528 Oct 14 18:48 00000024.history
-rw-r----- 1 postgres postgres 1572 Oct 14 22:50 00000025.history
-rw-r----- 1 postgres postgres 1616 Oct 15 02:44 00000026.history
-rw------- 1 postgres postgres 375 Oct 15 02:12 00000026000000010000007F.00000060.backup
-rw-r----- 1 postgres postgres 1660 Oct 15 06:39 00000027.history
-rw-r----- 1 postgres postgres 1704 Oct 15 10:39 00000028.history
-rw-r----- 1 postgres postgres 1748 Oct 15 14:27 00000029.history
-rw-r----- 1 postgres postgres 1792 Oct 15 18:44 0000002A.history
-rw-r----- 1 postgres postgres 1836 Oct 15 22:31 0000002B.history
-rw-r----- 1 postgres postgres 1880 Oct 16 02:39 0000002C.history
-rw------- 1 postgres postgres 375 Oct 16 02:12 0000002C0000000100000090.00000028.backup
-rw-r----- 1 postgres postgres 1924 Oct 16 06:27 0000002D.history
-rw-r----- 1 postgres postgres 1968 Oct 16 10:37 0000002E.history
-rw-r----- 1 postgres postgres 2012 Oct 16 14:44 0000002F.history
-rw-r----- 1 postgres postgres 2056 Oct 16 18:41 00000030.history
-rw-r----- 1 postgres postgres 2100 Oct 16 22:34 00000031.history
-rw-r----- 1 postgres postgres 2144 Oct 17 02:09 00000032.history
-rw-r----- 1 postgres postgres 2188 Oct 17 06:30 00000033.history
-rw------- 1 postgres postgres 375 Oct 17 02:12 00000033000000010000009C.00000060.backup
-rw-r----- 1 postgres postgres 2232 Oct 17 10:28 00000034.history
-rw-r----- 1 postgres postgres 2276 Oct 17 14:06 00000035.history
-rw-r----- 1 postgres postgres 2320 Oct 17 22:26 00000036.history
-rw-r----- 1 postgres postgres 2364 Oct 18 02:08 00000037.history
-rw-r----- 1 postgres postgres 2408 Oct 18 06:28 00000038.history
-rw------- 1 postgres postgres 375 Oct 18 02:12 0000003800000001000000A6.00000060.backup
-rw-r----- 1 postgres postgres 2452 Oct 18 09:58 00000039.history
-rw-r----- 1 postgres postgres 2496 Oct 18 13:24 0000003A.history
-rw-r----- 1 postgres postgres 2540 Oct 18 21:29 0000003B.history
-rw-r----- 1 postgres postgres 2584 Oct 19 01:52 0000003C.history
-rw-r----- 1 postgres postgres 2628 Oct 19 06:23 0000003D.history
-rw------- 1 postgres postgres 375 Oct 19 02:12 0000003D00000001000000B3.00000028.backup
-rw-r----- 1 postgres postgres 2672 Oct 19 09:44 0000003E.history
-rw-r----- 1 postgres postgres 2716 Oct 19 13:15 0000003F.history
-rw-r----- 1 postgres postgres 2760 Oct 19 14:12 00000040.history
-rw-r----- 1 postgres postgres 2804 Oct 19 18:14 00000041.history
-rw-r----- 1 postgres postgres 2848 Oct 20 02:07 00000042.history
-rw-r----- 1 postgres postgres 2892 Oct 20 06:09 00000043.history
-rw------- 1 postgres postgres 374 Oct 20 02:12 0000004300000001000000BF.00000028.backup
-rw-r----- 1 postgres postgres 2936 Oct 20 10:33 00000044.history
-rw-r----- 1 postgres postgres 2980 Oct 20 14:18 00000045.history
-rw-r----- 1 postgres postgres 3024 Oct 20 17:38 00000046.history
-rw-r----- 1 postgres postgres 3068 Oct 20 21:24 00000047.history
-rw-r----- 1 postgres postgres 3112 Oct 21 01:20 00000048.history
-rw-r----- 1 postgres postgres 3156 Oct 21 06:03 00000049.history
-rw------- 1 postgres postgres 375 Oct 21 02:12 0000004900000001000000CD.00000060.backup
-rw-r----- 1 postgres postgres 3200 Oct 21 09:52 0000004A.history
-rw-r----- 1 postgres postgres 3244 Oct 21 13:39 0000004B.history
-rw-r----- 1 postgres postgres 3288 Oct 21 16:40 0000004C.history
-rw-r----- 1 postgres postgres 3332 Oct 21 17:40 0000004D.history
-rw-r----- 1 postgres postgres 3376 Oct 21 21:33 0000004E.history
-rw-r----- 1 postgres postgres 3420 Oct 22 01:18 0000004F.history
-rw-r----- 1 postgres postgres 3464 Oct 22 08:13 00000050.history
-rw------- 1 postgres postgres 375 Oct 22 02:12 0000005000000001000000DA.00000060.backup
-rw-r----- 1 postgres postgres 3508 Oct 23 00:18 00000051.history
-rw-r----- 1 postgres postgres 3552 Oct 23 04:48 00000052.history
-rw------- 1 postgres postgres 375 Oct 23 02:12 0000005200000001000000E5.00000060.backup
-rw-r----- 1 postgres postgres 3596 Oct 23 09:03 00000053.history
-rw-r----- 1 postgres postgres 3640 Oct 23 13:35 00000054.history
-rw-r----- 1 postgres postgres 3684 Oct 23 17:18 00000055.history
-rw-r----- 1 postgres postgres 3728 Oct 23 17:19 00000056.history
-rw-r----- 1 postgres postgres 3772 Oct 23 21:28 00000057.history
-rw-r----- 1 postgres postgres 3816 Oct 24 01:21 00000058.history
-rw-r----- 1 postgres postgres 3860 Oct 24 04:55 00000059.history
-rw------- 1 postgres postgres 375 Oct 24 02:12 0000005900000001000000F2.00000028.backup
-rw-r----- 1 postgres postgres 3904 Oct 24 09:23 0000005A.history
-rw-r----- 1 postgres postgres 3948 Oct 24 16:53 0000005B.history
-rw-r----- 1 postgres postgres 3992 Oct 24 21:04 0000005C.history
-rw-r----- 1 postgres postgres 4036 Oct 25 01:02 0000005D.history
-rw-r----- 1 postgres postgres 4080 Oct 25 04:47 0000005E.history
-rw------- 1 postgres postgres 375 Oct 25 02:12 0000005E00000001000000FF.00000060.backup
-rw-r----- 1 postgres postgres 4123 Oct 25 09:01 0000005F.history
-rw------- 1 postgres postgres 16777216 Oct 25 09:01 0000005F0000000200000004.partial
-rw-r----- 1 postgres postgres 4166 Oct 25 12:51 00000060.history
-rw------- 1 postgres postgres 16777216 Oct 25 09:31 000000600000000200000004
-rw------- 1 postgres postgres 16777216 Oct 25 12:51 000000600000000200000005
-rw-r----- 1 postgres postgres 16777216 Oct 25 12:51 000000600000000200000006.partial
-rw-r----- 1 postgres postgres 4209 Oct 25 16:52 00000061.history
-rw-r----- 1 postgres postgres 16777216 Oct 25 16:52 000000610000000200000006
-rw-r----- 1 postgres postgres 4252 Oct 25 20:52 00000062.history
-rw------- 1 postgres postgres 16777216 Oct 25 17:22 000000620000000200000007
-rw------- 1 postgres postgres 16777216 Oct 25 20:52 000000620000000200000008.partial
-rw-r----- 1 postgres postgres 4295 Oct 26 02:52 00000063.history
-rw------- 1 postgres postgres 16777216 Oct 25 21:22 000000630000000200000008
-rw------- 1 postgres postgres 16777216 Oct 25 23:22 000000630000000200000009
-rw------- 1 postgres postgres 16777216 Oct 26 00:22 00000063000000020000000A
-rw------- 1 postgres postgres 16777216 Oct 26 02:12 00000063000000020000000B
-rw-r----- 1 postgres postgres 16777216 Oct 26 02:12 00000063000000020000000C
-rw------- 1 postgres postgres 372 Oct 26 02:12 00000063000000020000000C.00000060.backup
-rw------- 1 postgres postgres 16777216 Oct 26 02:42 00000063000000020000000D
-rw-r----- 1 postgres postgres 16777216 Oct 26 02:52 00000063000000020000000E.partial
-rw-r----- 1 postgres postgres 4338 Oct 26 06:40 00000064.history
-rw------- 1 postgres postgres 16777216 Oct 26 03:22 00000064000000020000000E
-rw------- 1 postgres postgres 16777216 Oct 26 04:22 00000064000000020000000F
-rw------- 1 postgres postgres 16777216 Oct 26 06:22 000000640000000200000010
-rw-r----- 1 postgres postgres 16777216 Oct 26 06:39 000000640000000200000011.partial
-rw-r----- 1 postgres postgres 4383 Oct 26 10:55 00000065.history
-rw------- 1 postgres postgres 16777216 Oct 26 07:10 000000650000000200000011
-rw------- 1 postgres postgres 16777216 Oct 26 10:55 000000650000000200000012.partial
-rw-r----- 1 postgres postgres 4428 Oct 26 14:50 00000066.history
-rw-r----- 1 postgres postgres 16777216 Oct 26 14:50 000000660000000200000012
-rw-r----- 1 postgres postgres 4473 Oct 26 22:59 00000067.history
-rw------- 1 postgres postgres 16777216 Oct 26 15:20 000000670000000200000013
-rw------- 1 postgres postgres 16777216 Oct 26 22:59 000000670000000200000014
-rw------- 1 postgres postgres 16777216 Oct 26 22:59 000000670000000200000015.partial
-rw-r----- 1 postgres postgres 4518 Oct 27 02:51 00000068.history
-rw------- 1 postgres postgres 16777216 Oct 26 23:29 000000680000000200000015
-rw------- 1 postgres postgres 16777216 Oct 27 00:29 000000680000000200000016
-rw------- 1 postgres postgres 16777216 Oct 27 02:12 000000680000000200000017
-rw------- 1 postgres postgres 16777216 Oct 27 02:12 000000680000000200000018
-rw------- 1 postgres postgres 377 Oct 27 02:12 000000680000000200000018.00000060.backup
-rw-r----- 1 postgres postgres 16777216 Oct 27 02:42 000000680000000200000019
-rw------- 1 postgres postgres 16777216 Oct 27 02:50 00000068000000020000001A.partial
-rw-r----- 1 postgres postgres 4563 Oct 27 06:54 00000069.history
-rw------- 1 postgres postgres 16777216 Oct 27 03:21 00000069000000020000001A
-rw------- 1 postgres postgres 16777216 Oct 27 04:21 00000069000000020000001B
-rw------- 1 postgres postgres 16777216 Oct 27 06:21 00000069000000020000001C
-rw------- 1 postgres postgres 16777216 Oct 27 06:54 00000069000000020000001D.partial
-rw-r----- 1 postgres postgres 4608 Oct 27 10:41 0000006A.history
-rw------- 1 postgres postgres 16777216 Oct 27 07:24 0000006A000000020000001D
-rw------- 1 postgres postgres 16777216 Oct 27 10:41 0000006A000000020000001E.partial
-rw-r----- 1 postgres postgres 4653 Oct 27 14:45 0000006B.history
-rw------- 1 postgres postgres 16777216 Oct 27 11:11 0000006B000000020000001E
-rw------- 1 postgres postgres 16777216 Oct 27 14:45 0000006B000000020000001F.partial
-rw-r----- 1 postgres postgres 4698 Oct 27 18:46 0000006C.history
-rw------- 1 postgres postgres 16777216 Oct 27 15:16 0000006C000000020000001F
-rw------- 1 postgres postgres 16777216 Oct 27 18:46 0000006C0000000200000020.partial
-rw-r----- 1 postgres postgres 4743 Oct 27 22:43 0000006D.history
-rw------- 1 postgres postgres 16777216 Oct 27 19:16 0000006D0000000200000020
-rw------- 1 postgres postgres 16777216 Oct 27 22:43 0000006D0000000200000021.partial
-rw-r----- 1 postgres postgres 4788 Oct 28 02:46 0000006E.history
-rw------- 1 postgres postgres 16777216 Oct 27 23:14 0000006E0000000200000021
-rw------- 1 postgres postgres 16777216 Oct 27 23:44 0000006E0000000200000022
-rw------- 1 postgres postgres 16777216 Oct 28 00:14 0000006E0000000200000023
-rw------- 1 postgres postgres 16777216 Oct 28 00:44 0000006E0000000200000024
-rw------- 1 postgres postgres 16777216 Oct 28 02:12 0000006E0000000200000025
-rw------- 1 postgres postgres 16777216 Oct 28 02:12 0000006E0000000200000026
-rw------- 1 postgres postgres 377 Oct 28 02:12 0000006E0000000200000026.00000028.backup
-rw-r----- 1 postgres postgres 16777216 Oct 28 02:46 0000006E0000000200000027
-rw-r----- 1 postgres postgres 4833 Oct 28 06:44 0000006F.history
-rw------- 1 postgres postgres 16777216 Oct 28 03:16 0000006F0000000200000028
-rw------- 1 postgres postgres 16777216 Oct 28 04:16 0000006F0000000200000029
-rw------- 1 postgres postgres 16777216 Oct 28 06:16 0000006F000000020000002A
-rw------- 1 postgres postgres 16777216 Oct 28 06:44 0000006F000000020000002B.partial
-rw-r----- 1 postgres postgres 4878 Oct 28 11:00 00000070.history
-rw------- 1 postgres postgres 16777216 Oct 28 07:14 00000070000000020000002B
-rw------- 1 postgres postgres 16777216 Oct 28 11:00 00000070000000020000002C.partial
-rw-r----- 1 postgres postgres 4923 Oct 28 14:44 00000071.history
-rw------- 1 postgres postgres 16777216 Oct 28 11:30 00000071000000020000002C
-rw-r----- 1 postgres postgres 10047488 Dec 8 11:43 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 Oct 28 14:44 archive_status
pgbackrest logs
2021-10-28 02:12:08.207 P01 INFO: backup file /var/lib/postgresql/data/base/13408/688131 (0B, 100%)
2021-10-28 02:12:08.207 P04 INFO: backup file /var/lib/postgresql/data/base/13408/679943 (0B, 100%)
2021-10-28 02:12:08.216 P03 INFO: backup file /var/lib/postgresql/data/base/13408/679945 (0B, 100%)
2021-10-28 02:12:08.221 P01 INFO: backup file /var/lib/postgresql/data/base/13408/679938 (0B, 100%)
2021-10-28 02:12:08.243 P00 INFO: incr backup size = 62.5MB
2021-10-28 02:12:08.243 P00 INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
2021-10-28 02:12:08.552 P00 INFO: backup stop archive = 0000006E0000000200000026, lsn = 2/26000138
2021-10-28 02:12:08.560 P00 INFO: check archive for segment(s) 0000006E0000000200000026:0000006E0000000200000026
2021-10-28 02:12:08.836 P00 INFO: new backup label = 20211024-021204F_20211028-021203I
2021-10-28 02:12:09.228 P00 INFO: backup command end: completed successfully (6374ms)
-------------------PROCESS START-------------------
2021-10-29 02:12:11.949 P00 INFO: backup command begin 2.32: --compress-level=3 --compress-type=lz4 --config=/etc/pgbackrest/pgbackrest.conf --exec-id=735-c4f0716d --log-level-console=off --log-level-stderr=warn --pg1-path=/var/lib/postgresql/data --pg1-port=5432 --pg1-socket-path=/var/run/postgresql --process-max=4 --repo1-cipher-type=none --repo1-path=/staging/timescale --repo1-retention-diff=2 --repo1-retention-full=2 --repo1-s3-bucket=<redacted>--repo1-s3-endpoint=fra1.digitaloceanspaces.com --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=fra1 --repo1-type=s3 --stanza=poddb --start-fast --type=incr
2021-10-29 02:12:12.112 P00 WARN: unable to check pg-1: [DbConnectError] unable to connect to 'dbname='postgres' port=5432 host='/var/run/postgresql'': could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
2021-10-29 02:12:12.114 P00 ERROR: [056]: unable to find primary cluster - cannot proceed
2021-10-29 02:12:12.114 P00 INFO: backup command end: aborted with exception [056]
-------------------PROCESS START-------------------
2021-10-29 02:12:36.946 P00 INFO: backup command begin 2.32: --compress-level=3 --compress-type=lz4 --config=/etc/pgbackrest/pgbackrest.conf --exec-id=737-f99b8091 --log-level-console=off --log-level-stderr=warn --pg1-path=/var/lib/postgresql/data --pg1-port=5432 --pg1-socket-path=/var/run/postgresql --process-max=4 --repo1-cipher-type=none --repo1-path=/staging/timescale --repo1-retention-diff=2 --repo1-retention-full=2 --repo1-s3-bucket=<redacted> --repo1-s3-endpoint=fra1.digitaloceanspaces.com --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=fra1 --repo1-type=s3 --stanza=poddb --start-fast --type=incr
2021-10-29 02:12:37.080 P00 WARN: unable to check pg-1: [DbConnectError] unable to connect to 'dbname='postgres' port=5432 host='/var/run/postgresql'': could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
2021-10-29 02:12:37.082 P00 ERROR: [056]: unable to find primary cluster - cannot proceed
2021-10-29 02:12:37.082 P00 INFO: backup command end: aborted with exception [056]
-------------------PROCESS START-------------------
2021-10-29 02:13:02.929 P00 INFO: backup command begin 2.32: --compress-level=3 --compress-type=lz4 --config=/etc/pgbackrest/pgbackrest.conf --exec-id=739-63c091be --log-level-console=off --log-level-stderr=warn --pg1-path=/var/lib/postgresql/data --pg1-port=5432 --pg1-socket-path=/var/run/postgresql --process-max=4 --repo1-cipher-type=none --repo1-path=/staging/timescale --repo1-retention-diff=2 --repo1-retention-full=2 --repo1-s3-bucket=<redacted>--repo1-s3-endpoint=fra1.digitaloceanspaces.com --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=fra1 --repo1-type=s3 --stanza=poddb --start-fast --type=incr
2021-10-29 02:13:03.060 P00 WARN: unable to check pg-1: [DbConnectError] unable to connect to 'dbname='postgres' port=5432 host='/var/run/postgresql'': could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
2021-10-29 02:13:03.062 P00 ERROR: [056]: unable to find primary cluster - cannot proceed
2021-10-29 02:13:03.064 P00 INFO: backup command end: aborted with exception [056]
Additional context: Kubernetes v1.20.2
Ran into the same issue right now. Had clean reinstalled Timescale using helm a day back. Earlier this issue didn't arise when I had it running for almost a month. Pretty curious what could have caused this.
Can you share logs, if you still have them, so we get more info on this?
Looking at the pgBackRest logs of the master instance, here is the first failed backup, after which every subsequent backup failed too.
-------------------PROCESS START-------------------
2021-10-29 02:12:11.949 P00 INFO: backup command begin 2.32: --compress-level=3 --compress-type=lz4 --config=/etc/pgbackrest/pgbackrest.conf --exec-id=735-c4f0716d --log-level-console=off --log-level-stderr=warn --pg1-path=/var/lib/postgresql/data --pg1-port=5432 --pg1-socket-path=/var/run/postgresql --process-max=4 --repo1-cipher-type=none --repo1-path=/staging/timescale --repo1-retention-diff=2 --repo1-retention-full=2 --repo1-s3-bucket=<redacted> --repo1-s3-endpoint=fra1.digitaloceanspaces.com --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=fra1 --repo1-type=s3 --stanza=poddb --start-fast --type=incr
2021-10-29 02:12:12.112 P00 WARN: unable to check pg-1: [DbConnectError] unable to connect to 'dbname='postgres' port=5432 host='/var/run/postgresql'': could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
2021-10-29 02:12:12.114 P00 ERROR: [056]: unable to find primary cluster - cannot proceed
2021-10-29 02:12:12.114 P00 INFO: backup command end: aborted with exception [056]
Thanks for your quick reply. I suspect that in my case it was something to do with my k8s setup, because I have backups disabled but two Prometheus instances feeding massive amounts of data into Timescale.
For now, I just did a clean reinstall and increased the sizes of the data (20GB) and WAL (10GB) persistent volumes.
If I run into it again, I'll update here ASAP.
@4nte you mentioned that one of the replicas has been promoted. That means the earlier master is no more accessible and maybe the backup was trying to connect to the old master and hence is erroring out? Shouldn't the promoted replica be connected to for the backup since it should be the new master?
timescale-timescaledb-0 is labeled with role=promoted, but running patronictl topology gives:
+ Cluster: timescale (uninitialized) ------+---------+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+---------------------------+--------------+---------+---------+-----+-----------+
| + timescale-timescaledb-0 | 10.244.1.170 | Replica | running | 113 | 0 |
| + timescale-timescaledb-1 | 10.244.2.31 | Replica | running | 96 | 0 |
| + timescale-timescaledb-2 | 10.244.0.104 | Replica | running | 96 | 0 |
+---------------------------+--------------+---------+---------+-----+-----------+
So it's fair to suspect that a promotion failed mid-process, though I don't know why it happened.
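Given that the promotion is failing only because pg_wal has no free space, one cautious way forward (a sketch, not a verified fix; snapshot the WAL volume first, and treat the pod/container names as assumptions) is to free just enough space for Patroni to complete the promotion:

```shell
POD=timescale-timescaledb-0   # the pod labeled role=promoted

# RECOVERYXLOG / RECOVERYHISTORY are scratch files that restore_command
# (pgBackRest here) writes during archive recovery; the half-written
# leftover shown in the pg_wal listing above is a reasonable first
# candidate to delete:
kubectl exec "$POD" -c timescaledb -- \
  rm -f /var/lib/postgresql/wal/pg_wal/RECOVERYXLOG

# With space available, Patroni should retry the promotion; confirm that
# one member now reports the Leader role:
kubectl exec "$POD" -c timescaledb -- patronictl list
```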
@4nte try this postgresql.fastware.com