Optimise Prometheus S3 backup

Open Firefishy opened this issue 7 months ago • 3 comments

We currently back up a lot more than needed. Can we optimise what we send to the S3 backup?

Firefishy, May 15 '25 18:05

To explain a bit, this is the current live listing of our Prometheus storage:

BLOCK ULID                  MIN TIME                       MAX TIME                       DURATION        NUM SAMPLES   NUM CHUNKS   NUM SERIES   SIZE
01HV7PYRV06EKMG5EJGX95NKRD  2024-03-22 12:00:00 +0000 UTC  2024-04-11 18:00:00 +0000 UTC  485h59m59.972s  162441341145  1358039832   1891699      105GiB677MiB486KiB676B
01HWVVG2QQ7HBG3PQYHVPQ32HQ  2024-04-11 18:00:00 +0000 UTC  2024-05-02 00:00:00 +0000 UTC  485h59m59.972s  161366502626  1348402454   2231708      105GiB738MiB606KiB961B
01HYG021TYK0R68CF8KNZ4X3DG  2024-05-02 00:00:00 +0000 UTC  2024-05-22 06:00:00 +0000 UTC  485h59m59.972s  161891891023  1350709179   1648828      106GiB5MiB593KiB854B
01J044KGT663DCDA5JHEYXQMEE  2024-05-22 06:00:00 +0000 UTC  2024-06-11 12:00:00 +0000 UTC  485h59m59.972s  161659122034  1344446853   1682969      106GiB833MiB163KiB889B
01J1R94V4SVDHTBRTWT1N4HV92  2024-06-11 12:00:00 +0000 UTC  2024-07-01 18:00:00 +0000 UTC  485h59m59.971s  162373700750  1357260875   1776370      106GiB700MiB350KiB683B
01J3CDQ5H1CREVFNVNS9DB3YQR  2024-07-01 18:00:00 +0000 UTC  2024-07-22 00:00:00 +0000 UTC  485h59m59.971s  162456327858  1357452067   1688092      106GiB456MiB575KiB87B
01J50J7NE30QV0J5DGWNRPME9P  2024-07-22 00:00:00 +0000 UTC  2024-08-11 06:00:00 +0000 UTC  485h59m59.971s  161573958302  1349918951   1673259      105GiB945MiB224KiB602B
01J6MPS5XAFQ86GYBVAE7W0C4M  2024-08-11 06:00:00 +0000 UTC  2024-08-31 12:00:00 +0000 UTC  485h59m59.971s  161682386999  1351098202   1913633      106GiB365MiB814KiB693B
01J88VB2RHFJS74D6AGYZDKR8P  2024-08-31 12:00:00 +0000 UTC  2024-09-20 18:00:00 +0000 UTC  485h59m59.971s  162537219885  1358344461   1862225      108GiB14MiB662KiB601B
01J9WZX6H8ERFMCR5ZFE76PSF4  2024-09-20 18:00:00 +0000 UTC  2024-10-11 00:00:00 +0000 UTC  485h59m59.971s  166073207118  1387802075   1883579      109GiB411MiB336KiB821B
01JBH4ERRAZGQT99CHPQ8R55B2  2024-10-11 00:00:00 +0000 UTC  2024-10-31 06:00:00 +0000 UTC  485h59m59.971s  168475719637  1407670452   1756051      110GiB824MiB123KiB209B
01JD590FA7A5X9EPAXWWV5QFQ3  2024-10-31 06:00:00 +0000 UTC  2024-11-20 12:00:00 +0000 UTC  485h59m59.971s  167755433175  1401731472   2395345      113GiB43MiB366KiB198B
01JESDJ4675TBS6TPV4NG77R2T  2024-11-20 12:00:00 +0000 UTC  2024-12-10 18:00:00 +0000 UTC  485h59m59.971s  168309268213  1406269504   1664327      116GiB78MiB738KiB443B
01JGDJ3S9F4HWBZ9FNPFNG3K4G  2024-12-10 18:00:00 +0000 UTC  2024-12-31 00:00:00 +0000 UTC  485h59m59.878s  163832463513  1368474332   1726922      111GiB668MiB652KiB443B
01JJ1PMQZ0XX7NHCXR06X7GSFW  2024-12-31 00:00:00 +0000 UTC  2025-01-20 06:00:00 +0000 UTC  485h59m59.878s  165987486674  1387538098   1697840      110GiB928MiB720KiB412B
01JKNV6D56PX0VJNKEH458A6QK  2025-01-20 06:00:00 +0000 UTC  2025-02-09 12:00:00 +0000 UTC  485h59m59.795s  162794120196  1360614092   1618585      111GiB1002MiB487KiB674B
01JN9ZQDNY1A4T4EEBNBDGMSEH  2025-02-09 12:00:00 +0000 UTC  2025-03-01 18:00:00 +0000 UTC  485h59m59.795s  162224716013  1354949845   1609941      110GiB247MiB751KiB95B
01JPY496R0FPBQTSH0QDSZ6KQ6  2025-03-01 18:00:00 +0000 UTC  2025-03-22 00:00:00 +0000 UTC  485h59m59.795s  162031809809  1353898292   1698229      109GiB511MiB629KiB359B
01JRJ8TZHYNDH1GZDK8KYFGDKB  2025-03-22 00:00:00 +0000 UTC  2025-04-11 06:00:00 +0000 UTC  485h59m59.795s  163281129208  1364549752   1723569      108GiB1011MiB496KiB564B
01JT6DCDRTR14AKP57ZZEZ5EK9  2025-04-11 06:00:00 +0000 UTC  2025-05-01 12:00:00 +0000 UTC  485h59m59.795s  163273695534  1363235863   1654564      107GiB677MiB715KiB110B
01JTQS84X74XF99GXHB58GR365  2025-05-01 12:00:00 +0000 UTC  2025-05-08 06:00:00 +0000 UTC  161h59m59.795s  51403124251   431166410    1618222      33GiB544MiB696KiB406B
01JV95E56QP89BF9P7N7YTGWW5  2025-05-08 06:00:00 +0000 UTC  2025-05-15 00:00:00 +0000 UTC  161h59m59.795s  54280780359   454908328    1588896      35GiB69MiB702KiB1018B
01JV9STBARBXCPASFFQD3MQ1AP  2025-05-15 00:00:00 +0000 UTC  2025-05-15 06:00:00 +0000 UTC  5h59m59.795s    1987303520    16614635     1532920      1GiB382MiB747KiB639B
01JVAEEW3QDTRC5GF668Q9DGEB  2025-05-15 06:00:00 +0000 UTC  2025-05-15 12:00:00 +0000 UTC  5h59m59.795s    1997349892    16697642     1538651      1GiB432MiB870KiB118B
01JVA7G5HZ9RS354T76CXM9C6T  2025-05-15 12:00:00 +0000 UTC  2025-05-15 14:00:00 +0000 UTC  1h59m59.795s    666811052     5426419      1539125      579MiB734KiB757B
01JVAEDBQH49PR6DAGH0V52V2P  2025-05-15 14:00:00 +0000 UTC  2025-05-15 16:00:00 +0000 UTC  1h59m59.795s    667223513     5578419      1538947      563MiB497KiB959B
01JVAN7P0E80PQ3DQ96PH3HBT6  2025-05-15 16:00:00 +0000 UTC  2025-05-15 18:00:00 +0000 UTC  1h59m59.795s    668484551     5588999      1541650      568MiB301KiB486B

Each day we take a snapshot of that and sync it to S3 as a backup.

The problem is that, while the 486h blocks are complete, the smaller blocks from the last few weeks are intermediates: they get rolled up into larger blocks and then removed, until there is a new 486h one. This means that S3 contains duplicate data, and it's also hard to work out what to restore.

So the idea is to try to split the snapshot into full blocks and non-full blocks and send them to separate buckets. The second bucket could then expire anything more than a month old, as we only need the last few weeks as backup for data since the last full block.
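
As a rough illustration of the split, here is a minimal Python sketch that sorts the block directories of a snapshot into full and non-full blocks by reading each block's `meta.json` (`minTime` and `maxTime` are in milliseconds). The 486h threshold is taken from the listing above; the one-minute tolerance, the function names and the assumed snapshot layout (one directory per block) are illustrative assumptions, not what the backup job currently does.

```python
import json
from datetime import timedelta
from pathlib import Path

# Assumption: the largest block span is 486h, as in the listing above.
FULL_BLOCK = timedelta(hours=486)
# Assumption: allow a small tolerance, since the listed durations are
# slightly short of 486h (e.g. 485h59m59.972s).
TOLERANCE = timedelta(minutes=1)


def block_duration(meta: dict) -> timedelta:
    """Time span covered by a block, from its meta.json (times in ms)."""
    return timedelta(milliseconds=meta["maxTime"] - meta["minTime"])


def is_full(meta: dict, full=FULL_BLOCK, tolerance=TOLERANCE) -> bool:
    """True if the block spans (almost exactly) the maximum block range."""
    return block_duration(meta) >= full - tolerance


def split_snapshot(snapshot_dir):
    """Return (full_blocks, partial_blocks) as lists of block directories."""
    full, partial = [], []
    for meta_path in sorted(Path(snapshot_dir).glob("*/meta.json")):
        meta = json.loads(meta_path.read_text())
        (full if is_full(meta) else partial).append(meta_path.parent)
    return full, partial
```

The two lists could then be synced to separate buckets, with a lifecycle expiry rule on the bucket holding the non-full blocks.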

tomhughes, May 15 '25 19:05

Here is a document which describes how to make a TSDB Snapshot for backup: https://gist.github.com/ksingh7/d5e4414d92241e0802e59fa4c585b98b
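
For context, a minimal sketch of taking such a snapshot through the Prometheus admin API (`POST /api/v1/admin/tsdb/snapshot`, which needs `--web.enable-admin-api`). The base URL, the data directory and the helper names are assumptions for illustration, and this is not necessarily the procedure the linked document uses.

```python
import json
import urllib.request
from pathlib import Path


def take_snapshot(base_url="http://localhost:9090", skip_head=False,
                  opener=urllib.request.urlopen):
    """Ask Prometheus to create a TSDB snapshot and return its name.

    base_url is an assumed address; opener is injectable for testing.
    """
    url = f"{base_url}/api/v1/admin/tsdb/snapshot"
    if skip_head:
        # Skip data still in the head block (not yet compacted to disk).
        url += "?skip_head=true"
    req = urllib.request.Request(url, method="POST")
    with opener(req) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {body}")
    return body["data"]["name"]


def snapshot_path(data_dir, name):
    """Snapshots are written under <data_dir>/snapshots/<name>."""
    return Path(data_dir) / "snapshots" / name
```

The directory returned by `snapshot_path` is what would then be synced to S3.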

Firefishy, Jun 16 '25 16:06

What exactly do you think we're doing? Exactly that!

tomhughes, Jun 16 '25 16:06