compactor: does not compact 4 consecutive 2-hour blocks

Open vincent-olivert-riera opened this issue 10 months ago • 6 comments

Thanos, Prometheus and Golang version used:

Thanos: 0.32.4 Golang: go1.20.8

Prometheus: 2.45.0 goVersion: go1.20.5

Object Storage Provider:

Openstack S3 compatible

What happened:

I have a Thanos compactor with the following metrics:

thanos_compact_halted 0
thanos_compact_todo_compactions 0

It is tracking a bucket where almost all blocks have been compacted up to level 4. However, some level-1 blocks have not been compacted, and I expected them to be compacted into a level-2 block. I made this animated GIF to show it more clearly:

[animated GIF: compactor]

None of those blocks has been marked as no-compaction, so they should be compacted.

These are the meta.json files for each of them:

01HT1G02DF2W21A1KTHDVPX0BR
{
  "ulid": "01HT1G02DF2W21A1KTHDVPX0BR",
  "minTime": 1711584000246,
  "maxTime": 1711591200000,
  "stats": {
    "numSamples": 2492646,
    "numSeries": 5196,
    "numChunks": 20775
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1G02DF2W21A1KTHDVPX0BR"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 3964613
      },
      {
        "rel_path": "index",
        "size_bytes": 646029
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {}
  }
}
01HT1PVSMCNYF8ZSDW53123NJX
{
  "ulid": "01HT1PVSMCNYF8ZSDW53123NJX",
  "minTime": 1711591200246,
  "maxTime": 1711598400000,
  "stats": {
    "numSamples": 2492640,
    "numSeries": 5193,
    "numChunks": 20772
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1PVSMCNYF8ZSDW53123NJX"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 3957077
      },
      {
        "rel_path": "index",
        "size_bytes": 644900
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {}
  }
}
01HT1XQGXB5CHQB21YT5DNXFC8
{
  "ulid": "01HT1XQGXB5CHQB21YT5DNXFC8",
  "minTime": 1711598400246,
  "maxTime": 1711605600000,
  "stats": {
    "numSamples": 2492640,
    "numSeries": 5193,
    "numChunks": 20772
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT1XQGXB5CHQB21YT5DNXFC8"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 3969637
      },
      {
        "rel_path": "index",
        "size_bytes": 645540
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {}
  }
}
01HT24K86QTXJ1HV2NW252DAEV
{
  "ulid": "01HT24K86QTXJ1HV2NW252DAEV",
  "minTime": 1711605600246,
  "maxTime": 1711612800000,
  "stats": {
    "numSamples": 2492646,
    "numSeries": 5196,
    "numChunks": 20775
  },
  "compaction": {
    "level": 1,
    "sources": [
      "01HT24K86QTXJ1HV2NW252DAEV"
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus004",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "sidecar",
    "segment_files": [
      "000001"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 3981026
      },
      {
        "rel_path": "index",
        "size_bytes": 645293
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {}
  }
}
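
As a sanity check on the four meta.json files above, here is a minimal Python sketch that converts the `minTime`/`maxTime` values to UTC and checks whether the blocks are back-to-back and fall inside a single aligned 8-hour window. The 8-hour figure is an assumption about the compactor's next range above 2 hours; it is not stated in this report. The helper names (`to_utc`, `window_start`, `summarize`) are made up for illustration.

```python
from datetime import datetime, timezone

# (ulid, minTime, maxTime) copied from the four meta.json files above, in milliseconds.
BLOCKS = [
    ("01HT1G02DF2W21A1KTHDVPX0BR", 1711584000246, 1711591200000),
    ("01HT1PVSMCNYF8ZSDW53123NJX", 1711591200246, 1711598400000),
    ("01HT1XQGXB5CHQB21YT5DNXFC8", 1711598400246, 1711605600000),
    ("01HT24K86QTXJ1HV2NW252DAEV", 1711605600246, 1711612800000),
]

TWO_HOURS_MS = 2 * 60 * 60 * 1000
# Assumption: the next compaction range above 2h is 8h (not stated in the report).
EIGHT_HOURS_MS = 4 * TWO_HOURS_MS


def to_utc(ms):
    """Render a millisecond Unix timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat(
        timespec="milliseconds"
    )


def window_start(ms, width_ms):
    """Start of the width_ms-aligned window that contains timestamp ms."""
    return ms - ms % width_ms


def summarize(blocks):
    """Report alignment, the largest gap between neighbours, and the total span."""
    starts = {window_start(min_t, EIGHT_HOURS_MS) for _, min_t, _ in blocks}
    gaps = [blocks[i + 1][1] - blocks[i][2] for i in range(len(blocks) - 1)]
    return {
        "same_8h_window": len(starts) == 1,
        "max_gap_ms": max(gaps),
        "total_span_ms": blocks[-1][2] - blocks[0][1],
    }


if __name__ == "__main__":
    for ulid, min_t, max_t in BLOCKS:
        print(ulid, to_utc(min_t), "->", to_utc(max_t))
    print(summarize(BLOCKS))
```

With the values above, all four blocks land in the same aligned 8-hour window and each block starts 246 ms after the previous one ends. This only describes the timestamps; it does not by itself explain why the compactor leaves the blocks alone.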

This is the command line that I'm using:

/bin/thanos compact \
  --bucket-web-label=cluster_node \
  --data-dir /var/thanos/compact \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --wait \
  --selector.relabel-config-file=/etc/thanos/relabel_config.yml \
  --downsampling.disable \
  --retention.resolution-5m=1d \
  --retention.resolution-1h=1d \
  --log.format=json \
  --log.level=debug

Contents of /etc/thanos/objstore.yml:
type: S3
config:
  bucket: "thanos-alpha"
  endpoint: "redacted"
  access_key: "redacted"
  insecure: false
  signature_version2: false
  secret_key: "redacted"
  list_objects_version: "v1"
  http_config:
    idle_conn_timeout: 60s

Contents of /etc/thanos/relabel_config.yml:
- action: keep
  regex: "alpha-002"
  source_labels:
  - datasource
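
To make the effect of this selector concrete, the following Python sketch approximates a relabel `keep` rule: the values of the source labels are joined with `;` and the regex must match the whole string. It is only an approximation (Python's `re` is not the RE2 engine used by Prometheus and Thanos), and `keep_block` is a hypothetical helper, not a Thanos function.

```python
import re


def keep_block(labels, source_labels, regex, separator=";"):
    """Approximate a relabel `keep` action.

    The source label values are concatenated with the separator, and the
    block is kept only if the regex matches the entire concatenated value.
    A missing label is treated as an empty string.
    """
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


if __name__ == "__main__":
    # External labels copied from one of the level-1 meta.json files above.
    level1_labels = {
        "cluster_name": "alpha",
        "cluster_node": "prometheus004",
        "datasource": "alpha-002",
    }
    print(keep_block(level1_labels, ["datasource"], "alpha-002"))
```

Under this reading, the level-1 blocks shown above carry `datasource="alpha-002"` and so pass the selector; a block without that label, or with any other value, would be dropped.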

What could be the reason for this behavior?

vincent-olivert-riera, Apr 19 '24 02:04

Is thanos_compact_iterations_total more than 0? 🤔

GiedriusS, Apr 19 '24 05:04

> Is thanos_compact_iterations_total more than 0? 🤔

Yes, it is constantly growing.

This is how thanos_compact_todo_compactions compares with thanos_compact_iterations_total:

[two images comparing thanos_compact_todo_compactions with thanos_compact_iterations_total]

vincent-olivert-riera, Apr 19 '24 06:04

@vincent-olivert-riera can you show us some information about the level 4 blocks you mentioned? What's their duration?

douglascamata, Apr 19 '24 10:04

> @vincent-olivert-riera can you show us some information about the level 4 blocks you mentioned? What's their duration?

Sure.

[image: level-4 block]

This is its meta.json:
{
  "ulid": "01HT1RN6JP9AZWYGHTG8XRXHSS",
  "minTime": 1710374400246,
  "maxTime": 1711584000000,
  "stats": {
    "numSamples": 418763800,
    "numSeries": 5231,
    "numChunks": 3489753
  },
  "compaction": {
    "level": 4,
    "sources": [
      "01HRXEDZ6XVH2MYV7K0W6CQ27Z",
      "01HRXN9PEVG5DWN2DA3H1EQZYW",
      "01HRXW5DPSMG8GMH0CNQZ08S79",
      "01HRY314YTX75WHERFSB1XFMQ2",
      "01HRY9WW6SCNHJBT3BWB46V9Y1",
      "01HRYGRKETBY322F7ABNHFGMNP",
      "01HRYQMAPTCKA67H1XT72KMA4E",
      "01HRYYG1YSWVX80H09XBE68MW4",
      "01HRZ5BS6TX5RANK4KVFE1A25K",
      "01HRZC7GET4BX2PCHXV0AXF0MX",
      "01HRZK37PT5KR89H78K70Z3FW2",
      "01HRZSYYYSEPRXQ3J1G99EMRN5",
      "01HS00TP6TF8HZ11SSBJ5DJTG2",
      "01HS07PDEX23P49VE9KN0Z6B6P",
      "01HS0EJ4PSQQR9HYQNKV7HP2XN",
      "01HS0NDVZX0XGC0K85FKS87HEE",
      "01HS0W9K6T60BN2G1HFWNZDYZC",
      "01HS135AEVCERKJHJAE7YE82PG",
      "01HS1A11PS6BF1W0WM0ZD1X1Y4",
      "01HS1GWRYTJE5BQQDVYQ99DFMB",
      "01HS1QRG6TZAZFAGBQ5DYTFV0H",
      "01HS1YM7ESF2Q93M625Q9SEJYX",
      "01HS25FYPTETJ90722E08R9PFS",
      "01HS2CBNYT6CM6BJ9AK2M8PGN6",
      "01HS2K7D6T8AW1R2VMRCPQ2272",
      "01HS2T34ETMF9BH086GDZEEQ2X",
      "01HS30YVPTN97BAVJ5BPP3854Y",
      "01HS37TJYTHEAMQ4H9JT792RJD",
      "01HS3EPA6TERX4QJTX09QD0PJJ",
      "01HS3NJ1ETC27EQGAB9E36SF2N",
      "01HS3WDRPT2BNNW2R2E3D6BT8X",
      "01HS439FYTNNAYWCN7FM561T3Z",
      "01HS4A576T2G0HS1HFRR55Z83H",
      "01HS4H0YETBSTJCR1VFA2KT4HM",
      "01HS4QWNPSQNT3SVHWB65TWHK0",
      "01HS4YRCYVQFRZZ88FQQKRX6SV",
      "01HS55M46TW584YFK9NYT2Y8J0",
      "01HS5CFVEVSRBDV7MK2SY3QJE7",
      "01HS5KBJPTDC3DHZ3G5DP4Y1XZ",
      "01HS5T79YTS4438E2ZX4FS4T5F",
      "01HS61316SK2JXN87693FRJ3D9",
      "01HS67YRET9XW7TJNJ5A2QSM41",
      "01HS6ETFPT53QCM7VYJZTH8QB8",
      "01HS6NP6YT5FMTPNY8D5N7C9BF",
      "01HS6WHY6T8PCT1GNAC3TVKEY1",
      "01HS73DNET028TXMPQYVVF179Q",
      "01HS7A9CPT1HTA26Q4YC1FAGHV",
      "01HS7H53YT05D4QETBPC042C7C",
      "01HS7R0V6TKWRR46E82709XQGQ",
      "01HS7YWJETVH0E1YV7KWVV6BH8",
      "01HS85R9PT52VPMPFP3B9D30YQ",
      "01HS8CM0YSBNC8E3X5Z1S1QAKY",
      "01HS8KFR6TT0C995731BTSSZ5C",
      "01HS8TBFET873G0CX47NYV5P07",
      "01HS9176PTW3XFMSYGWQKZZC6E",
      "01HS982XYT1JV16HZWKEX5N696",
      "01HS9EYN6TSV8J00BRNE4CD74H",
      "01HS9NTCEV1WCDGHNS5PJSK0NP",
      "01HS9WP3PS3B9NFFP98JRYTHJ4",
      "01HSA3HTYTQCCX7DH8EPDBN4Q0",
      "01HSAADJ6VB2YFJKY4RWY18ZA2",
      "01HSAH99EVKJQZMBSH7PF497SG",
      "01HSAR50PT0ZH3ZNE8N1VJWTXQ",
      "01HSAZ0QYXE5KPBH5NS0WFYHEF",
      "01HSB5WF6SPAB3TJP64V7NSME1",
      "01HSBCR6EVHNNNCJBN27H8RWF2",
      "01HSBKKXPTYZ5D4SH8P4KW74C9",
      "01HSBTFMYW431XKWR750PXYYAQ",
      "01HSC1BC7088CV86NBKTXXQ494",
      "01HSC873ET7YV5PK4EV61GKGD9",
      "01HSCF2TPSNKYMSTCF07FBTYHQ",
      "01HSCNYHYT6SVCYBF58KTZJQ9J",
      "01HSCWT96TDBGKXVZ1VR44X9DV",
      "01HSD3P0ETXPZ80M8EEZ61RE8H",
      "01HSDAHQPYG3XCFY91N41FR4A7",
      "01HSDHDEYT44RNSS14WYNVB9VS",
      "01HSDR966V0NK7E5CN8ED8RQJK",
      "01HSDZ4XEVH5C45F9FZK47TN59",
      "01HSE60MPT0N3CER5QERB3QBH1",
      "01HSECWBYS84V009FYSB3N6B39",
      "01HSEKR36TJCYV52XBSWFRDFW6",
      "01HSETKTETGNGNQBZYS4MSA7EP",
      "01HSF1FHPTVY7PGBHS0MHR0V4Z",
      "01HSF8B8YT8PMPF2YZ7WYX6DXA",
      "01HSFF706TDE9TJ45HVEJE1C5E",
      "01HSFP2QETJHV0QEZ70QBVE2Z4",
      "01HSFWYEPTVXBBVW872WYRQ18S",
      "01HSG3T5YVSTQ8SMZBEDACHG01",
      "01HSGANX6TNA9HNM3ZH3ZGRGRS",
      "01HSGHHMET3487PYA2BRJP80YC",
      "01HSGRDBPTQWZ1ZS64GGH6SZY5",
      "01HSGZ92YT3SNZSRC0M6GH56JN",
      "01HSH64T6TG6N29P8E8WACF9C3",
      "01HSHD0HET2HC2HP9TWRRHFEYH",
      "01HSHKW8PSF5SPN131PA2CHCYN",
      "01HSHTQZYTDZJ016DDXQZ9ZXQ6",
      "01HSJ1KQ6WMAABPCAD4QCZ30BP",
      "01HSJ8FEESK80Z3N9Z5D19841W",
      "01HSJFB5PT40EA8WMKFBWCWZ3X",
      "01HSJP6WYTZE8GE46P726YJVXK",
      "01HSJX2M6VXD0SF4YYKA920WY8",
      "01HSK3YBESNDWHZM80MBY4E4S0",
      "01HSKAT2PYPJXVRZG1NBGX88B0",
      "01HSKHNSYVZA9ZB9MZAKS7G5YP",
      "01HSKRHH6THAAP44ZDB80NGEFE",
      "01HSKZD8ET7W92EFFRP7BDMQR0",
      "01HSM68ZPXA3P18Y0DPZQJXH8N",
      "01HSMD4PYSWWKE2V6DPWFQ5VWA",
      "01HSMM0E6TA2EM8J40F7FR478S",
      "01HSMTW5ESJ3E2X9K3F8CDQJR5",
      "01HSN1QWPSWRBCKV88HVAH6CXW",
      "01HSN8KKYTYBPX1ZQC9BZ4Y4HJ",
      "01HSNFFB6THMDJ7X4FGYBZK8BD",
      "01HSNPB2EY6CXHMPWJH46T3S43",
      "01HSNX6SPV19FPNV99XT4N3BGE",
      "01HSP42GYS3GZPDEEJXEVTE5H5",
      "01HSPAY86SE3D504YN5357EEK2",
      "01HSPHSZEVQ92QFGH66YRM0W9D",
      "01HSPRNPPXG4PJJGBQEZJ0TK2E",
      "01HSPZHDYVDTWSRMDQPJEHR7VA",
      "01HSQ6D56YBN25SBSWJ12H7XCW",
      "01HSQD8WET5WDSJE28PHV491NW",
      "01HSQM4KPTTEXXA5P0JZJ8MHKS",
      "01HSQV0AYTDG339RRNFRR7H7VV",
      "01HSR1W26TJE71X3TGF056TP2S",
      "01HSR8QSET41K89H6GW418HC6X",
      "01HSRFKGPTY39Y12QYYX9RDN86",
      "01HSRPF7YSE9VPF3RQTPHAW7TZ",
      "01HSRXAZ6VYK26MJPCA1CSYSJS",
      "01HSS46PES01ZR3HJS0NQ4XSH8",
      "01HSSB2DPSPMRS3RY72KK7CEM3",
      "01HSSHY4YVT7JREWXH5NNTN14P",
      "01HSSRSW6V89AZKRRF317ZV2RS",
      "01HSSZNKET6RPQ9NH02128GSPH",
      "01HST6HAPV68TBB7GRPY9WEXGS",
      "01HSTDD1YSNXRE53KBARRATVNF",
      "01HSTM8S6V698S7JJ3EGK49AFH",
      "01HSTV4GET9ZQW2866AX8FEQ8F",
      "01HSV207PW7390V9E9J9BBZJYC",
      "01HSV8VYYTTAZAAYD5M5V93NQX",
      "01HSVFQP6V4HCN4WF95QWGVAN7",
      "01HSVPKDETASJTW0BAAJB5VB9M",
      "01HSVXF4PWVT6Q68BN4B0KXA4A",
      "01HSW4AVYTN03K408NBNQ9B7QZ",
      "01HSWB6K6TFQJPSTEDR4Z94KNF",
      "01HSWJ2AET9Q65CZ0ZGPEC69YW",
      "01HSWRY1PVR3FH3GBJN4ANA8G6",
      "01HSWZSRYVX0HNXA527GK123SH",
      "01HSX6NG6WB00R3GRJKE5QSRA4",
      "01HSXDH7EVC955BNRS0KY1R130",
      "01HSXMCYPSNVZ4SMW2MQPSY2Z8",
      "01HSXV8NYTKHZ93211PK4WCK5H",
      "01HSY24D6TRMPDSHG32KNWKVH8",
      "01HSY904ETZ47RVJ3JK0KAFB37",
      "01HSYFVVPTN80CZDQ38HT26RQJ",
      "01HSYPQJYTP7E8E6SGWBHE0SPP",
      "01HSYXKA6TPPMRJW7D1W8WYYNZ",
      "01HSZ4F1ET71NACD56BAM6RNAP",
      "01HSZBARPT7FFE4X7CA2KKTYS9",
      "01HSZJ6FYT8JTW9KSMNG6YEGZ9",
      "01HSZS276TA2KNDMBDXA25RCG6",
      "01HSZZXYEWEGBJTTBWHHQGYYBA",
      "01HT06SNPTV1CKM0NHRS32PAQR",
      "01HT0DNCYS66962KGAWWSGZS9V",
      "01HT0MH46TP81CRNFCCCHRF19J",
      "01HT0VCVETBHMAJM9SVQSX0EM6",
      "01HT128JPTHQ3Q2YVF0RTB5ER5",
      "01HT1949YT230P1R17F1HSCYER"
    ],
    "parents": [
      {
        "ulid": "01HS2VVDW0R0EVGNTND8E2BCTM",
        "minTime": 1710374400246,
        "maxTime": 1710547200000
      },
      {
        "ulid": "01HS80MPGZPTTEY3JBQ3RKV5F3",
        "minTime": 1710547200246,
        "maxTime": 1710720000000
      },
      {
        "ulid": "01HSD5E4H5NJH60BH30PKA3186",
        "minTime": 1710720000246,
        "maxTime": 1710892800000
      },
      {
        "ulid": "01HSJA7FHRDBYF0XFDEYXVMY0Z",
        "minTime": 1710892800246,
        "maxTime": 1711065600000
      },
      {
        "ulid": "01HSQF11TV1XEPV3ZN7WWX66HA",
        "minTime": 1711065600246,
        "maxTime": 1711238400000
      },
      {
        "ulid": "01HSWKTK77ZK2EM1H2EWGWCYNS",
        "minTime": 1711238400246,
        "maxTime": 1711411200000
      },
      {
        "ulid": "01HT1RKXJ3KFABZF5C1V8F7JJZ",
        "minTime": 1711411200246,
        "maxTime": 1711584000000
      }
    ]
  },
  "version": 1,
  "thanos": {
    "labels": {
      "cluster_name": "alpha",
      "cluster_node": "prometheus003-prom-jp2v-dev",
      "datasource": "alpha-002"
    },
    "downsample": {
      "resolution": 0
    },
    "source": "compactor",
    "segment_files": [
      "000001",
      "000002"
    ],
    "files": [
      {
        "rel_path": "chunks/000001",
        "size_bytes": 536870124
      },
      {
        "rel_path": "chunks/000002",
        "size_bytes": 125027769
      },
      {
        "rel_path": "index",
        "size_bytes": 22741614
      },
      {
        "rel_path": "meta.json"
      }
    ],
    "index_stats": {
      "series_max_size": 4800,
      "chunk_max_size": 1013
    }
  }
}
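
To answer the duration question from the numbers in this meta.json, the Python sketch below computes the span of the level-4 block and of each parent, and compares its external labels with those of the level-1 blocks posted earlier. Thanos compacts blocks per group of identical external labels and resolution, so a label difference may be worth checking; this is offered as a diagnostic idea, not as the confirmed cause. The helper names are hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000

# minTime/maxTime of the level-4 block and its parents, copied from the meta.json above.
LEVEL4 = (1710374400246, 1711584000000)
PARENTS = [
    (1710374400246, 1710547200000),
    (1710547200246, 1710720000000),
    (1710720000246, 1710892800000),
    (1710892800246, 1711065600000),
    (1711065600246, 1711238400000),
    (1711238400246, 1711411200000),
    (1711411200246, 1711584000000),
]
NUM_SOURCES = 168  # number of ULIDs in the "sources" list above

# External labels copied from the level-1 and level-4 meta.json files.
LEVEL1_LABELS = {
    "cluster_name": "alpha",
    "cluster_node": "prometheus004",
    "datasource": "alpha-002",
}
LEVEL4_LABELS = {
    "cluster_name": "alpha",
    "cluster_node": "prometheus003-prom-jp2v-dev",
    "datasource": "alpha-002",
}


def rounded_days(min_t, max_t):
    """Block duration in days, rounded to absorb the 246 ms start offset."""
    return round((max_t - min_t) / DAY_MS, 3)


def differing_labels(a, b):
    """Return {label: (value_in_a, value_in_b)} for labels that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    print("level-4 span (days):", rounded_days(*LEVEL4))
    print("parent spans (days):", [rounded_days(*p) for p in PARENTS])
    print("2h sources per day:", NUM_SOURCES / rounded_days(*LEVEL4))
    print("label differences:", differing_labels(LEVEL1_LABELS, LEVEL4_LABELS))
```

Read this way, the level-4 block covers 14 days, built from seven 2-day parents and 168 two-hour source blocks. Its `cluster_node` label (`prometheus003-prom-jp2v-dev`) differs from the `prometheus004` value on the four level-1 blocks.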

vincent-olivert-riera, Apr 19 '24 10:04

@vincent-olivert-riera if you grep your Compactor's log with block IDs of the blocks that didn't get compacted, do you see anything that stands out? If possible, maybe increase the Compactor's log level to generate more logs (then revert it, otherwise logs might be too spammy). 🤔

douglascamata, Apr 19 '24 11:04

@douglascamata, I haven't increased the Compactor's log level yet, but this is what the Compactor is doing (in a loop):

Apr 19, 2024 @ 20:29:37.120{"caller":"compact.go:1478","level":"info","msg":"compaction iterations done","ts":"2024-04-19T11:29:29.342094884Z"}
Apr 19, 2024 @ 20:29:37.120{"caller":"compact.go:457","level":"info","msg":"downsampling was explicitly disabled","ts":"2024-04-19T11:29:29.342370667Z"}
Apr 19, 2024 @ 20:27:42.421{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"8.78195813s","duration_ms":8781,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:26:37.988636543Z"}
Apr 19, 2024 @ 20:27:42.421{"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:27:29.206720687Z"}
Apr 19, 2024 @ 20:27:42.421{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"7.076132358s","duration_ms":7076,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:27:36.282764217Z"}
Apr 19, 2024 @ 20:26:36.158{"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:25:29.206734945Z"}
Apr 19, 2024 @ 20:26:36.158{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"7.2963137s","duration_ms":7296,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":346,"ts":"2024-04-19T11:25:36.502939791Z"}
Apr 19, 2024 @ 20:26:36.158{"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:26:29.20683454Z"}
Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1414","level":"info","msg":"start sync of metas","ts":"2024-04-19T11:24:22.419242154Z"}
Apr 19, 2024 @ 20:25:37.421{"caller":"fetcher.go:317","component":"block.BaseFetcher","concurrency":32,"level":"debug","msg":"fetching meta data","ts":"2024-04-19T11:24:22.419842845Z"}
Apr 19, 2024 @ 20:25:37.421{"cached":346,"caller":"fetcher.go:487","component":"block.BaseFetcher","duration":"5.786435988s","duration_ms":5786,"level":"info","msg":"successfully synchronized block metadata","partial":0,"returned":174,"ts":"2024-04-19T11:24:28.206118667Z"}
Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1419","level":"info","msg":"start of GC","ts":"2024-04-19T11:24:28.20786563Z"}
Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1442","level":"info","msg":"start of compactions","ts":"2024-04-19T11:24:28.208735693Z"}
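
The lines above are in Kibana's display order, which does not follow the `ts` field inside each JSON payload. A small Python sketch for putting such lines back in chronological order, assuming every line has the shape `<Kibana time>{<JSON>}` as in the excerpt (the function names are illustrative only):

```python
import json


def parse_line(line):
    """Split '<Kibana time>{...json...}' into a dict with the JSON fields."""
    prefix, brace, payload = line.partition("{")
    record = json.loads(brace + payload)
    record["kibana_time"] = prefix.strip()
    return record


def ts_key(ts):
    """Sort key for UTC timestamps ending in 'Z' with up to 9 fractional digits.

    The fraction is right-padded to nanoseconds so that values such as
    '.20683454Z' (8 digits) and '.206734945Z' (9 digits) compare correctly.
    """
    whole, _, frac = ts.rstrip("Z").partition(".")
    return (whole, int(frac.ljust(9, "0")))


def in_order(lines):
    """Parse the lines and return the records sorted by their 'ts' field."""
    return sorted((parse_line(line) for line in lines), key=lambda r: ts_key(r["ts"]))


# Three lines copied from the excerpt above.
LINES = [
    'Apr 19, 2024 @ 20:29:37.120{"caller":"compact.go:1478","level":"info","msg":"compaction iterations done","ts":"2024-04-19T11:29:29.342094884Z"}',
    'Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1442","level":"info","msg":"start of compactions","ts":"2024-04-19T11:24:28.208735693Z"}',
    'Apr 19, 2024 @ 20:25:37.421{"caller":"compact.go:1414","level":"info","msg":"start sync of metas","ts":"2024-04-19T11:24:22.419242154Z"}',
]

if __name__ == "__main__":
    for record in in_order(LINES):
        print(record["ts"], record["caller"], record["msg"])
```

Sorted this way, the sample reads as one iteration: the meta sync starts, compactions start, and the iteration finishes.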

I have searched for all the block IDs, but Kibana does not return anything at all. ~I will try to increase the log level and see what happens.~ The log level is debug.

vincent-olivert-riera, Apr 19 '24 11:04