
tools bucket rewrite: invalid memory address or nil pointer dereference

Open onelife opened this issue 3 years ago • 23 comments

I'm trying the series deletion feature and got the following error.

level=info ts=2021-06-03T04:34:56.077967157Z caller=factory.go:46 msg="loading bucket configuration"
level=info ts=2021-06-03T04:34:56.116875761Z caller=tools_bucket.go:868 msg="downloading block" source=01F6NY6XFBHZSQ159ZYF5FGE61
level=info ts=2021-06-03T04:34:59.814395054Z caller=tools_bucket.go:904 msg="changelog will be available" file=/tmp/thanos-rewrite/01F782EC369F5RPSZQSAZ45CQ5/change.log
level=info ts=2021-06-03T04:34:59.831787434Z caller=tools_bucket.go:919 msg="starting rewrite for block" source=01F6NY6XFBHZSQ159ZYF5FGE61 new=01F782EC369F5RPSZQSAZ45CQ5 toDelete="- matchers: \"{__name__=~\\\"mqtt2tsdb_.*\\\",gateway=\\\"LG01010002012110100\\\"}\""
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x17aba71]

goroutine 101 [running]:
github.com/thanos-io/thanos/pkg/compactv2.(*lazyPopulatableChunk).Bytes(0xc0000a6240, 0x8, 0xc0000d91c8, 0x40db9b)
        /app/pkg/compactv2/chunk_series_set.go:119 +0x31
github.com/prometheus/prometheus/tsdb/chunks.(*Writer).WriteChunks(0xc0000d0960, 0xc0000cc140, 0x5, 0x8, 0xa0, 0xc0000cc140)
        /go/pkg/mod/github.com/prometheus/[email protected]/tsdb/chunks/chunks.go:302 +0x11a
github.com/thanos-io/thanos/pkg/block.(*statsGatheringSeriesWriter).WriteChunks(0xc00003afc0, 0xc0000cc140, 0x5, 0x8, 0x0, 0x0)
        /app/pkg/block/writer.go:172 +0x5f
github.com/thanos-io/thanos/pkg/compactv2.(*Compactor).write(0xc0000d9d68, 0x1fa0228, 0xc000862c00, 0x1f96f50, 0xc0000d0000, 0x1fa0f80, 0xc0000d0050, 0x7f74fb69eff0, 0xc00003afc0, 0x1f6ff20, ...)
        /app/pkg/compactv2/chunk_series_set.go:200 +0x427
github.com/thanos-io/thanos/pkg/compactv2.(*Compactor).WriteSeries(0xc0000d9d68, 0x1fa0228, 0xc000862c00, 0xc0000d9b98, 0x1, 0x1, 0x1fa67b8, 0xc00003afc0, 0x1f6ff20, 0xc000404a80, ...)
        /app/pkg/compactv2/compactor.go:147 +0xb25
main.registerBucketRewrite.func1.1(0x0, 0x0)
        /app/cmd/thanos/tools_bucket.go:920 +0x10f5
github.com/oklog/run.(*Group).Run.func1(0xc0002924e0, 0xc000890a00, 0xc000881b60)
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0xbb

A dry run completes without issue. The test was done with Thanos v0.20.2 and S3. The following is the output of tools bucket inspect.

level=info ts=2021-06-03T04:05:41.6528167Z caller=factory.go:46 msg="loading bucket configuration"
level=info ts=2021-06-03T04:05:42.1304465Z caller=fetcher.go:476 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=461.8639ms cached=187 returned=187 partial=0
|            ULID            |        FROM         |        UNTIL        |     RANGE      |   UNTIL-DOWN    | #SERIES |   #SAMPLES    |  #CHUNKS   | COMP-LEVEL | COMP-FAILED |                                                        LABELS                                                        | RESOLUTION |     SOURCE     |
|----------------------------|---------------------|---------------------|----------------|-----------------|---------|---------------|------------|------------|-------------|----------------------------------------------------------------------------------------------------------------------|------------|----------------|
| 01F11XK5AW6BN0V760D9C4Y5T3 | 04-03-2021 00:00:00 | 18-03-2021 00:00:00 | 335h59m59.864s | -               | 86,658  | 22,437,633    | 286,768    | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s     | compactor      |
| 01F25Z5EGMWKM577FMTYM0718C | 18-03-2021 00:00:00 | 01-04-2021 00:00:00 | 335h59m59.935s | -               | 91,515  | 23,312,170    | 292,946    | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s     | compactor      |
| 01F3A0S4HM9KYV9J5528SM7NS1 | 01-04-2021 00:00:00 | 15-04-2021 00:00:00 | 335h59m59.938s | -               | 106,308 | 25,558,975    | 321,134    | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s     | compactor      |
| 01F4E20BTK668NA3HBNRXN7AKB | 15-04-2021 00:00:00 | 29-04-2021 00:00:00 | 335h59m59.943s | -295h59m59.943s | 100,537 | 3,289,387,764 | 27,425,425 | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s         | compactor      |
| 01F4E24RDA6E8WA289YC2KVN8Z | 15-04-2021 00:00:00 | 29-04-2021 00:00:00 | 335h59m59.943s | -95h59m59.943s  | 100,537 | 326,586,223   | 2,366,044  | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F4E2BZ0Y3XADYFHXEPE0VCN7 | 15-04-2021 00:00:00 | 29-04-2021 00:00:00 | 335h59m59.943s | -               | 100,537 | 27,251,580    | 342,084    | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s     | compactor      |
| 01F5J3JRDFXSFVKJNE3T3JWTW0 | 29-04-2021 00:00:00 | 13-05-2021 00:00:00 | 335h59m59.896s | -295h59m59.896s | 162,678 | 3,312,859,543 | 27,657,855 | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s         | compactor      |
| 01F5J3QKAMVCD9VGRXD1AHK8DS | 29-04-2021 00:00:00 | 13-05-2021 00:00:00 | 335h59m59.896s | -95h59m59.896s  | 162,678 | 330,299,454   | 2,463,624  | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F5J3ZA3JJZ0H7K4G0N0DCYBG | 29-04-2021 00:00:00 | 13-05-2021 00:00:00 | 335h59m59.896s | -               | 162,678 | 27,804,831    | 406,632    | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s     | compactor      |
| 01F5Q8ANWKCJQA7FE34JW6MPT5 | 13-05-2021 00:00:00 | 15-05-2021 00:00:00 | 47h59m59.942s  | 192h0m0.058s    | 84,767  | 41,735,306    | 370,524    | 3          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F6S0DGVDNJA6FQE3JBXFQ7ZS | 13-05-2021 00:00:00 | 15-05-2021 00:00:00 | 47h59m59.942s  | -7h59m59.942s   | 84,736  | 418,305,020   | 3,489,467  | 3          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s         | bucket.rewrite |
| 01F6P52S252GDCNEE6MS6FT1FP | 13-05-2021 00:00:00 | 27-05-2021 00:00:00 | 335h59m59.942s | -295h59m59.942s | 384,795 | 3,841,904,413 | 32,154,121 | 4          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s         | compactor      |
| 01F5WD45R5QXQM0HBT98D2Z2AE | 15-05-2021 00:00:00 | 17-05-2021 00:00:00 | 47h59m59.865s  | 192h0m0.135s    | 76,958  | 44,303,448    | 384,616    | 3          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F61RSC2JKFKAPKSEHVYB0839 | 17-05-2021 00:00:00 | 19-05-2021 00:00:00 | 47h59m59.97s   | 192h0m0.03s     | 118,707 | 45,051,844    | 412,647    | 3          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F66PPS2E60WH5JTQ0FDFKP7H | 19-05-2021 02:51:34 | 21-05-2021 00:00:00 | 45h8m25.33s    | 194h51m34.67s   | 320,499 | 55,308,708    | 576,107    | 3          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F6BMMC13E9Y97Z9MKDPAK9H5 | 21-05-2021 00:00:00 | 23-05-2021 00:00:00 | 48h0m0s        | 192h0m0s        | 114,131 | 65,161,317    | 565,736    | 3          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F6H092T470QK29QPJFRRXYTG | 23-05-2021 00:00:00 | 25-05-2021 00:00:00 | 47h59m59.995s  | 192h0m0.005s    | 115,335 | 65,352,232    | 568,464    | 3          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F6NY6XFBHZSQ159ZYF5FGE61 | 25-05-2021 00:00:00 | 27-05-2021 00:00:00 | 48h0m0s        | 192h0m0s        | 121,740 | 65,901,813    | 577,159    | 3          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s       | compactor      |
| 01F6Q0FVT6X0HYATRCS7DY878M | 27-05-2021 00:00:00 | 27-05-2021 08:00:00 | 7h59m59.989s   | 32h0m0.011s     | 116,982 | 110,876,255   | 925,500    | 2          | false       | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s         | compactor      |
...

I tried several (but not all) blocks (source==compactor), and only the rewrite of block 01F6Q0FVT6X0HYATRCS7DY878M completed without error.

onelife avatar Jun 03 '21 05:06 onelife

Thanks for reporting this issue. The bucket rewrite tool currently works only for blocks that are not downsampled (res=0).

We need to mention it in the docs.

yeya24 avatar Jun 03 '21 05:06 yeya24
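
Until downsampled blocks are supported, one way to avoid the panic is to check a block's resolution before passing its ULID to the rewrite tool. The following is a minimal sketch, not part of Thanos: `blockMeta` and `checkRewritable` are hypothetical names, and the sketch assumes the block's meta.json has already been downloaded and that it stores the resolution in milliseconds under `thanos.downsample.resolution` (0 for raw blocks).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockMeta models only the meta.json fields this check needs.
// It is a hypothetical type for this sketch, not the Thanos metadata struct.
type blockMeta struct {
	ULID   string `json:"ulid"`
	Thanos struct {
		Downsample struct {
			// Resolution in milliseconds; 0 means the block is raw.
			Resolution int64 `json:"resolution"`
		} `json:"downsample"`
	} `json:"thanos"`
}

// checkRewritable is a hypothetical pre-flight check: it returns an error
// for downsampled blocks so the caller can skip them instead of crashing.
func checkRewritable(metaJSON []byte) error {
	var m blockMeta
	if err := json.Unmarshal(metaJSON, &m); err != nil {
		return fmt.Errorf("parse meta.json: %w", err)
	}
	if res := m.Thanos.Downsample.Resolution; res != 0 {
		return fmt.Errorf("block %s is downsampled (resolution=%dms); bucket rewrite supports only raw blocks", m.ULID, res)
	}
	return nil
}

func main() {
	// Trimmed meta.json documents for a raw block and a 5m block from the table above.
	raw := []byte(`{"ulid":"01F6Q0FVT6X0HYATRCS7DY878M","thanos":{"downsample":{"resolution":0}}}`)
	down := []byte(`{"ulid":"01F6NY6XFBHZSQ159ZYF5FGE61","thanos":{"downsample":{"resolution":300000}}}`)

	fmt.Println("raw block:", checkRewritable(raw))
	fmt.Println("5m block:", checkRewritable(down))
}
```

In the inspect output above, this corresponds to picking only rows whose RESOLUTION column is 0s.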

Hi @yeya24, thanks for the reply!

Before closing the issue, may I know the roadmap for this feature? Is there any plan to support downsampled blocks? Any plan to support deleting series within a specified time range?

onelife avatar Jun 04 '21 01:06 onelife

@bwplotka for more input. We definitely want to support it, but supporting deletion sounds tricky to me, since a deletion can remove only part of a series for a given time range.

For the new rewrite relabel cmd, this is easier to do as it works for the whole series.

yeya24 avatar Jun 04 '21 02:06 yeya24

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Aug 04 '21 22:08 stale[bot]

Not stale.

yeya24 avatar Aug 13 '21 05:08 yeya24

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Oct 12 '21 21:10 stale[bot]

Still needed.

markmsmith avatar Oct 16 '21 21:10 markmsmith

If it is not possible to remove metrics from downsampled data, is it even possible to recreate downsampled data from raw metrics where unwanted metrics are deleted?

bobykus31 avatar Dec 14 '21 14:12 bobykus31

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Mar 02 '22 15:03 stale[bot]

As far as I know, this is still needed.

markmsmith avatar Mar 07 '22 16:03 markmsmith

Confirming, still needed, just ran into the issue :)

mortaelth avatar Apr 05 '22 08:04 mortaelth

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Jun 12 '22 17:06 stale[bot]

Still there :(

mortaelth avatar Jun 13 '22 19:06 mortaelth

Regarding @bobykus31's question:

"If it is not possible to remove metrics from downsampled data, is it even possible to recreate downsampled data from raw metrics where unwanted metrics are deleted?"

Although this might not be efficient, it might be a workaround. How does the compactor handle such a rewritten block? Will it just be processed because it is a new block from the compactor's view? Will it be ignored?

Maybe @yeya24 can answer this?

iceman91176 avatar Jun 17 '22 07:06 iceman91176

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Sep 21 '22 02:09 stale[bot]

I think this is still needed

markmsmith avatar Sep 22 '22 22:09 markmsmith

I faced the same issue. Is there any plan to fix it?

mo4islona avatar Nov 15 '22 18:11 mo4islona

There's still some WIP to do this, so it's not forgotten; see this recently closed PR: https://github.com/thanos-io/thanos/pull/5725

matej-g avatar Nov 16 '22 11:11 matej-g

Any update on this issue? Still needed.

lasermoth avatar Mar 30 '23 04:03 lasermoth

still there:

level=info ts=2023-04-19T13:44:27.718117168Z caller=tools_bucket.go:1227 msg="starting rewrite for block" source=xxxxxxxxxxx new=01GYCW4T9RDJB2J8SVHTKHAS43 toDelete="- matchers: '{__name__=\"container_memory_usage_bytes\", cluster=\"live-k8s\", service=\"kube-test-stack-kubelet\"}'\n\n" toRelabel=
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1ecbb32]

goroutine 113 [running]:
github.com/thanos-io/thanos/pkg/compactv2.(*lazyPopulatableChunk).Bytes(0xc000815470?)
	/app/pkg/compactv2/chunk_series_set.go:126 +0x32
github.com/prometheus/prometheus/tsdb/chunks.(*Writer).WriteChunks(0xc000a6aff0, {0xc00023e480, 0x1, 0x1})

BouchaaraAdil avatar Apr 19 '23 13:04 BouchaaraAdil

Do I understand correctly that, in order to delete a metric:

  • I have to run the "tools bucket rewrite" command
    • Ignore all errors and wait until the raw blocks are rewritten
  • then I have to recreate the downsampled metric / blocks (let the compactor do its work)
  • wait X days until the compactor deletes the old / wrong blocks

This should then shrink my S3 costs, correct?

paprickar avatar Jul 04 '23 20:07 paprickar

@yeya24 You mentioned in https://github.com/thanos-io/thanos/pull/5725#issuecomment-1262465994 that you had an old branch to handle downsampled blocks with bucket rewrite. Is that still valid, or would implementing this need to be revisited?

lasermoth avatar Aug 08 '23 04:08 lasermoth

Sadly, this issue is still happening: same error when dealing with downsampled data. Perhaps, for the time being, the tool could print a simple error message saying so instead of segfaulting?

W-Hamra avatar May 20 '24 00:05 W-Hamra
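
The "error message instead of segfaulting" idea can be sketched as a nil guard around the chunk access. This is only an illustration under assumptions: `chunk`, `rawChunk`, `lazyChunk` and `errNotPopulated` are hypothetical stand-ins, not the Thanos or Prometheus types, and the thread does not establish the exact cause of the nil pointer inside `lazyPopulatableChunk.Bytes`.

```go
package main

import (
	"errors"
	"fmt"
)

// chunk is a hypothetical stand-in for an encoded TSDB chunk.
type chunk interface {
	Bytes() []byte
}

// rawChunk is a trivial chunk implementation used only for this sketch.
type rawChunk struct{ data []byte }

func (c *rawChunk) Bytes() []byte { return c.data }

// errNotPopulated is returned instead of letting a nil dereference panic.
var errNotPopulated = errors.New("chunk was not populated; downsampled blocks may not be supported by rewrite")

// lazyChunk is a hypothetical lazily populated wrapper: populated stays nil
// until some earlier step fills it in.
type lazyChunk struct {
	populated chunk
}

// Bytes returns the payload, or an error when the wrapper or its inner
// chunk is nil, so the caller can report the problem and exit cleanly.
func (l *lazyChunk) Bytes() ([]byte, error) {
	if l == nil || l.populated == nil {
		return nil, errNotPopulated
	}
	return l.populated.Bytes(), nil
}

func main() {
	ok := &lazyChunk{populated: &rawChunk{data: []byte{0x01, 0x02}}}
	empty := &lazyChunk{}

	for _, c := range []*lazyChunk{ok, empty} {
		b, err := c.Bytes()
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("bytes:", len(b))
	}
}
```

The guard only changes how the failure is reported; it does not make downsampled blocks rewritable.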