tools bucket rewrite: invalid memory address or nil pointer dereference
I'm trying the series deletion feature and got the following error:
level=info ts=2021-06-03T04:34:56.077967157Z caller=factory.go:46 msg="loading bucket configuration"
level=info ts=2021-06-03T04:34:56.116875761Z caller=tools_bucket.go:868 msg="downloading block" source=01F6NY6XFBHZSQ159ZYF5FGE61
level=info ts=2021-06-03T04:34:59.814395054Z caller=tools_bucket.go:904 msg="changelog will be available" file=/tmp/thanos-rewrite/01F782EC369F5RPSZQSAZ45CQ5/change.log
level=info ts=2021-06-03T04:34:59.831787434Z caller=tools_bucket.go:919 msg="starting rewrite for block" source=01F6NY6XFBHZSQ159ZYF5FGE61 new=01F782EC369F5RPSZQSAZ45CQ5 toDelete="- matchers: \"{__name__=~\\\"mqtt2tsdb_.*\\\",gateway=\\\"LG01010002012110100\\\"}\""
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x17aba71]
goroutine 101 [running]:
github.com/thanos-io/thanos/pkg/compactv2.(*lazyPopulatableChunk).Bytes(0xc0000a6240, 0x8, 0xc0000d91c8, 0x40db9b)
/app/pkg/compactv2/chunk_series_set.go:119 +0x31
github.com/prometheus/prometheus/tsdb/chunks.(*Writer).WriteChunks(0xc0000d0960, 0xc0000cc140, 0x5, 0x8, 0xa0, 0xc0000cc140)
/go/pkg/mod/github.com/prometheus/[email protected]/tsdb/chunks/chunks.go:302 +0x11a
github.com/thanos-io/thanos/pkg/block.(*statsGatheringSeriesWriter).WriteChunks(0xc00003afc0, 0xc0000cc140, 0x5, 0x8, 0x0, 0x0)
/app/pkg/block/writer.go:172 +0x5f
github.com/thanos-io/thanos/pkg/compactv2.(*Compactor).write(0xc0000d9d68, 0x1fa0228, 0xc000862c00, 0x1f96f50, 0xc0000d0000, 0x1fa0f80, 0xc0000d0050, 0x7f74fb69eff0, 0xc00003afc0, 0x1f6ff20, ...)
/app/pkg/compactv2/chunk_series_set.go:200 +0x427
github.com/thanos-io/thanos/pkg/compactv2.(*Compactor).WriteSeries(0xc0000d9d68, 0x1fa0228, 0xc000862c00, 0xc0000d9b98, 0x1, 0x1, 0x1fa67b8, 0xc00003afc0, 0x1f6ff20, 0xc000404a80, ...)
/app/pkg/compactv2/compactor.go:147 +0xb25
main.registerBucketRewrite.func1.1(0x0, 0x0)
/app/cmd/thanos/tools_bucket.go:920 +0x10f5
github.com/oklog/run.(*Group).Run.func1(0xc0002924e0, 0xc000890a00, 0xc000881b60)
/go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
/go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0xbb
A dry run shows no issue. The test was done with Thanos v0.20.2 against S3.
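For reference, an invocation matching the log above would look roughly like this (a sketch, not the reporter's exact command; the flag names follow the thanos tools bucket rewrite documentation, and bucket.yaml / delete-series.yaml are placeholder file names):

```
# delete-series.yaml mirrors the toDelete matchers from the log above.
cat > delete-series.yaml <<'EOF'
- matchers: '{__name__=~"mqtt2tsdb_.*", gateway="LG01010002012110100"}'
EOF

# Dry run is the default; --no-dry-run applies the rewrite.
thanos tools bucket rewrite \
  --objstore.config-file bucket.yaml \
  --id 01F6NY6XFBHZSQ159ZYF5FGE61 \
  --rewrite.to-delete-config-file delete-series.yaml \
  --no-dry-run
```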
Below is the output of tools bucket inspect:
level=info ts=2021-06-03T04:05:41.6528167Z caller=factory.go:46 msg="loading bucket configuration"
level=info ts=2021-06-03T04:05:42.1304465Z caller=fetcher.go:476 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=461.8639ms cached=187 returned=187 partial=0
| ULID | FROM | UNTIL | RANGE | UNTIL-DOWN | #SERIES | #SAMPLES | #CHUNKS | COMP-LEVEL | COMP-FAILED | LABELS | RESOLUTION | SOURCE |
|----------------------------|---------------------|---------------------|----------------|-----------------|---------|---------------|------------|------------|-------------|----------------------------------------------------------------------------------------------------------------------|------------|----------------|
| 01F11XK5AW6BN0V760D9C4Y5T3 | 04-03-2021 00:00:00 | 18-03-2021 00:00:00 | 335h59m59.864s | - | 86,658 | 22,437,633 | 286,768 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s | compactor |
| 01F25Z5EGMWKM577FMTYM0718C | 18-03-2021 00:00:00 | 01-04-2021 00:00:00 | 335h59m59.935s | - | 91,515 | 23,312,170 | 292,946 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s | compactor |
| 01F3A0S4HM9KYV9J5528SM7NS1 | 01-04-2021 00:00:00 | 15-04-2021 00:00:00 | 335h59m59.938s | - | 106,308 | 25,558,975 | 321,134 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s | compactor |
| 01F4E20BTK668NA3HBNRXN7AKB | 15-04-2021 00:00:00 | 29-04-2021 00:00:00 | 335h59m59.943s | -295h59m59.943s | 100,537 | 3,289,387,764 | 27,425,425 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s | compactor |
| 01F4E24RDA6E8WA289YC2KVN8Z | 15-04-2021 00:00:00 | 29-04-2021 00:00:00 | 335h59m59.943s | -95h59m59.943s | 100,537 | 326,586,223 | 2,366,044 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F4E2BZ0Y3XADYFHXEPE0VCN7 | 15-04-2021 00:00:00 | 29-04-2021 00:00:00 | 335h59m59.943s | - | 100,537 | 27,251,580 | 342,084 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s | compactor |
| 01F5J3JRDFXSFVKJNE3T3JWTW0 | 29-04-2021 00:00:00 | 13-05-2021 00:00:00 | 335h59m59.896s | -295h59m59.896s | 162,678 | 3,312,859,543 | 27,657,855 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s | compactor |
| 01F5J3QKAMVCD9VGRXD1AHK8DS | 29-04-2021 00:00:00 | 13-05-2021 00:00:00 | 335h59m59.896s | -95h59m59.896s | 162,678 | 330,299,454 | 2,463,624 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F5J3ZA3JJZ0H7K4G0N0DCYBG | 29-04-2021 00:00:00 | 13-05-2021 00:00:00 | 335h59m59.896s | - | 162,678 | 27,804,831 | 406,632 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 1h0m0s | compactor |
| 01F5Q8ANWKCJQA7FE34JW6MPT5 | 13-05-2021 00:00:00 | 15-05-2021 00:00:00 | 47h59m59.942s | 192h0m0.058s | 84,767 | 41,735,306 | 370,524 | 3 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F6S0DGVDNJA6FQE3JBXFQ7ZS | 13-05-2021 00:00:00 | 15-05-2021 00:00:00 | 47h59m59.942s | -7h59m59.942s | 84,736 | 418,305,020 | 3,489,467 | 3 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s | bucket.rewrite |
| 01F6P52S252GDCNEE6MS6FT1FP | 13-05-2021 00:00:00 | 27-05-2021 00:00:00 | 335h59m59.942s | -295h59m59.942s | 384,795 | 3,841,904,413 | 32,154,121 | 4 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s | compactor |
| 01F5WD45R5QXQM0HBT98D2Z2AE | 15-05-2021 00:00:00 | 17-05-2021 00:00:00 | 47h59m59.865s | 192h0m0.135s | 76,958 | 44,303,448 | 384,616 | 3 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F61RSC2JKFKAPKSEHVYB0839 | 17-05-2021 00:00:00 | 19-05-2021 00:00:00 | 47h59m59.97s | 192h0m0.03s | 118,707 | 45,051,844 | 412,647 | 3 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F66PPS2E60WH5JTQ0FDFKP7H | 19-05-2021 02:51:34 | 21-05-2021 00:00:00 | 45h8m25.33s | 194h51m34.67s | 320,499 | 55,308,708 | 576,107 | 3 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F6BMMC13E9Y97Z9MKDPAK9H5 | 21-05-2021 00:00:00 | 23-05-2021 00:00:00 | 48h0m0s | 192h0m0s | 114,131 | 65,161,317 | 565,736 | 3 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F6H092T470QK29QPJFRRXYTG | 23-05-2021 00:00:00 | 25-05-2021 00:00:00 | 47h59m59.995s | 192h0m0.005s | 115,335 | 65,352,232 | 568,464 | 3 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F6NY6XFBHZSQ159ZYF5FGE61 | 25-05-2021 00:00:00 | 27-05-2021 00:00:00 | 48h0m0s | 192h0m0s | 121,740 | 65,901,813 | 577,159 | 3 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 5m0s | compactor |
| 01F6Q0FVT6X0HYATRCS7DY878M | 27-05-2021 00:00:00 | 27-05-2021 08:00:00 | 7h59m59.989s | 32h0m0.011s | 116,982 | 110,876,255 | 925,500 | 2 | false | prometheus=backend/kprom-kube-prometheus-prometheus,prometheus_replica=prometheus-kprom-kube-prometheus-prometheus-0 | 0s | compactor |
...
I tried several (but not all) blocks with source==compactor, and only rewriting block 01F6Q0FVT6X0HYATRCS7DY878M produced no error.
Thanks for reporting this issue. The bucket rewrite tool currently only works for non-downsampled blocks (resolution 0).
We need to mention this in the docs.
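A quick way to tell raw blocks from downsampled ones before rewriting is the block's meta.json, whose thanos.downsample.resolution field records the resolution in milliseconds (a sketch assuming S3 plus the aws CLI and jq; the bucket name is a placeholder):

```
# 0 = raw data (rewritable); 300000 (5m) and 3600000 (1h) = downsampled.
aws s3 cp s3://<bucket>/01F6NY6XFBHZSQ159ZYF5FGE61/meta.json - \
  | jq '.thanos.downsample.resolution'
```

The RESOLUTION column of the tools bucket inspect output above shows the same information (0s vs 5m0s vs 1h0m0s).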
Hi @yeya24, thanks for the reply!
Before closing the issue, may I ask about the roadmap for this feature? Is there any plan to support downsampled blocks? Any plan to support deleting series within a specified time range?
@bwplotka for more input. We definitely want to support it, but supporting deletion sounds tricky to me, as you can delete only part of a series for given time ranges.
For the new rewrite relabel command this is easier to do, as it works on whole series.
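For illustration, a whole-series deletion via the relabel path could look roughly like this, passed through the --rewrite.to-relabel-config-file flag (a sketch using standard Prometheus relabel syntax; the matchers mirror the original report and are not part of this thread's actual commands):

```
# Drops every series whose metric name and gateway label match.
# Joined source_labels use the default ";" separator.
- action: drop
  source_labels: [__name__, gateway]
  regex: "mqtt2tsdb_.*;LG01010002012110100"
```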
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the remind command if you wish to be reminded at some point in the future.
Not stale.
Still needed.
If it is not possible to remove metrics from downsampled data, is it even possible to recreate the downsampled data from raw metrics once the unwanted metrics are deleted?
As far as I know, this is still needed.
Confirming, still needed, just ran into the issue :)
Still there :(
Regarding @bobykus31's question:
If it is not possible to remove metrics from downsampled data, is it even possible to recreate the downsampled data from raw metrics once the unwanted metrics are deleted?
Although this might not be efficient, it might be a workaround. How does the compactor handle such a rewritten block? Will it just be processed, since from the compactor's view it is a new block? Or will it be ignored?
Maybe @yeya24 can answer this?
I think this is still needed
Faced the same issue. Is there any plan to fix it?
There's still some WIP on this, so it's not forgotten; see this recently closed PR: https://github.com/thanos-io/thanos/pull/5725
Any update on this issue? Still needed.
Still there:
level=info ts=2023-04-19T13:44:27.718117168Z caller=tools_bucket.go:1227 msg="starting rewrite for block" source=xxxxxxxxxxx new=01GYCW4T9RDJB2J8SVHTKHAS43 toDelete="- matchers: '{__name__=\"container_memory_usage_bytes\", cluster=\"live-k8s\", service=\"kube-test-stack-kubelet\"}'\n\n" toRelabel=
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1ecbb32]
goroutine 113 [running]:
github.com/thanos-io/thanos/pkg/compactv2.(*lazyPopulatableChunk).Bytes(0xc000815470?)
/app/pkg/compactv2/chunk_series_set.go:126 +0x32
github.com/prometheus/prometheus/tsdb/chunks.(*Writer).WriteChunks(0xc000a6aff0, {0xc00023e480, 0x1, 0x1})
Do I understand it right that in order to delete a metric:
- I have to run the "tools bucket rewrite" command,
- ignore all errors and wait until the raw blocks are rewritten,
- then recreate the downsampled blocks (let the compactor do its work),
- and wait X days until the compactor deletes the old / wrong blocks?
This should then shrink my S3 costs, correct? (See the sketch below.)
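If that understanding is right, the steps could be sketched as follows (assumptions: thanos tools bucket mark is used to schedule the downsampled blocks for deletion, all ULIDs and file names are placeholders, and whether the compactor then re-downsamples cleanly is exactly the open question in this thread):

```
# 1. Rewrite the raw (resolution 0) block without the unwanted series.
thanos tools bucket rewrite --objstore.config-file bucket.yaml \
  --id <raw-block-ulid> \
  --rewrite.to-delete-config-file delete-series.yaml --no-dry-run

# 2. Mark the matching downsampled (5m/1h) blocks for deletion, so the
#    compactor can re-downsample from the rewritten raw data.
thanos tools bucket mark --objstore.config-file bucket.yaml \
  --id <5m-block-ulid> --id <1h-block-ulid> \
  --marker deletion-mark.json --details "re-downsample after series deletion"

# 3. Let the compactor run; after its --delete-delay it removes the marked
#    blocks and regenerates the downsampled ones from raw data.
```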
@yeya24 You mentioned in https://github.com/thanos-io/thanos/pull/5725#issuecomment-1262465994 that you had an old branch to handle downsampled blocks with bucket rewrite. Is that still valid, or would implementing this need to be revisited?
Sadly, the issue is still happening: the same error occurs when dealing with downsampled data. Perhaps, for the time being, a simple error message that says so instead of segfaulting?
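A minimal sketch of the kind of guard being asked for: check the block's meta.json up front and fail with a clear message instead of a segfault. This is hypothetical code, not the Thanos implementation; the struct models only the documented meta.json fields:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// blockMeta models only the meta.json fields needed for the check
// (hypothetical type; the real Thanos metadata types differ).
type blockMeta struct {
	ULID   string `json:"ulid"`
	Thanos struct {
		Downsample struct {
			Resolution int64 `json:"resolution"` // milliseconds; 0 means raw data
		} `json:"downsample"`
	} `json:"thanos"`
}

// checkRewritable returns a descriptive error for downsampled blocks
// instead of letting the rewrite path panic on a nil chunk.
func checkRewritable(metaPath string) error {
	b, err := os.ReadFile(metaPath)
	if err != nil {
		return err
	}
	var m blockMeta
	if err := json.Unmarshal(b, &m); err != nil {
		return err
	}
	if r := m.Thanos.Downsample.Resolution; r != 0 {
		return fmt.Errorf("block %s has resolution %dms; bucket rewrite currently supports only raw (resolution 0) blocks", m.ULID, r)
	}
	return nil
}

func main() {
	if err := checkRewritable("meta.json"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```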