
vmagent panic on remoteWrite.streamAggr.dedupInterval

Open alexintech opened this issue 9 months ago • 8 comments

Describe the bug

vmagent crashes periodically when the -remoteWrite.streamAggr.dedupInterval="0s,120s" flag is set.

To Reproduce

vmagent configured with remoteWrite.streamAggr.dedupInterval configuration:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: vmagent-multi-retention
  namespace: victoria-metrics
spec:
  image:
    tag: v1.101.0
  selectAllByDefault: true
  replicaCount: 1
  scrapeInterval: 20s
  scrapeTimeout: 10s
  externalLabels:
    cluster: mycluster
  extraArgs:
    promscrape.streamParse: 'true'
    remoteWrite.streamAggr.dedupInterval: "0s,120s"
  statefulMode: true
  statefulStorage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 20Gi
  remoteWrite:
    - url: "http://vminsert-vmcluster-retention-1m.victoria-metrics.svc:8480/insert/0/prometheus/api/v1/write"
    - url: "http://vminsert-vmcluster-retention-3m.victoria-metrics.svc:8480/insert/0/prometheus/api/v1/write"

Version

./vmagent-prod --version
vmagent-20240425-145801-tags-v1.101.0-0-g5334f0c2c

Logs

panic: runtime error: index out of range [6] with length 0

goroutine 15146 [running]:
github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.(*writeRequest).copyTimeSeries(0xc000000008, 0xc004a236e0, 0xc000a796e8)
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite/pendingseries.go:207 +0x6a9
github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.(*writeRequest).tryPush(0xc000000008, {0xc000a72008, 0x283, 0xc0004f8820?})
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite/pendingseries.go:192 +0x6d
github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.(*pendingSeries).TryPush(0xc000000000, {0xc000a72008?, 0x40c025?, 0x10?})
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite/pendingseries.go:64 +0x67
github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.(*remoteWriteCtx).tryPushInternal(0x8?, {0xc000a72008?, 0x0?, 0xc00013c510?})
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite/remotewrite.go:1015 +0x1c5
github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.(*remoteWriteCtx).TryPush(0xc000099b60, {0xc000a72008?, 0x10a20?, 0xc0000a3950?})
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite/remotewrite.go:957 +0x605
github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.tryPushBlockToRemoteStorages.func1(0xc00117aeac?)
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite/remotewrite.go:593 +0x65
created by github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.tryPushBlockToRemoteStorages in goroutine 49
	github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite/remotewrite.go:591 +0xea

Screenshots

No response

Used command-line flags

command-line flags -httpListenAddr=":8429" -promscrape.config="/etc/vmagent/config_out/vmagent.env.yaml" -promscrape.streamParse="true" -remoteWrite.maxDiskUsagePerURL="1073741824" -remoteWrite.streamAggr.dedupInterval="0s,2m0s" -remoteWrite.tmpDataPath="/vmagent_pq/vmagent-remotewrite-data" -remoteWrite.url="secret"

Additional information

No response

alexintech avatar Apr 29 '24 11:04 alexintech

Thanks for the report! This looks like a race condition. @AndrewChubatiuk would you mind taking a look?

hagen1778 avatar Apr 29 '24 12:04 hagen1778

It only happens when you have multiple remotewrite targets where:

  1. some of them run with the deduplicator.
  2. others don't.

The remotewrite target with the deduplicator pushes data here and then calls clear(tss): https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5334f0c2ce91d975d22012546d882917c0ff5fcf/app/vmagent/remotewrite/remotewrite.go#L951

The remotewrite target without the deduplicator pushes data here: https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5334f0c2ce91d975d22012546d882917c0ff5fcf/app/vmagent/remotewrite/remotewrite.go#L957

And here's the critical part: https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5334f0c2ce91d975d22012546d882917c0ff5fcf/app/vmagent/remotewrite/pendingseries.go#L181

The goroutine without the deduplicator refers to the timeseries data by index, tsSrc := &src[i], so it points at data that might already have been cleared.

Meanwhile, the goroutine with the deduplicator refers to the timeseries data through a copy:

for _, ts := range tss {
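
The interleaving described above can be replayed deterministically in a small Go sketch. This is not the vmagent code: the timeSeries type and the readAfterClear and readFromCopy functions are hypothetical names made up for illustration, and the concurrent steps are run one after another in a fixed order so that the outcome is repeatable. The copy-based variant only shows why the reader holding its own copy is unaffected; it is not a description of the fix that was merged.

```go
package main

import "fmt"

// timeSeries is a hypothetical stand-in for a remote-write series.
type timeSeries struct {
	Labels []string
}

// zero resets every element in place, the same effect clear(tss) has on a slice.
func zero(tss []timeSeries) {
	for i := range tss {
		tss[i] = timeSeries{}
	}
}

// readAfterClear replays one unlucky ordering: the reader takes a pointer
// into the shared slice, the other consumer zeroes the slice, and the reader
// then indexes labels that no longer exist.
func readAfterClear(shared []timeSeries) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	ts := &shared[0]    // reader: pointer into the shared backing array
	n := len(ts.Labels) // reader: label count observed before the reset
	zero(shared)        // other consumer: resets the shared elements
	for i := 0; i < n; i++ {
		_ = ts.Labels[i] // reader: Labels is nil by now, so indexing panics
	}
	return nil
}

// readFromCopy uses the same ordering, but the reader works on its own copy
// of the elements, so the reset of the shared slice does not reach it.
func readFromCopy(shared []timeSeries) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	own := append([]timeSeries(nil), shared...) // shallow copy of the elements
	ts := &own[0]
	n := len(ts.Labels)
	zero(shared)
	for i := 0; i < n; i++ {
		_ = ts.Labels[i]
	}
	return nil
}

func main() {
	fmt.Println("by index:", readAfterClear([]timeSeries{{Labels: []string{"job", "instance"}}}))
	fmt.Println("by copy: ", readFromCopy([]timeSeries{{Labels: []string{"job", "instance"}}}))
}
```

In a real concurrent run the ordering is not fixed, which would be consistent with the crash showing up only periodically.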

It can be reproduced whenever you have:

  1. some remotewrites taking the deduplicator path (dedupInterval != 0s).
  2. some remotewrites taking the normal path (dedupInterval = 0s).

Hope this could help

jiekun avatar Apr 29 '24 12:04 jiekun

@alexintech just curious if you change the order - 120s,0s will it also cause an error?

AndrewChubatiuk avatar Apr 29 '24 12:04 AndrewChubatiuk

It'd be great to build vmagent with the race detector (make vmagent-race) and test it for possible data races.

Note that this significantly reduces the performance of the application and must be used only for testing.

f41gh7 avatar Apr 29 '24 13:04 f41gh7

The most obvious cause is this, as mentioned by @jiekun. I've reproduced the issue as well and I've tested these changes. @alexintech, you can try this if you want.

AndrewChubatiuk avatar Apr 29 '24 14:04 AndrewChubatiuk

@alexintech just curious if you change the order - 120s,0s will it also cause an error?

The same error, but it crashes quicker, just after the start.

@alexintech you can try this if you want

I'll check

alexintech avatar Apr 29 '24 14:04 alexintech

@alexintech you can try this if you want

seems that it's working!

alexintech avatar Apr 29 '24 14:04 alexintech

Re-opening the issue since https://github.com/VictoriaMetrics/VictoriaMetrics/pull/6206 isn't released yet. It will be included in the next release.

hagen1778 avatar May 06 '24 20:05 hagen1778

This issue should be fixed in the v1.102.0-rc1 release.

valyala avatar Jun 07 '24 21:06 valyala

FYI, see the follow-up commit 4f99799db706790af7fd79a47d0c00ae720af006

valyala avatar Jul 03 '24 12:07 valyala