Loki Query Frontend fails with SIGSEGV in 3.0.0

Open chewrocca opened this issue 1 year ago • 2 comments

Describe the bug

Loki 3.0.0 panics at runtime shortly after startup with an invalid memory address or nil pointer dereference. The panic does not occur when Loki is pinned to version 2.9.8 while the other components are upgraded.

level=info ts=2024-06-24T13:38:35.20013383Z caller=loki.go:503 msg="Loki started" startup_time=50.090374ms
level=info ts=2024-06-24T13:38:35.206338161Z caller=memberlist_client.go:580 phase=startup msg="joining memberlist cluster succeeded" reached_nodes=1 elapsed_time=7.68984ms
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x22c8d5f]

goroutine 2039 [running]:
github.com/grafana/loki/v3/pkg/lokifrontend/frontend.downstreamRoundTripper.Do({0xc000a242d0, {0x32314e0, 0x48f6180}, {0x0, 0x0}}, {0x3254d88, 0xc00cb95410}, {0x3270350, 0xc00bfad960})
	/src/loki/pkg/lokifrontend/frontend/downstream_roundtripper.go:37 +0x9f
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.retry.Do({{0x3231a60, 0xc0006f2280}, {0x3233d20, 0xc0021cae40}, 0x5, 0xc002152800}, {0x3254d88?, 0xc00cb95410}, {0x3270350, 0xc00bfad960})
	/src/loki/pkg/querier/queryrange/queryrangebase/retry.go:86 +0x2c3
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3254d88?, 0xc00cb95410?})
	/src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x42
github.com/grafana/dskit/instrument.CollectedRequest({0x3254d88, 0xc00cb953e0}, {0x2a6b8f9, 0x5}, {0x3249af0, 0xc003c0f018}, 0xede0b6e12?, 0xc00cbaa328)
	/src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x262
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3254d88?, 0xc00cb953e0?}, {0x3270350?, 0xc00bfad960?})
	/src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x0?, {0x3254d88?, 0xc00cb953e0?}, {0x3270350?, 0xc00bfad960?})
	/src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.NewIndexStatsCacheMiddleware.NewResultsCacheMiddleware.func2.1({0x3254d88, 0xc00cb953e0}, {0x7f4768becaa0?, 0xc00bfad960})
	/src/loki/pkg/querier/queryrange/queryrangebase/results_cache.go:147 +0x6c
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.HandlerFunc.Do(0x3254d88?, {0x3254d88?, 0xc00cb953e0?}, {0x7f4768becaa0?, 0xc00bfad960?})
	/src/loki/pkg/storage/chunk/cache/resultscache/util.go:11 +0x37
github.com/grafana/loki/v3/pkg/storage/chunk/cache/resultscache.ResultsCache.Do({{0x3231a60, 0xc0006f2280}, {0x3233020, 0xc00cbb6000}, {0x3255178, 0xc000798000}, {0x7f4768bec950, 0xc003ec2600}, {0x32336e0, 0xc003ec2630}, ...}, ...)
	/src/loki/pkg/storage/chunk/cache/resultscache/cache.go:112 +0xb45
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.resultsCache.Do({0xc00be6f130, {0x3231a60, 0xc0006f2280}, {0x323fd60, 0xc002143240}, 0xc002151c50}, {0x3254d88, 0xc00cb953b0}, {0x3270350, 0xc00bfad960})
	/src/loki/pkg/querier/queryrange/queryrangebase/results_cache.go:186 +0xf3
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3254d88?, 0xc00cb953b0?})
	/src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x42
github.com/grafana/dskit/instrument.CollectedRequest({0x3254d88, 0xc00cb95350}, {0x2a8cea5, 0x11}, {0x3249af0, 0xc003c0f010}, 0x1?, 0xc00cbaaa48)
	/src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x262
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3254d88?, 0xc00cb95350?}, {0x3270350?, 0xc00bfad960?})
	/src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc00cb9eb40?, {0x3254d88?, 0xc00cb95350?}, {0x3270350?, 0xc00bfad960?})
	/src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.(*splitByInterval).Do(0xc00c483740, {0x3254d88?, 0xc00cb95350}, {0x3270350, 0xc00bfad8a0})
	/src/loki/pkg/querier/queryrange/split_by_interval.go:214 +0x476
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1.1({0x3254d88?, 0xc00cb95350?})
	/src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:28 +0x42
github.com/grafana/dskit/instrument.CollectedRequest({0x3254d88, 0xc00cb95320}, {0x2a8ce94, 0x11}, {0x3249af0, 0xc003c0f008}, 0x21a0055?, 0xc004104e60)
	/src/loki/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x262
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.InstrumentMiddleware.func1.1({0x3254d88?, 0xc00cb95320?}, {0x3270350?, 0xc00bfad8a0?})
	/src/loki/pkg/querier/queryrange/queryrangebase/instrumentation.go:26 +0xa8
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0x0?, {0x3254d88?, 0xc00cb95320?}, {0x3270350?, 0xc00bfad8a0?})
	/src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.limitsMiddleware.Do({{0x3281480?, 0xc003ec2600?}, {0x3231760?, 0xc00cbb4140?}}, {0x3254d88?, 0xc00cb952f0?}, {0x3270350, 0xc00bfad8a0})
	/src/loki/pkg/querier/queryrange/limits.go:199 +0xaf5
github.com/grafana/loki/v3/pkg/querier/queryrange.StatsCollectorMiddleware.func1.1({0x3254dc0, 0xc00cbb2cd0}, {0x3270350?, 0xc00bfad8a0?})
	/src/loki/pkg/querier/queryrange/stats.go:132 +0x122
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc00bfad8c0?, {0x3254dc0?, 0xc00cbb2cd0?}, {0x3270350?, 0xc00bfad8a0?})
	/src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.NewIndexStatsTripperware.statsTripperware.func4.1({0x3254dc0, 0xc00cbb2cd0}, {0x3270350, 0xc00bfad8a0})
	/src/loki/pkg/querier/queryrange/roundtrip.go:970 +0xfd
github.com/grafana/loki/v3/pkg/querier/queryrange/queryrangebase.HandlerFunc.Do(0xc00b2c4ee0?, {0x3254dc0?, 0xc00cbb2cd0?}, {0x3270350?, 0xc00bfad8a0?})
	/src/loki/pkg/querier/queryrange/queryrangebase/roundtrip.go:80 +0x37
github.com/grafana/loki/v3/pkg/querier/queryrange.getStatsForMatchers.func1({0x3254dc0, 0xc00cbb2cd0}, 0x0)
	/src/loki/pkg/querier/queryrange/shard_resolver.go:106 +0x282
github.com/grafana/dskit/concurrency.ForEachJob.func1()
	/src/loki/vendor/github.com/grafana/dskit/concurrency/runner.go:105 +0x83
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 2037
	/src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x96
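
For readers unfamiliar with this class of failure: in the first frame, the last field of the downstreamRoundTripper receiver prints as {0x0, 0x0}, which would be consistent with a nil interface value, although the trace alone does not confirm that. The sketch below is not Loki code. The names codec, roundTripper, echoCodec, and safeDo are hypothetical, chosen only to show how calling a method on an unset interface field produces the same runtime error text.

```go
package main

import "fmt"

// codec is a hypothetical interface standing in for any interface-typed dependency.
type codec interface {
	Encode(req string) (string, error)
}

// roundTripper is a hypothetical struct (not Loki's type) with an interface field
// that stays nil unless the caller sets it.
type roundTripper struct {
	codec codec
}

// Do calls a method on the interface field. If the field is nil, the Go runtime
// panics with "invalid memory address or nil pointer dereference".
func (r roundTripper) Do(req string) (string, error) {
	return r.codec.Encode(req)
}

// echoCodec is a trivial codec implementation used for the working case.
type echoCodec struct{}

func (echoCodec) Encode(req string) (string, error) { return req, nil }

// safeDo wraps Do and converts a runtime panic into an error, so the
// failure can be observed without crashing the process.
func safeDo(r roundTripper, req string) (out string, err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("recovered: %v", rec)
		}
	}()
	return r.Do(req)
}

func main() {
	// Unset codec: the method call on the nil interface panics and is recovered.
	_, err := safeDo(roundTripper{}, "query")
	fmt.Println("nil codec:", err)

	// Codec set: the same call succeeds.
	out, err := safeDo(roundTripper{codec: echoCodec{}}, "query")
	fmt.Println("with codec:", out, err)
}
```

If something like this is what happens here, the fix would belong wherever the round tripper is constructed (making sure the field is always set), rather than in a recover wrapper; safeDo exists only to make the demonstration observable.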

To Reproduce

Steps to reproduce the behavior:

  1. Made minor configuration changes according to upgrade notes
  2. Started Promtail and Loki 3.0.0
  3. Only the Loki Query Frontend component failed, hitting this SIGSEGV shortly after it started.

Expected behavior

This does not fail in 2.9.8. It also does not fail when every component except the Loki Query Frontend is running 3.0.0.

Environment:

  • Infrastructure: Nomad
  • Deployment tool: terraform

chewrocca, Jun 25 '24 10:06

https://github.com/grafana/loki/issues/13208 This seems related. However, we're using a query scheduler.

chewrocca, Jun 25 '24 13:06

This seems to be working on the main branch.

chewrocca, Jun 27 '24 01:06