promql-engine
Runtime Panic (index out of range) when running query
Hello!
I enabled the Thanos PromQL engine in our staging environment and I'm seeing the following runtime panic:
level=error ts=2023-04-07T04:05:54.210299148Z caller=engine.go:361 msg="runtime panic in engine" expr="group by (container, dc, namespace, node, owner_kind, owner_name, pod) ((group without (host_ip, pod_ip) (kube_pod_info{job=\"kube-state-metrics\"} > 0)) * on (dc, namespace, pod) group_left (owner_kind, owner_name) (kube_pod_owner{job=\"kube-state-metrics\",pod!=\"\"} > 0) * on (dc, namespace, pod) group_right (node, owner_kind, owner_name) kube_pod_container_info{container!=\"\",job=\"kube-state-metrics\"})" err="runtime error: index out of range [4359] with length 4359" stacktrace="goroutine 9818884 [running]:\ngithub.com/thanos-community/promql-engine/engine.recoverEngine({0x2bd95c0, 0xc00052cdc0}, {0x2bf7820, 0xc0626c8060}, 0xc0626c4d00)\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/engine/engine.go:359 +0xc6\npanic({0x2507ae0, 0xc04c77d818})\n\t/usr/local/go/src/runtime/panic.go:884 +0x212\ngithub.com/thanos-community/promql-engine/execution/binary.(*vectorOperator).join(0xc0626de3c0?, 0xc011730000?, 0x1108?, 0x0?, 0xc042dc86c0, {0xc0626c45c0, 0x3, 0x4})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/binary/vector.go:242 +0x71a\ngithub.com/thanos-community/promql-engine/execution/binary.(*vectorOperator).initOutputs(0xc0626de3c0, {0x2bf4140, 0xc0626c4d40})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/binary/vector.go:109 +0x2a5\ngithub.com/thanos-community/promql-engine/execution/binary.(*vectorOperator).Series.func1()\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/binary/vector.go:78 +0x2e\nsync.(*Once).doSlow(0x0?, 0x0?)\n\t/usr/local/go/src/sync/once.go:74 +0xc2\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/thanos-community/promql-engine/execution/binary.(*vectorOperator).Series(0xc0626de3c0?, {0x2bf4140?, 0xc0626c4d40?})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/binary/vector.go:78 +0x7e\ngithub.com/thanos-community/promql-engine/execution/aggregate.(*aggregate).initializeScalarTables(0xc00002a8f0, {0x2bf4140?, 0xc0626c4d40?})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/aggregate/hashaggregate.go:200 +0x51\ngithub.com/thanos-community/promql-engine/execution/aggregate.(*aggregate).initializeTables(0xc00002a8f0, {0x2bf4140, 0xc0626c4d40})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/aggregate/hashaggregate.go:168 +0x45\ngithub.com/thanos-community/promql-engine/execution/aggregate.(*aggregate).Series.func1()\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/aggregate/hashaggregate.go:91 +0x2e\nsync.(*Once).doSlow(0x0?, 0x7f38f2ecd0c8?)\n\t/usr/local/go/src/sync/once.go:74 +0xc2\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/thanos-community/promql-engine/execution/aggregate.(*aggregate).Series(0xc00002a8f0?, {0x2bf4140?, 0xc0626c4d40?})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/aggregate/hashaggregate.go:91 +0x7e\ngithub.com/thanos-community/promql-engine/execution/exchange.(*concurrencyOperator).Series(0x2bf41e8?, {0x2bf4140?, 0xc0626c4d40?})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/exchange/concurrent.go:41 +0x2c\ngithub.com/thanos-community/promql-engine/engine.(*compatibilityQuery).Exec(0xc0626d3d10, {0x2bf41e8, 0xc0626c6390})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/engine/engine.go:204 
+0x1c2\ngithub.com/thanos-io/thanos/pkg/api/query.(*QueryAPI).query(0xc00022a870, 0xc061a05e00)\n\t/app/pkg/api/query/v1.go:441 +0x92b\ngithub.com/thanos-io/thanos/pkg/api.GetInstr.func1.1({0x2be9960, 0xc036e4aa10}, 0x4?)\n\t/app/pkg/api/api.go:211 +0x50\nnet/http.HandlerFunc.ServeHTTP(0xc0626a82f0?, {0x2be9960?, 0xc036e4aa10?}, 0x2bce3bc?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/logging.(*HTTPServerMiddleware).HTTPMiddleware.func1({0x2be9960?, 0xc036e4aa10}, 0xc061a05e00)\n\t/app/pkg/logging/http.go:69 +0x3b8\nnet/http.HandlerFunc.ServeHTTP(0x2bf41e8?, {0x2be9960?, 0xc036e4aa10?}, 0x2bcec38?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/server/http/middleware.RequestID.func1({0x2be9960, 0xc036e4aa10}, 0xc061a05d00)\n\t/app/pkg/server/http/middleware/request_id.go:40 +0x542\nnet/http.HandlerFunc.ServeHTTP(0x2184f60?, {0x2be9960?, 0xc036e4aa10?}, 0x4?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1({0x2bedd40, 0xc0608ddf00}, 0x490001?)\n\t/go/pkg/mod/github.com/!n!y!times/[email protected]/gzip.go:338 +0x26f\nnet/http.HandlerFunc.ServeHTTP(0x1?, {0x2bedd40?, 0xc0608ddf00?}, 0x0?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/extprom/http.httpInstrumentationHandler.func1({0x7f38f374df70?, 0xc04f6f2e10}, 0xc061a05d00)\n\t/app/pkg/extprom/http/instrument_server.go:75 +0x10b\nnet/http.HandlerFunc.ServeHTTP(0x7f38f374df70?, {0x7f38f374df70?, 0xc04f6f2e10?}, 0xc0626c6060?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1({0x7f38f374df70?, 0xc04f6f2dc0?}, 0xc061a05d00)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:288 +0xc5\nnet/http.HandlerFunc.ServeHTTP(0x7f38f374df70?, {0x7f38f374df70?, 0xc04f6f2dc0?}, 0x0?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x7f38f374df70?, 0xc04f6f2d70?}, 0xc061a05d00)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:146 +0xb8\nnet/http.HandlerFunc.ServeHTTP(0x22c9b80?, {0x7f38f374df70?, 0xc04f6f2d70?}, 0x6?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/extprom/http.instrumentHandlerInFlight.func1({0x7f38f374df70, 0xc04f6f2d70}, 0xc061a05d00)\n\t/app/pkg/extprom/http/instrument_server.go:162 +0x169\nnet/http.HandlerFunc.ServeHTTP(0x2bf1310?, {0x7f38f374df70?, 0xc04f6f2d70?}, 0xc02cbe1698?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func1({0x2bf1310?, 0xc0015321c0?}, 0xc061a05d00)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:238 +0xc5\nnet/http.HandlerFunc.ServeHTTP(0x2bf41e8?, {0x2bf1310?, 0xc0015321c0?}, 0xc06269de90?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/tracing.HTTPMiddleware.func1({0x2bf1310, 0xc0015321c0}, 0xc061a05c00)\n\t/app/pkg/tracing/http.go:62 +0x9a2\ngithub.com/prometheus/common/route.(*Router).handle.func1({0x2bf1310, 0xc0015321c0}, 0xc061a05b00, {0x0, 0x0, 0x478d4e?})\n\t/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:83 +0x18d\ngithub.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc001118c00, {0x2bf1310, 0xc0015321c0}, 
0xc061a05b00)\n\t/go/pkg/mod/github.com/julienschmidt/[email protected]/router.go:387 +0x81c\ngithub.com/prometheus/common/route.(*Router).ServeHTTP(0xc0015321c0?, {0x2bf1310?, 0xc0015321c0?}, 0x268c113?)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:126 +0x26\nnet/http.(*ServeMux).ServeHTTP(0x0?, {0x2bf1310, 0xc0015321c0}, 0xc061a05b00)\n\t/usr/local/go/src/net/http/server.go:2487 +0x149\nnet/http.serverHandler.ServeHTTP({0x2be7b80?}, {0x2bf1310, 0xc0015321c0}, 0xc061a05b00)\n\t/usr/local/go/src/net/http/server.go:2947 +0x30c\nnet/http.(*conn).serve(0xc00296ab40, {0x2bf41e8, 0xc000948300})\n\t/usr/local/go/src/net/http/server.go:1991 +0x607\ncreated by net/http.(*Server).Serve\n\t/usr/local/go/src/net/http/server.go:3102 +0x4db\n"
The query that caused this panic is as follows:
group by (container, dc, namespace, node, owner_kind, owner_name, pod) (
  (group without (host_ip, pod_ip) (kube_pod_info{job="kube-state-metrics"} > 0))
  * on (dc, namespace, pod) group_left (owner_kind, owner_name)
  (kube_pod_owner{job="kube-state-metrics",pod!=""} > 0)
  * on (dc, namespace, pod) group_right (node, owner_kind, owner_name)
  kube_pod_container_info{container!="",job="kube-state-metrics"}
)
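In case it helps narrow things down, the two vector matches can also be run on their own (same selectors as above, just without the outer group by); I haven't confirmed which of them actually triggers the panic:

# First join only: pod info matched against pod owner
(group without (host_ip, pod_ip) (kube_pod_info{job="kube-state-metrics"} > 0))
  * on (dc, namespace, pod) group_left (owner_kind, owner_name)
(kube_pod_owner{job="kube-state-metrics",pod!=""} > 0)

# Both joins, without the outer aggregation
(group without (host_ip, pod_ip) (kube_pod_info{job="kube-state-metrics"} > 0))
  * on (dc, namespace, pod) group_left (owner_kind, owner_name)
(kube_pod_owner{job="kube-state-metrics",pod!=""} > 0)
  * on (dc, namespace, pod) group_right (node, owner_kind, owner_name)
kube_pod_container_info{container!="",job="kube-state-metrics"}

If only the second form panics, the problem presumably sits in the group_right match against kube_pod_container_info rather than in the first join.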
The promql-engine version in the panic looks like it comes from an older Thanos release, v0.30.2. Can you try upgrading your queriers to v0.31.0 and see if the panic still occurs? There have been quite a few fixes between those versions.
I will close this issue since it is quite old. Please try the latest version of the engine!