promql-engine

Runtime Panic (index out of range) when running query

Open · vanugrah opened this issue 2 years ago

Hello!

I enabled the Thanos PromQL engine in our staging environment and am seeing the following runtime panic:

level=error ts=2023-04-07T04:05:54.210299148Z caller=engine.go:361 msg="runtime panic in engine" expr="group by (container, dc, namespace, node, owner_kind, owner_name, pod) ((group without (host_ip, pod_ip) (kube_pod_info{job=\"kube-state-metrics\"} > 0)) * on (dc, namespace, pod) group_left (owner_kind, owner_name) (kube_pod_owner{job=\"kube-state-metrics\",pod!=\"\"} > 0) * on (dc, namespace, pod) group_right (node, owner_kind, owner_name) kube_pod_container_info{container!=\"\",job=\"kube-state-metrics\"})" err="runtime error: index out of range [4359] with length 4359" stacktrace="goroutine 9818884 [running]:\ngithub.com/thanos-community/promql-engine/engine.recoverEngine({0x2bd95c0, 0xc00052cdc0}, {0x2bf7820, 0xc0626c8060}, 0xc0626c4d00)\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/engine/engine.go:359 +0xc6\npanic({0x2507ae0, 0xc04c77d818})\n\t/usr/local/go/src/runtime/panic.go:884 +0x212\ngithub.com/thanos-community/promql-engine/execution/binary.(*vectorOperator).join(0xc0626de3c0?, 0xc011730000?, 0x1108?, 0x0?, 0xc042dc86c0, {0xc0626c45c0, 0x3, 0x4})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/binary/vector.go:242 +0x71a\ngithub.com/thanos-community/promql-engine/execution/binary.(*vectorOperator).initOutputs(0xc0626de3c0, {0x2bf4140, 0xc0626c4d40})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/binary/vector.go:109 +0x2a5\ngithub.com/thanos-community/promql-engine/execution/binary.(*vectorOperator).Series.func1()\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/binary/vector.go:78 +0x2e\nsync.(*Once).doSlow(0x0?, 0x0?)\n\t/usr/local/go/src/sync/once.go:74 +0xc2\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/thanos-community/promql-engine/execution/binary.(*vectorOperator).Series(0xc0626de3c0?, {0x2bf4140?, 0xc0626c4d40?})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/binary/vector.go:78 +0x7e\ngithub.com/thanos-community/promql-engine/execution/aggregate.(*aggregate).initializeScalarTables(0xc00002a8f0, {0x2bf4140?, 0xc0626c4d40?})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/aggregate/hashaggregate.go:200 +0x51\ngithub.com/thanos-community/promql-engine/execution/aggregate.(*aggregate).initializeTables(0xc00002a8f0, {0x2bf4140, 0xc0626c4d40})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/aggregate/hashaggregate.go:168 +0x45\ngithub.com/thanos-community/promql-engine/execution/aggregate.(*aggregate).Series.func1()\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/aggregate/hashaggregate.go:91 +0x2e\nsync.(*Once).doSlow(0x0?, 0x7f38f2ecd0c8?)\n\t/usr/local/go/src/sync/once.go:74 +0xc2\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/thanos-community/promql-engine/execution/aggregate.(*aggregate).Series(0xc00002a8f0?, {0x2bf4140?, 0xc0626c4d40?})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/aggregate/hashaggregate.go:91 +0x7e\ngithub.com/thanos-community/promql-engine/execution/exchange.(*concurrencyOperator).Series(0x2bf41e8?, {0x2bf4140?, 0xc0626c4d40?})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/execution/exchange/concurrent.go:41 +0x2c\ngithub.com/thanos-community/promql-engine/engine.(*compatibilityQuery).Exec(0xc0626d3d10, {0x2bf41e8, 0xc0626c6390})\n\t/go/pkg/mod/github.com/thanos-community/[email protected]/engine/engine.go:204 
+0x1c2\ngithub.com/thanos-io/thanos/pkg/api/query.(*QueryAPI).query(0xc00022a870, 0xc061a05e00)\n\t/app/pkg/api/query/v1.go:441 +0x92b\ngithub.com/thanos-io/thanos/pkg/api.GetInstr.func1.1({0x2be9960, 0xc036e4aa10}, 0x4?)\n\t/app/pkg/api/api.go:211 +0x50\nnet/http.HandlerFunc.ServeHTTP(0xc0626a82f0?, {0x2be9960?, 0xc036e4aa10?}, 0x2bce3bc?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/logging.(*HTTPServerMiddleware).HTTPMiddleware.func1({0x2be9960?, 0xc036e4aa10}, 0xc061a05e00)\n\t/app/pkg/logging/http.go:69 +0x3b8\nnet/http.HandlerFunc.ServeHTTP(0x2bf41e8?, {0x2be9960?, 0xc036e4aa10?}, 0x2bcec38?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/server/http/middleware.RequestID.func1({0x2be9960, 0xc036e4aa10}, 0xc061a05d00)\n\t/app/pkg/server/http/middleware/request_id.go:40 +0x542\nnet/http.HandlerFunc.ServeHTTP(0x2184f60?, {0x2be9960?, 0xc036e4aa10?}, 0x4?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1({0x2bedd40, 0xc0608ddf00}, 0x490001?)\n\t/go/pkg/mod/github.com/!n!y!times/[email protected]/gzip.go:338 +0x26f\nnet/http.HandlerFunc.ServeHTTP(0x1?, {0x2bedd40?, 0xc0608ddf00?}, 0x0?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/extprom/http.httpInstrumentationHandler.func1({0x7f38f374df70?, 0xc04f6f2e10}, 0xc061a05d00)\n\t/app/pkg/extprom/http/instrument_server.go:75 +0x10b\nnet/http.HandlerFunc.ServeHTTP(0x7f38f374df70?, {0x7f38f374df70?, 0xc04f6f2e10?}, 0xc0626c6060?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1({0x7f38f374df70?, 0xc04f6f2dc0?}, 0xc061a05d00)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:288 +0xc5\nnet/http.HandlerFunc.ServeHTTP(0x7f38f374df70?, {0x7f38f374df70?, 0xc04f6f2dc0?}, 0x0?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x7f38f374df70?, 0xc04f6f2d70?}, 0xc061a05d00)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:146 +0xb8\nnet/http.HandlerFunc.ServeHTTP(0x22c9b80?, {0x7f38f374df70?, 0xc04f6f2d70?}, 0x6?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/extprom/http.instrumentHandlerInFlight.func1({0x7f38f374df70, 0xc04f6f2d70}, 0xc061a05d00)\n\t/app/pkg/extprom/http/instrument_server.go:162 +0x169\nnet/http.HandlerFunc.ServeHTTP(0x2bf1310?, {0x7f38f374df70?, 0xc04f6f2d70?}, 0xc02cbe1698?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func1({0x2bf1310?, 0xc0015321c0?}, 0xc061a05d00)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:238 +0xc5\nnet/http.HandlerFunc.ServeHTTP(0x2bf41e8?, {0x2bf1310?, 0xc0015321c0?}, 0xc06269de90?)\n\t/usr/local/go/src/net/http/server.go:2109 +0x2f\ngithub.com/thanos-io/thanos/pkg/tracing.HTTPMiddleware.func1({0x2bf1310, 0xc0015321c0}, 0xc061a05c00)\n\t/app/pkg/tracing/http.go:62 +0x9a2\ngithub.com/prometheus/common/route.(*Router).handle.func1({0x2bf1310, 0xc0015321c0}, 0xc061a05b00, {0x0, 0x0, 0x478d4e?})\n\t/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:83 +0x18d\ngithub.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc001118c00, {0x2bf1310, 0xc0015321c0}, 
0xc061a05b00)\n\t/go/pkg/mod/github.com/julienschmidt/[email protected]/router.go:387 +0x81c\ngithub.com/prometheus/common/route.(*Router).ServeHTTP(0xc0015321c0?, {0x2bf1310?, 0xc0015321c0?}, 0x268c113?)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:126 +0x26\nnet/http.(*ServeMux).ServeHTTP(0x0?, {0x2bf1310, 0xc0015321c0}, 0xc061a05b00)\n\t/usr/local/go/src/net/http/server.go:2487 +0x149\nnet/http.serverHandler.ServeHTTP({0x2be7b80?}, {0x2bf1310, 0xc0015321c0}, 0xc061a05b00)\n\t/usr/local/go/src/net/http/server.go:2947 +0x30c\nnet/http.(*conn).serve(0xc00296ab40, {0x2bf41e8, 0xc000948300})\n\t/usr/local/go/src/net/http/server.go:1991 +0x607\ncreated by net/http.(*Server).Serve\n\t/usr/local/go/src/net/http/server.go:3102 +0x4db\n"

The query that caused this panic is as follows:

group by (container, dc, namespace, node, owner_kind, owner_name, pod) (
    (group without (host_ip, pod_ip) (kube_pod_info{job="kube-state-metrics"} > 0))
    * on (dc, namespace, pod) group_left (owner_kind, owner_name)
    (kube_pod_owner{job="kube-state-metrics",pod!=""} > 0)
    * on (dc, namespace, pod) group_right (node, owner_kind, owner_name)
    kube_pod_container_info{container!="",job="kube-state-metrics"}
)
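
For context on what the stack trace points at: the panic is raised inside the binary vector operator's join while it initializes its output series. The Go sketch below is purely illustrative (all names and data are hypothetical, and it is not the engine's actual matching code); it only reproduces the general shape of such a failure, where an output table sized from one operand is indexed by positions produced while matching the other operand.

// Illustrative sketch only (hypothetical names, not promql-engine's actual code):
// reproduces the general shape of the reported failure, where a binary-join operator
// pre-sizes its output-series table from one operand and then indexes it with
// positions derived while matching the other operand.
package main

import "fmt"

func main() {
	defer func() {
		if r := recover(); r != nil {
			// Prints the same class of message as in the report, e.g.
			// "runtime error: index out of range [3] with length 3".
			fmt.Println("recovered:", r)
		}
	}()

	lhs := []string{"pod=a", "pod=b", "pod=c"}
	rhs := []string{"pod=a;owner=x", "pod=a;owner=y", "pod=b;owner=x", "pod=c;owner=x"}

	// Bug shape: the table is sized from lhs, but a group_left/group_right style
	// match can legitimately produce more output series than len(lhs).
	outputs := make([]string, len(lhs))

	i := 0
	for _, l := range lhs {
		for _, r := range rhs {
			if r[:len(l)] == l { // stand-in for real label matching
				outputs[i] = l + " x " + r // panics once i == len(outputs)
				i++
			}
		}
	}
	fmt.Println(outputs)
}

Running this prints "recovered: runtime error: index out of range [3] with length 3", which is the same class of message as in the log above.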

vanugrah · Apr 07 '23 04:04

The promql-engine version in the panic suggests you are on an older Thanos release, v0.30.2. Can you try upgrading the queriers to v0.31.0 and see if the panic still occurs? There have been quite a few fixes between those versions.
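
If it helps to confirm what a given querier binary actually ships, here is a small Go sketch (the binary path is an assumption) that reads the embedded build info and prints the promql-engine module version, the same idea as running go version -m on the binary:

// versioncheck.go: prints which promql-engine module a Thanos binary was built
// against, equivalent to `go version -m /path/to/thanos`.
package main

import (
	"debug/buildinfo"
	"fmt"
	"log"
	"strings"
)

func main() {
	info, err := buildinfo.ReadFile("/usr/local/bin/thanos") // hypothetical install path
	if err != nil {
		log.Fatalf("reading build info: %v", err)
	}
	fmt.Printf("main module: %s %s\n", info.Main.Path, info.Main.Version)
	for _, dep := range info.Deps {
		// Older releases vendor github.com/thanos-community/promql-engine,
		// newer ones github.com/thanos-io/promql-engine.
		if strings.Contains(dep.Path, "promql-engine") {
			fmt.Printf("dependency:  %s %s\n", dep.Path, dep.Version)
		}
	}
}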

saswatamcode · Apr 07 '23 04:04

I will resolve this issue as it is very old. Please try the latest version of the engine!

yeya24 · Apr 16 '25 06:04