mongodb_exporter

topmetrics collector causing panic.

Open theodrim opened this issue 2 years ago • 2 comments

Describe the bug

The exporter panics when the topmetrics collector is enabled and it is connected to a MongoDB Atlas (cloud) instance.

To Reproduce

Steps to reproduce the behavior:

  1. what parameters are being passed to mongodb_exporter: --no-mongodb.direct-connect --collect-all

  2. describe steps to reproduce the issue: start mongodb_exporter with the parameters above, connect it to a MongoDB Atlas (cloud) instance, then curl the metrics endpoint

Expected behavior

Metrics being shown.

Logs

2023/06/22 13:24:25 http: panic serving 10.4.166.252:48284: descriptor Desc{fqName: "", help: "", constLabels: {}, variableLabels: []} is invalid: label value "\xf8\xc2}\xbbz\x14" is not valid UTF-8
goroutine 4026201 [running]:
net/http.(*conn).serve.func1()
	/opt/hostedtoolcache/go/1.19.9/x64/src/net/http/server.go:1850 +0xb8
panic({0x6d3800, 0x400091cc90})
	/opt/hostedtoolcache/go/1.19.9/x64/src/runtime/panic.go:890 +0x260
github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(...)
	/home/runner/go/pkg/mod/github.com/!percona-!lab/[email protected]/prometheus/registry.go:403
github.com/percona/mongodb_exporter/exporter.(*Exporter).makeRegistry(0x40000a5e50, {0x974de0?, 0x40005d98c0}, 0x400028c8f0, {0x9725e8?, 0x40007a8ff0}, {{0x400028f8f0, 0x1, 0x1}, 0x0, ...})
	/home/runner/work/mongodb_exporter/mongodb_exporter/exporter/exporter.go:214 +0xbfc
github.com/percona/mongodb_exporter/exporter.(*Exporter).Handler.func1({0x9744d0, 0x400007e000}, 0x40001ec400)
	/home/runner/work/mongodb_exporter/mongodb_exporter/exporter/exporter.go:332 +0x418
net/http.HandlerFunc.ServeHTTP(0x40004c6ad8?, {0x9744d0?, 0x400007e000?}, 0x0?)
	/opt/hostedtoolcache/go/1.19.9/x64/src/net/http/server.go:2109 +0x38
net/http.(*ServeMux).ServeHTTP(0x0?, {0x9744d0, 0x400007e000}, 0x40001ec400)
	/opt/hostedtoolcache/go/1.19.9/x64/src/net/http/server.go:2487 +0x140
net/http.serverHandler.ServeHTTP({0x40006e5a40?}, {0x9744d0, 0x400007e000}, 0x40001ec400)
	/opt/hostedtoolcache/go/1.19.9/x64/src/net/http/server.go:2947 +0x2cc
net/http.(*conn).serve(0x40005308c0, {0x974e18, 0x40002ebbf0})
	/opt/hostedtoolcache/go/1.19.9/x64/src/net/http/server.go:1991 +0x544
created by net/http.(*Server).Serve
	/opt/hostedtoolcache/go/1.19.9/x64/src/net/http/server.go:3102 +0x43c
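
The panic message above complains about a label value that is not valid UTF-8. Below is a minimal Go sketch of how such a value could be cleaned up before it is handed to the Prometheus client. `sanitizeLabelValue` is a hypothetical helper, not something that exists in mongodb_exporter, and where the bad value actually comes from (for instance a namespace name returned by `top`) is an assumption. It only uses the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// sanitizeLabelValue is a hypothetical helper (not part of mongodb_exporter).
// It returns v unchanged when it is valid UTF-8; otherwise each run of
// invalid bytes is replaced with the Unicode replacement character.
func sanitizeLabelValue(v string) string {
	if utf8.ValidString(v) {
		return v
	}
	return strings.ToValidUTF8(v, "\uFFFD")
}

func main() {
	// Bytes copied from the label value quoted in the panic message.
	raw := "\xf8\xc2}\xbbz\x14"
	fmt.Printf("%q valid=%t\n", raw, utf8.ValidString(raw))

	clean := sanitizeLabelValue(raw)
	fmt.Printf("%q valid=%t\n", clean, utf8.ValidString(clean))
}
```

Replacing the bytes keeps the metric visible at the cost of a slightly mangled label; dropping the offending series would be another option.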

Environment

  • OS, Linux 5.15.108 #1 SMP Wed May 24 23:53:57 UTC 2023 aarch64 GNU/Linux
  • environment (docker, k8s, etc) - EKS v1.26.4-eks-0a21954
  • MongoDB version 4.4.22

Additional context

mongodb_exporter version 0.39.0

Additionally, this is only impacting one of our clusters (prod) and not the others (dev). Both run the same MongoDB version (4.4.22) and have the same hardware configuration and replica count. As a workaround we are currently running mongodb_exporter with every collector except topmetrics enabled, and so far there have been no issues.

theodrim, Jun 23 '23 21:06

Hello there,

I'm also facing some issues with topmetrics.

2023/10/08 16:42:35 http: panic serving <redacted>:54398: descriptor Desc{fqName: "", help: "", constLabels: {}, variableLabels: []} is invalid: (BSONObjectTooLarge) BSONObj size: 17453485 (0x10A51AD) is invalid. Size must be between 0 and 16793600(16MB) First element: note: "all times in microseconds"

Environment

  • OS: 5.14.0-284.25.1.0.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 4 09:00:16 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux
  • MongoDB 6.0.10-1

eduardolmedeiros, Oct 08 '23 16:10

I'm the furthest thing from a mongodb expert, but I have this same issue on one of my replica sets.

In rs1 with 3 nodes, topmetrics works perfectly on all servers. In rs2 with 15 nodes, topmetrics doesn't work at all on the secondary servers. The same failure occurs when running db.adminCommand({top:1}) from the mongo shell.

The point is that there's nothing the exporter can do to overcome this max size limitation in mongodb, but it could detect whether topmetrics are available on a node before attempting to scrape them. If it did that, it could collect topmetrics from only the primary of really large replica sets (alternatively, I suppose it could just check isMaster, or isPrimary if that was renamed in newer releases).
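
For the "check whether the node is primary" idea, here is a rough Go sketch. Newer MongoDB releases provide the `hello` command, whose reply carries an `isWritablePrimary` field, while the legacy `isMaster` reply carries `ismaster`. The sketch assumes the reply has already been decoded into a plain map; `isPrimary` is a hypothetical helper, not an exporter or driver function, and how the reply is fetched is left out.

```go
package main

import "fmt"

// isPrimary is a hypothetical helper. It inspects a decoded reply from the
// hello command (field isWritablePrimary) or from the legacy isMaster
// command (field ismaster) and reports whether the node is the primary.
// A reply with neither field is treated as "not primary".
func isPrimary(reply map[string]any) bool {
	for _, key := range []string{"isWritablePrimary", "ismaster"} {
		if v, ok := reply[key].(bool); ok {
			return v
		}
	}
	return false
}

func main() {
	helloReply := map[string]any{"isWritablePrimary": true}
	legacyReply := map[string]any{"ismaster": false, "secondary": true}

	fmt.Println(isPrimary(helloReply))
	fmt.Println(isPrimary(legacyReply))
}
```

Note that this only covers the secondaries-fail case described above; it would not help if `top` also exceeded the size limit on a primary.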

I have 0 golang experience outside of this attempt, but I tried to have AI update exporter.go to detect if topmetrics is available. It didn't work at all, and even if it did, it seems like an incomplete solution because it looks like the exporter initializes once at startup, so if/when the primary mongo rotates, we'd enter this error situation again.

I'd love to hear some suggestions on making this work, other than not collecting topmetrics.

wroblewj0, Oct 12 '23 19:10