tally icon indicating copy to clipboard operation
tally copied to clipboard

4.1.11 and above throws error `"tally.internal.counter_cardinality" is not a valid metric name`

Open nitrocode opened this issue 1 year ago • 6 comments

error

I noticed we cannot upgrade past 4.1.10 because we get this error in our tests

panic: descriptor Desc{fqName: "tally.internal.counter_cardinality", help: "tally.internal.counter_cardinality gauge", constLabels: {}, variableLabels: [version host instance]} is invalid: "tally.internal.counter_cardinality" is not a valid metric name [recovered]

I noticed some changes in these tags that may be related

https://github.com/uber-go/tally/compare/v4.1.10...v4.1.11

Is this panic an expected error or a regression?

references

  • https://github.com/runatlantis/atlantis/actions/runs/8496376185/job/23273689226?pr=4228
  • https://github.com/runatlantis/atlantis/blob/main/server/metrics/scope.go
  • https://github.com/runatlantis/atlantis/blob/84f0fff1696a1f9ef70ba9e5d3f167e8eb4e28e8/server/metrics/scope_test.go#L18
  • https://github.com/runatlantis/atlantis/pull/4228#pullrequestreview-1912574445

nitrocode avatar Mar 31 '24 23:03 nitrocode

Wondering about the same. Disabled cardinality metrics in NewRootScope() in the interim.

rbarabas avatar Apr 18 '24 19:04 rbarabas

Setting the OmitCardinalityMetrics to true and false didn't remove this issue for my project, we have had to stay on v4.1.10

TheCodingChameleon avatar May 01 '24 10:05 TheCodingChameleon

Seeing this too, It looks like there's not great test coverage of the prometheus exporter

ptxmac avatar May 07 '24 07:05 ptxmac

There's a simple test that would have caught this issue: https://github.com/uber-go/tally/compare/master...ptxmac:tally:test-prom

I used this with git bisect:

2498a0dc9e7370c10e54d24ff5ee307845bcea73 is the first bad commit

ptxmac avatar May 07 '24 09:05 ptxmac

Found a workaround / correct way to initialize the root scope:

	tally.NewRootScope(tally.ScopeOptions{
		Prefix:          "prefix",
		Tags:            map[string]string{"env": "dev"},
		CachedReporter:  r,
		Separator:       prometheus.DefaultSeparator,
		SanitizeOptions: &prometheus.DefaultSanitizerOpts,
	}, 1*time.Second)

I guess that does technically make sense, but it's a shame that the default naming for cardinality metrics results in a panic if the correct options are not supplied.

If the default option results in a panic, then it's not really a good default option!

ptxmac avatar May 07 '24 09:05 ptxmac

Is there any update on a fix being developed for this? It is stopping our projects from moving forward to the newer versions. As stated above a default that causes a panic isn't ideal

TheCodingChameleon avatar Aug 05 '24 15:08 TheCodingChameleon