4.1.11 and above throws error `"tally.internal.counter_cardinality" is not a valid metric name`
error
I noticed we cannot upgrade past 4.1.10 because we get this error in our tests
panic: descriptor Desc{fqName: "tally.internal.counter_cardinality", help: "tally.internal.counter_cardinality gauge", constLabels: {}, variableLabels: [version host instance]} is invalid: "tally.internal.counter_cardinality" is not a valid metric name [recovered]
I noticed some changes in these tags that may be related
https://github.com/uber-go/tally/compare/v4.1.10...v4.1.11
Is this panic an expected error or a regression?
references
- https://github.com/runatlantis/atlantis/actions/runs/8496376185/job/23273689226?pr=4228
- https://github.com/runatlantis/atlantis/blob/main/server/metrics/scope.go
- https://github.com/runatlantis/atlantis/blob/84f0fff1696a1f9ef70ba9e5d3f167e8eb4e28e8/server/metrics/scope_test.go#L18
- https://github.com/runatlantis/atlantis/pull/4228#pullrequestreview-1912574445
Wondering about the same. Disabled cardinality metrics in NewRootScope() in the interim.
Setting the OmitCardinalityMetrics to true and false didn't remove this issue for my project, we have had to stay on v4.1.10
Seeing this too, It looks like there's not great test coverage of the prometheus exporter
There's a simple test that would have caught this issue: https://github.com/uber-go/tally/compare/master...ptxmac:tally:test-prom
I used this with git bisect:
2498a0dc9e7370c10e54d24ff5ee307845bcea73 is the first bad commit
Found a workaround / correct way to initialize the root scope:
tally.NewRootScope(tally.ScopeOptions{
Prefix: "prefix",
Tags: map[string]string{"env": "dev"},
CachedReporter: r,
Separator: prometheus.DefaultSeparator,
SanitizeOptions: &prometheus.DefaultSanitizerOpts,
}, 1*time.Second)
I guess that does technically make sense, but it's a shame that the default naming for cardinality metrics results in a panic if the correct options are not supplied.
If the default option results in a panic, then it's not really a good default option!
Is there any update on a fix being developed for this? It is stopping our projects from moving forward to the newer versions. As stated above a default that causes a panic isn't ideal