jtimon

Prometheus - metric has label dimensions inconsistent with previously collected metrics in the same metric family

Open fstolba opened this issue 4 years ago • 4 comments

Hi,

I'm using the latest version from git.

When trying to collect the endpoint /junos/services/label-switched-path/usage/ I'm presented with the following error messages when accessing the Prometheus metrics page:

An error has occurred during metrics gathering:

279 error(s) occurred:
* collected metric _mpls_lsps_constrained_path_tunnels_tunnel_state_counters_jnx_packet_rate label:<name:"_mpls_lsps_constrained_path_tunnels_tunnel__name" value:"lsp_name" > label:<name:"_mpls_lsps_constrained_path_tunnels_tunnel__source" value:"0.0.0.0" > label:<name:"_mpls_lsps_constrained_path_tunnels_tunnel_state_counters__name" value:"c-45772" > label:<name:"device" value:"lsp_device" > untyped:<value:87 >  has label dimensions inconsistent with previously collected metrics in the same metric family
[ above message repeated 278 times with different labels and metrics, all related to the endpoint mentioned above ]

There are various pieces of information here and here. The last link says that this limitation was removed in a later build of the Prometheus client library.
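For readers unfamiliar with the error: it is raised when two metrics in the same metric family carry different sets of label names. The sketch below illustrates that kind of check using only the standard library. It is not the client library's actual code; the `sample` type and the `labelKey` and `inconsistentSamples` helpers are hypothetical names made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical stand-in for one collected metric.
type sample struct {
	Family string
	Labels map[string]string
}

// labelKey joins the sorted label names into a single comparable string.
func labelKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	return strings.Join(names, ",")
}

// inconsistentSamples returns the indices of samples whose label names
// differ from those of the first sample seen in the same family.
func inconsistentSamples(samples []sample) []int {
	seen := map[string]string{} // family -> label-name key of first sample
	var bad []int
	for i, s := range samples {
		key := labelKey(s.Labels)
		first, ok := seen[s.Family]
		if !ok {
			seen[s.Family] = key
			continue
		}
		if key != first {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	samples := []sample{
		{Family: "jnx_packet_rate", Labels: map[string]string{"device": "a", "name": "lsp1"}},
		// Same family, but with an extra "source" label.
		{Family: "jnx_packet_rate", Labels: map[string]string{"device": "b", "name": "lsp2", "source": "0.0.0.0"}},
	}
	fmt.Println(inconsistentSamples(samples))
}
```

In this sketch the second sample is flagged because it has a `source` label that the first one lacks, which mirrors the situation reported in the error output above.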

fstolba avatar Jul 05 '19 10:07 fstolba

The last link leads to this page, which says that this limitation was removed in newer versions of prometheus/client_golang. I actually tried removing the vendored client_golang and building against the latest release, which fixes the problem at hand but apparently causes another one. How should we proceed here?

fstolba avatar Jul 05 '19 10:07 fstolba

What is the other issue you are running into after upgrading the Prometheus package?

nileshsimaria avatar Jul 17 '19 02:07 nileshsimaria

Hi @nileshsimaria, after upgrading the vendored code, the metrics that are currently considered invalid are exposed. However, other metrics that work fine with the currently vendored version are then missing from the output (for example most gauge values from /components/component/properties/property).

Honestly, I haven't looked into it any further to date, but it seems to me that debugging the issue with the more recent dependency version is the way to go. Tracking this should probably happen in its own PR.

fstolba avatar Jul 17 '19 05:07 fstolba

btw we ran into this when collecting data from devices running different JunOS versions. After all were at the same version again (and a restart of jtimon later), everything was working again (still, not so nice to lose telemetry data in such cases).
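One possible way to tolerate mixed label sets without losing data, sketched here purely as an assumption and not as something jtimon does, is to pad every metric in a family with the union of that family's label names, using empty values for the missing ones. The `metric` type and `padLabels` function are hypothetical and use only the standard library.

```go
package main

import "fmt"

// metric is a hypothetical stand-in for one metric awaiting export.
type metric struct {
	Family string
	Labels map[string]string
}

// padLabels gives every metric in a family the same set of label names by
// adding any missing label with an empty value. It modifies the label maps
// in place.
func padLabels(metrics []metric) {
	// First pass: collect the union of label names per family.
	union := map[string]map[string]struct{}{}
	for _, m := range metrics {
		if union[m.Family] == nil {
			union[m.Family] = map[string]struct{}{}
		}
		for name := range m.Labels {
			union[m.Family][name] = struct{}{}
		}
	}
	// Second pass: fill in the labels each metric is missing.
	for i := range metrics {
		if metrics[i].Labels == nil {
			metrics[i].Labels = map[string]string{}
		}
		for name := range union[metrics[i].Family] {
			if _, ok := metrics[i].Labels[name]; !ok {
				metrics[i].Labels[name] = ""
			}
		}
	}
}

func main() {
	metrics := []metric{
		{Family: "jnx_packet_rate", Labels: map[string]string{"device": "old-junos"}},
		{Family: "jnx_packet_rate", Labels: map[string]string{"device": "new-junos", "source": "0.0.0.0"}},
	}
	padLabels(metrics)
	for _, m := range metrics {
		fmt.Println(m.Family, len(m.Labels))
	}
}
```

This sketch needs the whole batch of metrics up front so that the union can be computed. An exporter that emits metrics one at a time would have to track the known label names per family some other way.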

StephenKing avatar Sep 25 '20 21:09 StephenKing