node_exporter icon indicating copy to clipboard operation
node_exporter copied to clipboard

ZFS collector uses different names for the same metric on FreeBSD vs Linux

Open paxswill opened this issue 4 years ago • 6 comments

Host operating system: output of uname -a

  • FreeBSD shoebox 12.2-RELEASE-p2 FreeBSD 12.2-RELEASE-p2 663e6b09467(HEAD) TRUENAS amd64
  • Linux ishtar 5.8.0-40-generic #45-Ubuntu SMP Fri Jan 15 11:05:36 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

I'm using the packaged versions for FreeBSD 12 (latest) and Ubuntu 20.10

node_exporter, version 1.0.1 (branch: release-1.0, revision: 0)
  build user:       root
  build date:
  go version:       go1.15.6
node_exporter, version 1.0.1+ds (branch: debian/sid, revision: 1.0.1+ds-1)
  build user:       [email protected]
  build date:       20200618-22:00:01
  go version:       go1.14.4

node_exporter command line flags

FreeBSD: --web.listen-address=172.17.3.5:9100 --collector.textfile.directory=/var/tmp/node_exporter --collector.zfs Linux: (none, using defaults)

Are you running node_exporter in Docker?

No

What did you do that produced an error?

Ran node_exporter, collecting ZFS metrics on both Linux and FreeBSD

What did you expect to see?

Metrics would have the same name (I think that's the right term?) from both nodes.

What did you see instead?

The names of metrics exported from FreeBSD are different from the ones from Linux.

Extra Comments

The FreeBSD system is more precisely running TrueNAS, but it's close enough to FreeBSD that it doesn't make a big difference. The FreeBSD system is using OpenZFS 2.0, while the Linux system is using ZFSonLinux 0.8.4 (same project, they're skipping 1.0 while unifying the different OSs to a single codebase), so they're close enough that it shouldn't make a difference.

The FreeBSD ZFS collector is modifying the names that it's reading from sysctl, while the Linux ZFS collector seems to be basically dumping the contents of /proc/spl/kstat/zfs/*, munging the names so that they fit the proper naming scheme.

paxswill avatar Jan 26 '21 22:01 paxswill

I agree, that's quite unfortunate. Initially we always just dumped pretty much raw names from proc but changed that eventually (https://github.com/prometheus/node_exporter/issues/150). The ZFS collector wasn't change. Unfortunately changing it now will be a breaking change. Still, something we should probably do. @SuperQ wdyt?

discordianfish avatar Jan 29 '21 09:01 discordianfish

Yes, I never liked the dumping of random proc contents with no filtering. This leads to random breakage.

SuperQ avatar Feb 03 '21 07:02 SuperQ

To minimise this breaking change, how about cleaning up the references, but also providing a flag to control whether the old names should be published or not?

--publish-lagecy-names can default to true, allowing the cleanup to be a no-op immediately. In a future release, the flag can be disabled and eventually retired.

conallob avatar Jul 07 '23 11:07 conallob

Yeah I was thinking about some sort of 'translation' layer to support that for all collectors. So if you come up with a clean pattern for doing this, that would make sense. Beside that, we provided recording rules for breaking changes in the past.

discordianfish avatar Jul 10 '23 10:07 discordianfish

How about:

  1. Flag guard the logic that enables the new, cleaned up named. Default it to false
  2. Provide rewrite recording rules and declare future date, to give users enough notice
  3. Once future date arrives, flip the flag to true.
  4. Future Date + X months, remove the feature flag and legacy naming

Although it takes a while to complete all the states, I've used this pattern quite a lot professionally.

conallob avatar Jul 10 '23 12:07 conallob

@conallob I don't like maintaining those recording rules. IMO it's better for handing the changes in PromQL downstream with queries like sum(new_metric_name) or sum(old_metric_name). This way the user doesn't have to store two copies of the data.

Simplified timeline:

  1. Flag guard the logic that enables the new, cleaned up named. Default it to false
  2. Next major release (currently 2.0.0) we flip the flag to true and drop the old metris.

SuperQ avatar Jul 10 '23 13:07 SuperQ