Bind-mounting a filesystem to its original mount path causes a duplicate metric error

Open jcaesar opened this issue 1 year ago • 0 comments

When a path that is already a mount point is bind-mounted to itself, e.g. with

mount -ttmpfs - /foo
mount -obind /foo /foo

a duplicate metric error occurs and there is currently no way to collect metrics for the original mount.
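
To make the collision concrete, here is a minimal Go sketch (not node_exporter's actual code) that scans text in /proc/self/mountinfo format and reports label sets that occur more than once. It assumes a filesystem metric is identified by the device, fstype, and mountpoint labels visible in the error quoted under "Full error". The names labelKey, parseMountinfoLine, and duplicateLabelSets, as well as the sample mount table, are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// labelKey mirrors the labels seen in the error message. Two mounts that
// agree on all three fields would produce the same metric identity.
type labelKey struct {
	Device, FSType, MountPoint string
}

// parseMountinfoLine extracts the label-relevant fields from one line in
// /proc/self/mountinfo format. Escapes such as \040 are not decoded here.
func parseMountinfoLine(line string) (labelKey, bool) {
	fields := strings.Fields(line)
	// The first six fields are fixed; optional fields follow until a lone "-".
	for i := 6; i+2 < len(fields); i++ {
		if fields[i] == "-" {
			return labelKey{
				Device:     fields[i+2],
				FSType:     fields[i+1],
				MountPoint: fields[4],
			}, true
		}
	}
	return labelKey{}, false
}

// duplicateLabelSets returns every label set that appears at least twice,
// listing each one once.
func duplicateLabelSets(mountinfo string) []labelKey {
	seen := map[labelKey]int{}
	var dups []labelKey
	for _, line := range strings.Split(mountinfo, "\n") {
		key, ok := parseMountinfoLine(line)
		if !ok {
			continue
		}
		seen[key]++
		if seen[key] == 2 {
			dups = append(dups, key)
		}
	}
	return dups
}

// sample is a hypothetical mount table: /nix/store is mounted once and then
// bind-mounted read-only onto itself.
const sample = `22 1 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw
30 22 8:3 / /nix/store rw,relatime shared:2 - bcachefs /dev/sda3 rw
95 30 8:3 / /nix/store ro,relatime shared:2 - bcachefs /dev/sda3 rw`

func main() {
	for _, k := range duplicateLabelSets(sample) {
		fmt.Printf("duplicate label set: %+v\n", k)
	}
}
```

The two /nix/store entries differ only in mount ID, parent ID, and per-mount options, none of which are part of the label set, so they are indistinguishable as metrics.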

Background

This may seem like a weird issue at first, so let me explain why this situation occurs under real use:

  • Nix, the package manager, keeps its files at /nix/store. Since it wants to prevent accidental modification of these files (by the root user or errant software or whatnot), the nix daemon opens the folder for itself so it can write to it, and then bind-mounts /nix/store read-only, e.g. mount -obind,ro /nix/store /nix/store.

  • /nix/store tends to contain a lot of files and folders (hundreds of GB across hundreds of thousands of files in one folder), so I like to put it all on a separate filesystem or even a separate disk.

  • I can't meaningfully ignore the bind mount with --collector.filesystem.fs-types-exclude, since the fstype of the bind mount is the same as that of the underlying fs.

  • I can ignore everything mounted at /nix/store with --collector.filesystem.mount-points-exclude, but then I won't know whether that filesystem is about to fill up.

My current ugly workaround is to pass --collector.filesystem.mount-points-exclude=/nix/store, and then to bind-mount /nix/store to yet another unrelated path, e.g. mount -obind,ro /nix/store /run/.export.nix.store.

Full error

Feb 27 19:42:16 spaniel node_exporter[26585]: time=2025-02-27T10:42:16.940Z level=ERROR source=http.go:169 msg="error gathering metrics: 8 error(s) occurred:\n* [from Gatherer #2] collected metric \"node_filesystem_device_error\" { label:{name:\"device\"  value:\"/dev/sda3\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"bcachefs\"}  label:{name:\"mountpoint\"  value:\"/nix/store\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_readonly\" { label:{name:\"device\"  value:\"/dev/sda3\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"bcachefs\"}  label:{name:\"mountpoint\"  value:\"/nix/store\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_size_bytes\" { label:{name:\"device\"  value:\"/dev/sda3\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"bcachefs\"}  label:{name:\"mountpoint\"  value:\"/nix/store\"}  gauge:{value:9.8579775488e+10}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_free_bytes\" { label:{name:\"device\"  value:\"/dev/sda3\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"bcachefs\"}  label:{name:\"mountpoint\"  value:\"/nix/store\"}  gauge:{value:7.520336384e+10}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_avail_bytes\" { label:{name:\"device\"  value:\"/dev/sda3\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"bcachefs\"}  label:{name:\"mountpoint\"  value:\"/nix/store\"}  gauge:{value:7.4046388736e+10}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_files\" { label:{name:\"device\"  value:\"/dev/sda3\"}  label:{name:\"device_error\" 
 value:\"\"}  label:{name:\"fstype\"  value:\"bcachefs\"}  label:{name:\"mountpoint\"  value:\"/nix/store\"}  gauge:{value:1.17505256e+09}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_files_free\" { label:{name:\"device\"  value:\"/dev/sda3\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"bcachefs\"}  label:{name:\"mountpoint\"  value:\"/nix/store\"}  gauge:{value:1.17505256e+09}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_mount_info\" { label:{name:\"device\"  value:\"/dev/sda3\"}  label:{name:\"major\"  value:\"8\"}  label:{name:\"minor\"  value:\"3\"}  label:{name:\"mountpoint\"  value:\"/nix/store\"}  gauge:{value:1}} was collected before with the same name and label values"

The first of those errors, formatted for readability:

* [from Gatherer #2] collected metric "node_filesystem_device_error" {
    label:{name:"device"  value:"/dev/sda3"}
    label:{name:"device_error"  value:""}
    label:{name:"fstype"  value:"bcachefs"}
    label:{name:"mountpoint"  value:"/nix/store"}
    gauge:{value:0}
  }
  was collected before with the same name and label values

Possible approaches

Although Go isn't exactly my home ground, I'd like to try my hand at a fix. Do you have any recommendation on which way to go with that?

  • Ignore all bind mounts, either by default or via an option (not as easy as it sounds)
  • Deduplicate (also tricky, there may be situations with real duplicates I'm not aware of)
  • Provide a way to ignore filesystems based on mount options, to ignore e.g. read-only file systems
  • Change the reported fstype of bind mounts (wrong as far as Linux is concerned and might cause weird breakage)
  • Find a way to disambiguate multiple mounts at the same path. That would also cover cases such as for x in 1 2; do mount -ttmpfs - foo; done, though I don't think those are a relevant problem. There is not much to go by for this except the mount ID, and that is a terrible label.
  • …?
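
Two of the options above, deduplication and filtering on mount options, can be sketched in Go as follows. This is only an illustration under stated assumptions, not a proposed patch: the mount type, the helper names, and the sample table are hypothetical, and keeping the last entry per mount point assumes that later lines in mountinfo shadow earlier ones, which I have not verified for every case.

```go
package main

import (
	"fmt"
	"strings"
)

// mount holds the few mountinfo fields this sketch needs.
type mount struct {
	ID         string
	MountPoint string
	Options    []string // per-mount options, e.g. "ro", "relatime"
}

// parseMounts reads text in /proc/self/mountinfo format and skips lines
// that have fewer than the six fixed fields.
func parseMounts(mountinfo string) []mount {
	var out []mount
	for _, line := range strings.Split(mountinfo, "\n") {
		f := strings.Fields(line)
		if len(f) < 6 {
			continue
		}
		out = append(out, mount{ID: f[0], MountPoint: f[4], Options: strings.Split(f[5], ",")})
	}
	return out
}

// dedupeByMountPoint keeps only the last entry for each mount point and
// preserves the original order of the survivors.
func dedupeByMountPoint(mounts []mount) []mount {
	last := map[string]int{}
	for i, m := range mounts {
		last[m.MountPoint] = i
	}
	var out []mount
	for i, m := range mounts {
		if last[m.MountPoint] == i {
			out = append(out, m)
		}
	}
	return out
}

// excludeByOption drops every mount that carries the given per-mount option.
func excludeByOption(mounts []mount, opt string) []mount {
	var out []mount
	for _, m := range mounts {
		skip := false
		for _, o := range m.Options {
			if o == opt {
				skip = true
				break
			}
		}
		if !skip {
			out = append(out, m)
		}
	}
	return out
}

// sample is a hypothetical mount table with /nix/store bind-mounted
// read-only onto itself.
const sample = `22 1 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw
30 22 8:3 / /nix/store rw,relatime shared:2 - bcachefs /dev/sda3 rw
95 30 8:3 / /nix/store ro,relatime shared:2 - bcachefs /dev/sda3 rw`

func main() {
	mounts := parseMounts(sample)
	fmt.Println("deduplicated:", dedupeByMountPoint(mounts))
	fmt.Println("without ro:  ", excludeByOption(mounts, "ro"))
}
```

Both variants leave a single /nix/store entry for this table, but they keep different ones: deduplication keeps the read-only bind mount, while the option filter keeps the original mount and would also drop any genuinely read-only filesystem.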

jcaesar, Feb 28 '25 14:02