
Way to distinguish bind-mounted paths?

Open keyolk opened this issue 7 years ago • 16 comments

Host operating system:

Linux css 4.4.68-nx #122 SMP Mon May 15 09:46:11 KST 2017 x86_64 GNU/Linux

node_exporter version:

  build user:       root@bb6d0678e7f3
  build date:       20170321-12:12:54
  go version:       go1.7.5

Are you running node_exporter in Docker?

yes

What did you do that produced an error?

With the query below:

node_filesystem_size{instance=~"(css).*",fstype=~"(ext4|xfs)",mountpoint!~".*mapper.*",device!~".*mapper.*"}

The result is:

node_filesystem_size{device="/dev/sda1",fstype="ext4",instance="css:9100",job="node",mountpoint="/rootfs"}	21003628544
node_filesystem_size{device="/dev/sda3",fstype="ext4",instance="css:9100",job="node",mountpoint="/rootfs/home"}	857421250560
node_filesystem_size{device="/dev/sda3",fstype="ext4",instance="css:9100",job="node",mountpoint="/rootfs/home1"}	857421250560

Actually, the second record is a bind mount. If I could get the mount options, it would help me exclude that record.

keyolk avatar Jun 12 '17 08:06 keyolk

Can you attach a copy of /proc/mounts? This is where the exporter gets the filesystem list.

SuperQ avatar Jun 12 '17 11:06 SuperQ

@SuperQ

/proc/mounts here

proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=24708776k,nr_inodes=6177194,mode=755 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sda1 / ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home1 ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0
cgroup /cgroup/pids cgroup rw,relatime,pids 0 0

keyolk avatar Jun 15 '17 08:06 keyolk

To me that looks like the mount options would be no help in this case. There is no way to tell the difference between {device="/dev/sda3",mountpoint="/home1"} and {device="/dev/sda3",mountpoint="/home"}
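To illustrate the point, here is a minimal sketch that parses the two `/dev/sda3` lines from the /proc/mounts listing above and reports which fields differ. The helper names `parse_proc_mounts` and `differing_fields` are made up for this example and are not part of node_exporter.

```python
# Two lines copied from the /proc/mounts listing posted above.
PROC_MOUNTS = """\
/dev/sda3 /home1 ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
"""


def parse_proc_mounts(text):
    """Split each /proc/mounts line into its six whitespace-separated fields."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options, _dump, _passno = fields
        entries.append({
            "device": device,
            "mountpoint": mountpoint,
            "fstype": fstype,
            "options": options,
        })
    return entries


def differing_fields(a, b):
    """Return the names of the fields whose values differ between two entries."""
    return sorted(key for key in a if a[key] != b[key])


home1, home = parse_proc_mounts(PROC_MOUNTS)
print(differing_fields(home1, home))
```

Only the mount point differs; the device, filesystem type, and options are identical, so nothing in these lines marks either entry as the bind mount.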

SuperQ avatar Jun 15 '17 08:06 SuperQ

@SuperQ Actually it is mounted like below

$ cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Tue Jun 21 16:50:34 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=e4ebf103-b5b9-4620-a532-ccc7205f9eb2 /                       ext4    defaults,noatime,nodev,nobarrier        1 1
UUID=f860799f-1af0-4e16-ac4f-42a07cac8173 /home1                  ext4    defaults,noatime,nodev,nobarrier        1 2
UUID=b76e0523-3bae-412d-a06c-1ad53572aba4 swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/home1  /home   none    default,bind    0       0

In terms of monitoring storage, not being able to distinguish those two mount points is a problem for me :(

keyolk avatar Jun 16 '17 02:06 keyolk

The node_exporter does not read from /etc/fstab as it is not the authoritative source of information about what is mounted. Many systems use automatic mount management, hence the only source of what is mounted comes from /proc/mounts generated by the kernel.

Duplicate bind mounts are indistinguishable from the kernel's perspective, similar to a hard link.

There are two options:

  • Create inventory metrics based on /etc/fstab and expose them with the textfile interface.
  • Use a symlink instead of a bind mount.
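The first option could be sketched roughly as follows. This is only an illustration: the metric name `node_fstab_bind_mount_info` and the function names are hypothetical, and the fstab text is a two-line excerpt of the file posted above. The output would be written to a `.prom` file in the directory given to node_exporter's `--collector.textfile.directory` flag.

```python
# Excerpt of the /etc/fstab posted above (only the two relevant entries).
FSTAB = """\
UUID=f860799f-1af0-4e16-ac4f-42a07cac8173 /home1 ext4 defaults,noatime,nodev,nobarrier 1 2
/home1  /home   none    default,bind    0       0
"""


def fstab_bind_mounts(text):
    """Return (source, target) pairs for fstab entries that carry the bind option."""
    binds = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        source, target, _fstype, options = fields[:4]
        opts = options.split(",")
        if "bind" in opts or "rbind" in opts:
            binds.append((source, target))
    return binds


def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_textfile(binds):
    """Render the bind mounts as an info-style gauge (hypothetical metric name)."""
    lines = [
        "# HELP node_fstab_bind_mount_info Bind mounts declared in /etc/fstab.",
        "# TYPE node_fstab_bind_mount_info gauge",
    ]
    for source, target in binds:
        lines.append(
            'node_fstab_bind_mount_info{source="%s",mountpoint="%s"} 1'
            % (escape_label(source), escape_label(target))
        )
    return "\n".join(lines) + "\n"


print(render_textfile(fstab_bind_mounts(FSTAB)), end="")
```

When the exporter runs in Docker with the host mounted under /rootfs, as in this report, the `mountpoint` label would need the same prefix before it could be matched against `node_filesystem_size`.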

SuperQ avatar Jun 16 '17 05:06 SuperQ

There is a better source of information than /proc/mounts: /proc/self/mountinfo. That has added data as to what subdirectory from the device is mounted at the destination. For a bind mount of /data/shared/www into /var/www/shared, it looks like this:

34 21 253:4 / /data rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
37 21 253:4 /shared/www /var/www/shared rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota

Perhaps the most prometheus-ish way to do this would be to just export this information (mountroot="/shared/www" for the second mount or similar). Then downstream rules can just choose to ignore any timeseries that don't have mountroot="/".
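As a rough sketch of what extracting that field involves, the snippet below parses the two mountinfo lines above. The field layout follows the proc(5) description of /proc/[pid]/mountinfo; the function name and the dictionary keys are assumptions made for this example, not node_exporter code.

```python
# The two /proc/self/mountinfo lines quoted above.
MOUNTINFO = """\
34 21 253:4 / /data rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
37 21 253:4 /shared/www /var/www/shared rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
"""


def parse_mountinfo_line(line):
    """Parse one mountinfo line into a dict of the fields used here.

    A variable number of optional fields may precede the " - " separator,
    so the line is split there first. Spaces inside paths are escaped by
    the kernel as \\040, so plain whitespace splitting is safe.
    """
    left, sep, right = line.partition(" - ")
    if not sep:
        raise ValueError("missing ' - ' separator: %r" % line)
    pre = left.split()
    post = right.split()
    return {
        "mount_id": int(pre[0]),
        "parent_id": int(pre[1]),
        "dev": pre[2],          # major:minor of the backing device
        "root": pre[3],         # path inside the filesystem that is mounted
        "mountpoint": pre[4],
        "options": pre[5],
        "fstype": post[0],
        "source": post[1],
    }


mounts = [parse_mountinfo_line(l) for l in MOUNTINFO.splitlines() if l.strip()]
for m in mounts:
    print(m["mountpoint"], "mountroot=" + m["root"])

# Mounts whose root is not "/" are subtree (bind) mounts.
subtree_mounts = [m["mountpoint"] for m in mounts if m["root"] != "/"]
```

Here `subtree_mounts` contains only /var/www/shared, which is the entry a downstream rule would drop by requiring `mountroot="/"`.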

This won't help OP since they're bind-mounting the root of the filesystem (which truly is indistinguishable), but it will help those of us who bind-mount subtrees, which is very common (and having many random subtrees mounted is more common than having the root mounted many times).

Note that symlinks are usually an option for bind-mounting the root, but not for subtrees: one of the nice things about bind-mounting subtrees is that it lets you bypass permissions checking for the parent directories at the source, which enables some interesting use cases that symlinks cannot provide.

marcan avatar Feb 02 '18 05:02 marcan

@marcan That's a good idea. I think it's something we can implement.

SuperQ avatar Feb 02 '18 08:02 SuperQ

Perhaps the most prometheus-ish way to do this would be to just export this information (mountroot="/shared/www" for the second mount or similar).

I think we should be dropping such filesystems, as we already have the usage information from the actual filesystem mount. I'm not sure it's a good idea to add another label onto a key metric which already has more labels than it technically needs.

brian-brazil avatar Feb 02 '18 08:02 brian-brazil

@brian-brazil I agree, we don't need them in the use metrics. We could include the bind mounting as a separate node_filesystem_mount_info mapping.

SuperQ avatar Feb 02 '18 08:02 SuperQ

The tricky bit is that it's possible to unmount the bare-root filesystem and leave the bind mount. At that point you'd have to implement deduplication in the mount list to make sure you don't drop any useful data. Perhaps this algorithm: for a given mounted device, prefer the mount with the fewest components in the mountroot, then among those prefer the oldest one (coming earlier in mountinfo). This approach would fix OP's problem.
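A minimal sketch of that preference rule, assuming the mounts have already been parsed from mountinfo into dicts with `dev`, `root`, and `mountpoint` keys, listed in mountinfo order. The function names are hypothetical, and the `8:3` entries are illustrative stand-ins for OP's /home1 and /home, since OP's mountinfo was not posted.

```python
def root_depth(root):
    """Count the path components in a mount root ("/" has zero)."""
    return len([part for part in root.split("/") if part])


def pick_preferred(mounts):
    """Keep one mount per device: shallowest root first, then earliest listed."""
    best = {}
    for index, mount in enumerate(mounts):
        key = (root_depth(mount["root"]), index)
        current = best.get(mount["dev"])
        if current is None or key < current[0]:
            best[mount["dev"]] = (key, mount)
    # Return the winners in their original mountinfo order.
    ordered = sorted(best.values(), key=lambda entry: entry[0][1])
    return [mount for _key, mount in ordered]


# Illustrative input in mountinfo order.
mounts = [
    {"dev": "253:4", "root": "/shared/www", "mountpoint": "/var/www/shared"},
    {"dev": "253:4", "root": "/", "mountpoint": "/data"},
    {"dev": "8:3", "root": "/", "mountpoint": "/home1"},
    {"dev": "8:3", "root": "/", "mountpoint": "/home"},
]

for mount in pick_preferred(mounts):
    print(mount["dev"], mount["mountpoint"])
```

With this input, /data wins over /var/www/shared because its root is shallower, and /home1 wins over /home because it is listed first. If the root mount of a device were unmounted, the remaining subtree mount would still be reported, so no device drops out entirely.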

marcan avatar Feb 02 '18 08:02 marcan

@marcan I was considering deduplication by "first listed" in the mountinfo. This means that it's possible for labeling to shift. But I'm guessing the kernel data structure that holds mountinfo is populated in order by time. So "first" is original.

SuperQ avatar Feb 02 '18 08:02 SuperQ

There's nothing saying you can't normally mount a filesystem twice, and I think in that case we'd want to expose both.

We could include the bind mounting as a separate node_filesystem_mount_info mapping.

I can imagine that getting high cardinality and high churn, and I'm not sure what it's gaining us.

brian-brazil avatar Feb 02 '18 08:02 brian-brazil

There's no way to distinguish a filesystem mounted twice from a filesystem mounted and then its root bindmounted elsewhere. As far as I know both of those result in identical kernel state.

Ultimately I think the options are: either show the first mount in mountinfo order, or show root mounts only (but what if a filesystem is only mounted from a subdirectory? then show that instead? what if it's mounted multiple times but never at the root?), or implement some kind of priority order and show the first mount only.

marcan avatar Feb 02 '18 08:02 marcan

No strong preference, but showing the first mount in mountinfo order seems to be what you want in most cases. So let's go with this? Unless someone has objections.
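For reference, "first mount in mountinfo order" could be sketched like this, assuming entries are keyed by the device's major:minor number. The function name and the sample entries are hypothetical; the device numbers are illustrative stand-ins for OP's layout, not taken from OP's system.

```python
def first_mount_per_device(mounts):
    """Keep only the first-listed mount for each device, preserving order."""
    seen = set()
    kept = []
    for mount in mounts:
        if mount["dev"] in seen:
            continue  # a later mount of a device that was already reported
        seen.add(mount["dev"])
        kept.append(mount)
    return kept


# Illustrative entries in mountinfo order.
mounts = [
    {"dev": "8:1", "root": "/", "mountpoint": "/"},
    {"dev": "8:3", "root": "/", "mountpoint": "/home1"},
    {"dev": "8:3", "root": "/", "mountpoint": "/home"},
]

for mount in first_mount_per_device(mounts):
    print(mount["dev"], mount["mountpoint"])
```

This keeps / and /home1 and drops /home. As noted earlier in the thread, it would also hide a filesystem that was deliberately mounted twice, since that case looks the same to the kernel.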

discordianfish avatar Apr 05 '18 10:04 discordianfish

good

AndyFHAF avatar Mar 05 '21 13:03 AndyFHAF

I have the impression that just reading /proc/self/mountinfo is sufficient here. Why didn't we take that approach?

anarcat avatar Nov 20 '23 19:11 anarcat