[Feat]: ZFS Pool Usage Data
Problem
It would be great to see the "used" data for a ZFS Pool.
Other issues or feature requests touch on it, but I don't think it was actually addressed (please correct me if I am wrong). I can see #5590 asked for a feature request to be opened, but I could not see that it was actually done.
Description
On all my ZFS pools the "used" value is 0.00. I have attempted to alter the following netdata.conf sections, with no success:
plugin:proc:/proc/diskstats
plugin:proc:/proc/diskstats:naspool
plugin:proc:diskspace:/mnt/naspool
I can see the correct 'used' values when I run zfs list at the CLI and in my Proxmox dashboard, but not in Netdata.
Importance
really want
Value proposition
...
Proposed implementation
I can see that other ZFS-related issues mention the use of zfs list to implement this. Unfortunately, I have limited Linux CLI knowledge, so I don't know what else to suggest here.
@cdk222, hey.
On all my ZFS Pool's the "used" value is 0.00.
I believe if you click on "used" you will see that it is not zero, but a few kilobytes. It is shown as zero because of unit autoscaling (the unit is TiB, and the value is correct - used is 0.00 TiB).
Netdata reports whatever the system reports (statfs()) for mount points (/proc/self/mountinfo); I think you will see the same values if you run df -h.
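For example, you can compare the two views for the mountpoint from the report above (commands only; output will differ per system):
$ df -h /mnt/naspool                  # what statfs()/Netdata sees for the mountpoint
$ zfs list -H -o used /mnt/naspool    # what ZFS itself accounts as used
The point being that statfs() on a ZFS dataset mountpoint does not necessarily reflect what zfs list reports as "used".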
I can see the correct data 'used' values when I run ZFS list
Yes, I think that is the easiest way.
Thanks @ilyam8 - yeah, it's 128 KB for all 3 of my ZFS pools.
I use Netdata to feed a dashboard, so I was hoping that I could access the 'used' data without needing to continually run a Linux command (df -h or zfs list) in the background.
If it is not looking likely that Netdata will be able to get this info, should I close this?
zfs list /mnt/naspool -H -o used
This seems to be the command that does the job
@cdk222 I think that is a completely valid feature request. We just need to figure out what we need to collect.
I am not a ZFS specialist. I have a server with ZFS; it has one pool, and I see that zfs list returns a lot of entries:
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 1.79T 1.72T 96K /rpool
rpool/ROOT 37.1G 1.72T 96K /rpool/ROOT
rpool/ROOT/pve-1 37.1G 1.72T 34.0G /
rpool/ROOT/pve-1/0f606363d425ffd0dde292a57c9210a6e037fe2967f7fe85467b682b9bd53de8 112K 1.72T 59.8M legacy
rpool/ROOT/pve-1/10afdae7759835321152ba518bb7b72ff491c24860bfe9cc3912fdb4a77c4799 124K 1.72T 92.0M legacy
rpool/ROOT/pve-1/206c431a22650158ed2dc92b4f47c774dca18881af976f5072f1e59199dbdeb6 29.5M 1.72T 166M legacy
rpool/ROOT/pve-1/2e3fb20aafce6e3e551491d4e8b87cdc9c1ee14b0314731162fac870b18b1433 50.5M 1.72T 50.5M legacy
rpool/ROOT/pve-1/34c128bbebec52ffb1cc6ddb3ec058f2388ac13b726e8f46b5c6e565efde32c3 148K 1.72T 92.0M legacy
rpool/ROOT/pve-1/34c128bbebec52ffb1cc6ddb3ec058f2388ac13b726e8f46b5c6e565efde32c3-init 200K 1.72T 92.0M legacy
rpool/ROOT/pve-1/39i18ssrnipcd3r361hf3ws6s 455M 1.72T 455M legacy
rpool/ROOT/pve-1/3ekqqdr2w2ibkrhek1hvcbma4 454M 1.72T 948M legacy
rpool/ROOT/pve-1/458fe0ee7e23c8387ffc052360e7b561e7a6254e8926f3015f286f5ed81cf5d1 489M 1.72T 494M legacy
rpool/ROOT/pve-1/50mx50gq6qgliyw0gm3361lom 454M 1.72T 948M legacy
rpool/ROOT/pve-1/5a3274fb69d2ca8214e818951dfebf55bfac0b3d25da4b20258dcdcbce336a92 208K 1.72T 50.6M legacy
rpool/ROOT/pve-1/5b3c3l1c2ggltt8ikef4o1izm 454M 1.72T 948M legacy
rpool/ROOT/pve-1/6387c3c4d826aad021c7687a4408da4e877c3794f4da6d6b58fd141104f6478a 37.7M 1.72T 92.0M legacy
rpool/ROOT/pve-1/674670258d0d916b180975b50b63fcda00b7147bfaaa3f6bab9df23a32e8b34a 136K 1.72T 331M legacy
rpool/ROOT/pve-1/70kah8vs9uwolmwcaweh1qj8b 48.4M 1.72T 166M legacy
rpool/ROOT/pve-1/7370cdf3f263832c8dc8888a04547ba216092a2427ff8c33c31eaf734e086ef3 113M 1.72T 118M legacy
rpool/ROOT/pve-1/7ce44c40dd245ca5018da186e0c2248d03529b4d5a12da3fffc0293666d7f1a2 116K 1.72T 59.8M legacy
rpool/ROOT/pve-1/80ryqhxxu7ioeamr1lo32mbym 104K 1.72T 948M legacy
rpool/ROOT/pve-1/82244987b33690131ba1cac7063e79b0b718d76147dba0a229e566b3488b87c0 124K 1.72T 92.0M legacy
rpool/ROOT/pve-1/8rypkkfqiqr38uc5r0vz8ifoa 976K 1.72T 948M legacy
rpool/ROOT/pve-1/9dce865be385fa61f2ff8e31f0d33b71e66a1f66d54918d9c725a9385ffc6a01 5.18M 1.72T 5.18M legacy
rpool/ROOT/pve-1/a80749c6e93548f9c255be298974c8cb2823f920a62b0ff18b820837f3b2dce3 116K 1.72T 92.0M legacy
rpool/ROOT/pve-1/ac2ql0lotk1n9jyyrl3ewgjvp 452K 1.72T 166M legacy
rpool/ROOT/pve-1/ac4730c95f66107c2de589bcdd4ef00355aaee36e81306673de5daaeb516b3cb 55.7M 1.72T 55.7M legacy
rpool/ROOT/pve-1/ae6e6e7fdd26c2428bdb5a6e8ddac39db0175a8354e72fd21ff85bed1db4c410 124K 1.72T 92.0M legacy
rpool/ROOT/pve-1/b8e81541acbd458120c62e259fcb7c4a12cdff0e658ee6d5bd04e51366021968 8.60M 1.72T 58.4M legacy
rpool/ROOT/pve-1/ba40910508a091eb519ecdacc53c9b79c04a9f71a6cfe1fe581c33adc8c1b1b1 192K 1.72T 118M legacy
rpool/ROOT/pve-1/c457b087b0a9822d0e82dc43d1af1f47dca751a8124bc8ab9ca75c55bd14c9fc 72K 1.72T 331M legacy
rpool/ROOT/pve-1/c457b087b0a9822d0e82dc43d1af1f47dca751a8124bc8ab9ca75c55bd14c9fc-init 208K 1.72T 331M legacy
rpool/ROOT/pve-1/c696afca3bd0728621369b1106bb6aa3e65a3961dd0362dac1f81705a8e313d1 104K 1.72T 331M legacy
rpool/ROOT/pve-1/c696afca3bd0728621369b1106bb6aa3e65a3961dd0362dac1f81705a8e313d1-init 208K 1.72T 331M legacy
rpool/ROOT/pve-1/c9ea6fbd0160c7c894ecbf4ee4aee21da23d364ee0ca83ab24a68a5ad07b351a 273M 1.72T 331M legacy
rpool/ROOT/pve-1/cc1311276566933be9aeb4db8afd54612cc0374e10faa25ab4af71b1bea85406 48.3M 1.72T 166M legacy
rpool/ROOT/pve-1/ddlthwikb0m8qt8ankbz7r7f3 48.4M 1.72T 166M legacy
rpool/ROOT/pve-1/e435a60756fe3f415901fd27d1c123ec9a578cf3354084893d6ac9fee331e254 116K 1.72T 59.8M legacy
rpool/ROOT/pve-1/ff0139bd476634e1f33db4bab33a6bc4d592ee28905ef0c422e9f1e339086379 2.12M 1.72T 59.8M legacy
rpool/ROOT/pve-1/i9mjd52wdo6tmanf3jxq7f8t9 360K 1.72T 118M legacy
rpool/ROOT/pve-1/mzyxsjr07wztbnbp5nrgccata 104K 1.72T 948M legacy
rpool/ROOT/pve-1/r83dlsrtjb2okgg69k7muop8p 108K 1.72T 108K legacy
rpool/ROOT/pve-1/sepixqkwscv8s1zryy2d4vryz 944K 1.72T 948M legacy
rpool/ROOT/pve-1/v6sldxioy08ldsijrguf9lwul 100K 1.72T 100K legacy
rpool/ROOT/pve-1/w0ld70mmoccmld0rbfftu22jm 72.5M 1.72T 948M legacy
rpool/ROOT/pve-1/x41avx31128d8g3bs0bwanqv3 452K 1.72T 166M legacy
rpool/ROOT/pve-1/x8b8m72kbu4x6ck7wgt939ttv 72.5M 1.72T 948M legacy
rpool/ROOT/pve-1/z8v17lsq4b02f1oqy1q97cb19 104K 1.72T 948M legacy
rpool/data 1.76T 1.72T 120K /rpool/data
rpool/data/base-9000-disk-0 2.44G 1.72T 2.44G -
rpool/data/subvol-106-disk-0 3.59G 60.4G 3.59G /rpool/data/subvol-106-disk-0
rpool/data/subvol-107-disk-0 1.38G 62.6G 1.38G /rpool/data/subvol-107-disk-0
rpool/data/subvol-111-disk-0 14.4G 49.6G 14.4G /rpool/data/subvol-111-disk-0
rpool/data/subvol-113-disk-0 2.39G 37.6G 2.39G /rpool/data/subvol-113-disk-0
rpool/data/subvol-122-disk-0 1.16G 14.8G 1.16G /rpool/data/subvol-122-disk-0
rpool/data/vm-100-disk-0 151G 1.72T 151G -
rpool/data/vm-101-disk-0 146G 1.72T 146G -
rpool/data/vm-102-disk-0 56.0G 1.72T 56.0G -
rpool/data/vm-103-disk-0 5.40G 1.72T 5.40G -
rpool/data/vm-104-disk-0 10.2G 1.72T 10.2G -
rpool/data/vm-105-disk-0 61.4G 1.72T 61.4G -
rpool/data/vm-108-disk-0 14.1G 1.72T 14.1G -
rpool/data/vm-109-disk-0 37.3G 1.72T 37.3G -
rpool/data/vm-110-disk-0 37.6G 1.72T 37.6G -
rpool/data/vm-112-disk-0 7.19G 1.72T 7.19G -
rpool/data/vm-114-disk-0 13.0G 1.72T 13.0G -
rpool/data/vm-115-disk-0 86.2G 1.72T 86.2G -
rpool/data/vm-116-disk-0 34.1G 1.72T 34.1G -
rpool/data/vm-117-disk-0 153G 1.72T 153G -
rpool/data/vm-118-disk-0 1.63G 1.72T 1.63G -
rpool/data/vm-119-disk-0 4.70G 1.72T 4.70G -
rpool/data/vm-120-disk-0 20.2G 1.72T 20.2G -
rpool/data/vm-121-disk-0 538G 1.72T 538G -
rpool/data/vm-123-disk-0 4.68G 1.72T 4.68G -
rpool/data/vm-124-disk-0 7.81G 1.72T 7.81G -
rpool/data/vm-125-disk-0 2.23G 1.72T 2.23G -
rpool/data/vm-126-disk-0 23.7G 1.72T 23.7G -
rpool/data/vm-127-disk-0 2.25G 1.72T 2.25G -
rpool/data/vm-128-disk-0 29.9G 1.72T 29.9G -
rpool/data/vm-129-disk-0 132K 1.72T 132K -
rpool/data/vm-129-disk-1 18.5G 1.72T 18.5G -
rpool/data/vm-129-disk-2 68K 1.72T 68K -
rpool/data/vm-130-disk-0 2.21G 1.72T 2.21G -
rpool/data/vm-131-disk-0 4.11G 1.72T 4.11G -
rpool/data/vm-132-disk-0 37.6G 1.72T 37.6G -
rpool/data/vm-133-disk-0 4.83G 1.72T 4.83G -
rpool/data/vm-134-disk-0 3.92G 1.72T 3.92G -
rpool/data/vm-135-disk-0 13.0G 1.72T 13.0G -
rpool/data/vm-136-disk-0 28.2G 1.72T 28.2G -
rpool/data/vm-137-disk-0 28.8G 1.72T 28.8G -
rpool/data/vm-138-disk-0 25.9G 1.72T 25.9G -
rpool/data/vm-139-disk-0 14.9G 1.72T 14.9G -
rpool/data/vm-140-disk-0 15.2G 1.72T 15.2G -
rpool/data/vm-141-disk-0 3.42G 1.72T 3.42G -
rpool/data/vm-142-disk-0 9.05G 1.72T 9.05G -
rpool/data/vm-143-disk-0 9.33G 1.72T 9.33G -
rpool/data/vm-144-disk-0 8.93G 1.72T 8.93G -
rpool/data/vm-145-disk-0 8.77G 1.72T 8.77G -
rpool/data/vm-146-disk-0 8.55G 1.72T 8.55G -
rpool/data/vm-147-disk-0 9.46G 1.72T 9.46G -
rpool/data/vm-148-disk-0 8.90G 1.72T 8.90G -
rpool/data/vm-149-disk-0 8.91G 1.72T 8.91G -
rpool/data/vm-150-disk-0 5.77G 1.72T 5.77G -
rpool/data/vm-200-disk-0 2.89G 1.72T 2.89G -
rpool/data/vm-310-disk-0 17.9G 1.72T 17.9G -
rpool/data/vm-311-disk-0 13.7G 1.72T 13.7G -
rpool/data/vm-312-disk-0 13.3G 1.72T 13.3G -
Do we need to collect metrics for all datasets (filesystem, snapshot, volume, bookmark, and probably there are more)? Or only for some of them?
- See man zfs-list. It has -o property.
- See man zfsprops for the list of properties that we can query.
Can you share your zfs list output and what you expect to be collected?
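For reference, a query limited to a specific property set might look something like the line below; the chosen properties (used, available, referenced) are only a suggestion for discussion, not a decision on what gets collected:
$ zfs list -H -p -o name,used,available,referenced -t filesystem,volume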
I currently have 3 pools: naspool, rpool, vmpool.
Personally, I don't think you would need to capture the data for each snapshot (others may differ on this) - I think that is too much data to represent. That said, you could have it as an option in netdata.conf:
zfs list -t snapshot -o name,used
Although I did try to summarise the snapshot 'used' total and snapshot count per ZFS pool without much luck.
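For what it's worth, here is a rough shell sketch of that per-pool snapshot summary (the pool name is taken as everything before the first '/' or '@'; untested, treat it as a starting point):
$ zfs list -H -p -t snapshot -o name,used | awk '{ split($1, a, "[/@]"); p = a[1]; used[p] += $2; cnt[p]++ } END { for (p in used) printf "%s: %d snapshots, %d bytes used\n", p, cnt[p], used[p] }'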
For the 'used' data field, I think that any ZFS mountpoint already picked up by Netdata should have correct data in the 'used' field, including subvols.
Running the following command will isolate the 'Used' space for the specified mountpoint. The -H omits the header.
zfs list <mountpoint> -H -o used
E.g.:
# zfs list /mnt/naspool -H -o used
1.09T
So if we need pools only (at least for the initial implementation), we can use 0 depth (-d). The command would be zfs list -d 0 -H -p -o name,used,available.
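As a quick illustration of what could be derived from that parseable output (the percentage formula is only an example; for a pool's root dataset, used + available is roughly the usable space):
$ zfs list -d 0 -H -p -o name,used,available | awk '{ printf "%s used=%.1f%%\n", $1, 100 * $2 / ($2 + $3) }'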
yep - that would do it :)
Just my two cents, and not an expert on ZFS myself, but my expectation based on what I do know would be that:
- The per-filesystem metrics correctly track data usage of the associated ZFS filesystem (what this issue is asking for). The one reservation I have about this is whether the proposed change matches up with how ZFS quotas track data usage or not (I strongly feel we should match quota behavior here; the quota-related properties are sketched after this list).
- The per-filesystem metrics properly account available space as how much space the associated ZFS filesystem could take up given the current settings of the zpool. At minimum this should be clamped to any hard quota on the ZFS filesystem in question (if a given ZFS filesystem has a hard quota of 16 GiB, then it doesn’t matter whether the zpool is 16 GiB or 16 TiB, ZFS will not let that filesystem use more than 16 GiB of space). Ideally it would also account for space usage of other datasets within the zpool, but that’s unfortunately likely to be too difficult for us to track accurately.
- Independent of this, we need some way to see comprehensive per-zpool utilization breakdowns, which I don’t think we provide at the moment. People use ZFS as a volume manager, ergo we need to allow tracking data usage like for a volume manager (but then, we also need similar handling for all types of thinly-provisioned storage, not just ZFS, so not a huge loss right now).
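To illustrate the quota point, the quota-related properties could be inspected alongside the usage ones (these are standard ZFS properties; whether and how they would be collected is still open):
$ zfs list -H -p -o name,used,available,quota,refquota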
Just trying out Netdata as a hopeful replacement for the whole VictoriaMetrics / Grafana / Telegraf / etc. stack, but it indeed does not seem to properly support monitoring of ZFS pools.
Display of the available space on the default Disk Space Usage graph is very misleading for a ZFS host, indicating I apparently have 1 TB of free space on a 210 GB pool.
NAME USED AVAIL REFER MOUNTPOINT
data 4.72G 210G 96K /data
data/ROOT 4.31G 210G 96K /data/ROOT
data/ROOT/pve-1 4.31G 210G 3.18G /
data/vms 79.6M 210G 96K /data/vms
data/vms/vm-101-disk-0 79.5M 210G 75.6M -
Filesystem Size Used Avail Use% Mounted on
data 211G 128K 211G 1% /data
data/ROOT 211G 128K 211G 1% /data/ROOT
data/ROOT/pve-1 214G 3.2G 211G 2% /
data/vms 211G 128K 211G 1% /data/vms
@scr4tchy This chart shows mountpoint metrics (diskspace.plugin): for every mountpoint (not filtered out) from /proc/self/mountinfo, Netdata does statfs(), so the chart should reflect df -h output. It is misleading for ZFS, but (again) this chart shows mountpoint metrics. We have no ZFS pool space usage metrics for now (hence this feature request) - we need to implement this. Thanks for pinging, @scr4tchy.
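To see exactly which mountpoints diskspace.plugin works from on a given host, something like this lists the ZFS entries (just a diagnostic command, not part of Netdata):
$ grep zfs /proc/self/mountinfo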
I need to figure out a few things.
I see that using zfs list will do. There are datasets and their children, controlled by -d:
-d depth
Recursively display any children of the dataset, limiting the recursion to depth. A depth of 1 will display only the dataset and its direct children.
There are a lot of properties, controlled by -o:
-o property
A comma-separated list of properties to display.
So I need to understand the following:
- Do we need depth > 0 (children)? I guess yes; it should be a configuration option, and the default should be 0. Or 1 :man_shrugging:
- What properties (metrics) do we need? I think it depends on the type - for instance, for datasets we need "used" and "available", while for children "available" doesn't make sense because it is the parent dataset's available space.
- depth 0
$ zfs list -d 0 -o name,used,available,type,compressratio,type
NAME USED AVAIL TYPE RATIO TYPE
rpool 1.25T 2.26T filesystem 1.32x filesystem
- depth 2
$ zfs list -d 2 -o name,used,available,type,compressratio,type
NAME USED AVAIL TYPE RATIO TYPE
rpool 1.25T 2.26T filesystem 1.32x filesystem
rpool/ROOT 39.5G 2.26T filesystem 1.30x filesystem
rpool/ROOT/pve-1 39.5G 2.26T filesystem 1.30x filesystem
rpool/data 1.21T 2.26T filesystem 1.33x filesystem
rpool/data/base-162-disk-0 4.29G 2.26T volume 1.28x volume
rpool/data/vm-100-disk-0 148G 2.26T volume 1.20x volume
rpool/data/vm-102-disk-0 3.97G 2.26T volume 1.49x volume
rpool/data/vm-103-disk-0 8.12G 2.26T volume 1.25x volume
rpool/data/vm-105-disk-0 204G 2.26T volume 1.21x volume
rpool/data/vm-109-disk-0 66.6G 2.26T volume 1.18x volume
rpool/data/vm-110-disk-0 40.7G 2.26T volume 1.14x volume
rpool/data/vm-115-disk-0 86.2G 2.26T volume 1.14x volume
rpool/data/vm-116-disk-0 38.6G 2.26T volume 1.45x volume
rpool/data/vm-117-disk-0 26.9G 2.26T volume 1.27x volume
rpool/data/vm-118-disk-0 4.50G 2.26T volume 1.24x volume
rpool/data/vm-119-disk-0 5.46G 2.26T volume 1.22x volume
rpool/data/vm-123-disk-0 5.30G 2.26T volume 1.19x volume
rpool/data/vm-124-disk-0 9.17G 2.26T volume 1.19x volume
rpool/data/vm-125-disk-0 2.29G 2.26T volume 1.39x volume
rpool/data/vm-126-disk-0 28.0G 2.26T volume 1.15x volume
rpool/data/vm-127-disk-0 6.60G 2.26T volume 1.25x volume
rpool/data/vm-128-disk-0 65.2G 2.26T volume 1.34x volume
rpool/data/vm-129-disk-0 100K 2.26T volume 8.23x volume
rpool/data/vm-129-disk-1 27.2G 2.26T volume 1.13x volume
rpool/data/vm-129-disk-2 68K 2.26T volume 1.10x volume
rpool/data/vm-130-disk-0 3.10G 2.26T volume 1.16x volume
rpool/data/vm-131-disk-0 6.55G 2.26T volume 1.18x volume
rpool/data/vm-133-disk-0 5.64G 2.26T volume 1.18x volume
rpool/data/vm-136-disk-0 61.5G 2.26T volume 1.34x volume
rpool/data/vm-137-disk-0 55.1G 2.26T volume 1.27x volume
rpool/data/vm-138-disk-0 52.5G 2.26T volume 1.27x volume
rpool/data/vm-140-disk-0 23.0G 2.26T volume 1.16x volume
rpool/data/vm-141-disk-0 11.2G 2.26T volume 1.17x volume
rpool/data/vm-145-disk-0 5.38G 2.26T volume 1.43x volume
rpool/data/vm-150-disk-0 6.07G 2.26T volume 1.34x volume
rpool/data/vm-151-disk-0 7.48G 2.26T volume 1.21x volume
rpool/data/vm-156-disk-0 33.8G 2.26T volume 1.82x volume
rpool/data/vm-157-disk-0 36.5G 2.26T volume 1.71x volume
rpool/data/vm-158-disk-0 33.6G 2.26T volume 1.84x volume
rpool/data/vm-161-disk-0 5.44G 2.26T volume 1.45x volume
rpool/data/vm-163-disk-0 6.36G 2.26T volume 1.26x volume
rpool/data/vm-200-disk-0 3.31G 2.26T volume 1.33x volume
rpool/data/vm-310-disk-0 34.2G 2.26T volume 1.83x volume
rpool/data/vm-311-disk-0 34.7G 2.26T volume 1.81x volume
rpool/data/vm-312-disk-0 36.0G 2.26T volume 1.74x volume
This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:
https://community.netdata.cloud/t/zfs-zpool-and-zfs-the-size/2145/5
OK, for pools we need to use zpool list:
$ zpool list -o name,allocated,capacity,dedupratio,fragmentation,free,size
NAME ALLOC CAP DEDUP FRAG FREE SIZE
rpool 1.25T 34% 1.00x 44% 2.37T 3.62T
I think we can start with it and add zfs list (ZFS datasets) later, when we have a better understanding.
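For a collector, the scripted/parseable form would presumably be more convenient; the flags and properties below are standard zpool list ones, and the exact set to collect is still open:
$ zpool list -H -p -o name,size,allocated,free,fragmentation,capacity,dedupratio,health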
For me, overall pool size vs. pool usage (aka overall free space) is the most important, plus being able to define the usual alarms for these.
I don't use quotas, I guess if you do, dataset quota vs % of quota used is relevant.
But that might get complicated since there are user quotas, group quotas etc.
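Purely as a sketch of what "the usual alarms" could look like once such a chart exists - the chart name (zfspool.pool_space_utilization) and the dimension ($utilization) are hypothetical placeholders, not an existing Netdata chart:
# NOTE: chart and dimension names are hypothetical placeholders for a future ZFS pool collector
 template: zfs_pool_space_utilization
       on: zfspool.pool_space_utilization
     calc: $utilization
    units: %
    every: 1m
     warn: $this > 80
     crit: $this > 95
     info: ZFS pool space utilization is high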
Has anyone taken a stab at implementing this?
I am very new to Netdata, but it looks like this would be a good thing to implement with a Go collector, right? I'm willing to give it a shot if nobody has anything in progress.
I added the initial version that uses zpool list in #17367.