prometheus-pve-exporter
pve_disk_usage_bytes on all qemu guests is null
On all QEMU VMs the value is zero; LXC containers are fine.
Seen on PVE 6 and 7, both in a cluster and on a single node.
pip list --local
Package Version
----------------- -------
prometheus-client 0.11.0
proxmoxer 1.1.1
Werkzeug 2.0.1
# HELP pve_disk_usage_bytes Disk usage in bytes
# TYPE pve_disk_usage_bytes gauge
pve_disk_usage_bytes{id="qemu/136"} 0.0
pve_disk_usage_bytes{id="lxc/128"} 2.1141504e+09
pve_disk_usage_bytes{id="lxc/119"} 0.0
pve_disk_usage_bytes{id="lxc/120"} 4.493533184e+09
pve_disk_usage_bytes{id="qemu/126"} 0.0
pve_disk_usage_bytes{id="lxc/100"} 6.127427584e+09
pve_disk_usage_bytes{id="lxc/106"} 6.14606848e+09
pve_disk_usage_bytes{id="qemu/114"} 0.0
pve_disk_usage_bytes{id="lxc/108"} 3.995119616e+09
pve_disk_usage_bytes{id="qemu/143"} 0.0
pve_disk_usage_bytes{id="lxc/103"} 4.350595072e+09
pve_disk_usage_bytes{id="qemu/111"} 0.0
pve_disk_usage_bytes{id="lxc/115"} 4.468477952e+0
Thanks for taking the time to report this issue. Regrettably, this is a limitation in Proxmox VE itself: the information would only be available if PVE queried the guest agent running inside the VM, and that is not implemented in PVE.
https://github.com/proxmox/qemu-server/blob/104f47a9f8d584d93a238fa27f56749b9b31ea5a/PVE/QemuServer.pm#L2697-L2704
So, regrettably, we cannot do anything about that. This is also a duplicate of #45.
@znerol Maybe this can help?
While the guest agent can now report information about the disks, it is non-trivial to map that to a specific disk in the config (RAID/partitions/etc.).
What you can do is use the guest agent API yourself: https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu/{vmid}/agent/get-fsinfo This will return JSON from the guest agent (if it is installed and a recent enough version).
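For reference, a minimal sketch of querying that endpoint with pvesh from a PVE node. Node name pve1 and vmid 136 are placeholders taken from this thread; it assumes the guest agent is running inside the VM, that the API wraps the agent reply under a result key, and that used-bytes/total-bytes are present (they can be missing for some filesystem types):

# Sketch: list filesystem usage as reported by the guest agent for one VM.
pvesh get /nodes/pve1/qemu/136/agent/get-fsinfo --output-format json \
  | jq -r '.result[] | select(."used-bytes" != null)
      | "\(.mountpoint): \(."used-bytes") of \(."total-bytes") bytes used"'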
Thanks for the info. I tend to think that if you are in a position to run the guest-agent, then you might just as well run prometheus-node-exporter directly in the guest. But maybe I am missing a use case here.
The use case is this: the guest agent is installed in the template of the VMs that I deploy for our developers. When a new VM is deployed, I do not want to have to configure Prometheus inside each VM just to get a warning about low disk space. In my opinion this is very convenient: looking at a single dashboard, you immediately see the problem VMs. With LXC this already works nicely.
If there is a clean and straightforward way to implement this, then I do welcome a PR. Leaving this open so people will find the issue.
Hi,
There are 2 ways to get the data:
From the pve:
for node in $(pvesh get /nodes --output-format json | jq -r '.[].node'); do (pvesh get /nodes/${node}/qemu --output-format json; pvesh get /nodes/${node}/lxc --output-format json) | jq -r '.[]'; done;
From the lv:
lvs --reportformat json pve | jq '.report[0].lv[] | "\(.lv_name), \(.lv_size), \(.data_percent)"'
Taking the info from the lv is way faster, but has several drawbacks:
- It only shows the "high water mark" of the volume, and does not reflect the real qemu disk usage.
- It only works with LVM thin-provisioned VMs.
- There is no mapping to the actual vmid, and especially if the VM uses local storage and has been migrated to another host, it might show an old, unused volume (see the sketch after this list).
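That said, here is a hedged sketch of the LVM route: with PVE's default vm-<vmid>-disk-N naming on an LVM-thin storage (volume group pve, as in the command above), the vmid can usually be parsed out of the LV name, although this does not solve the stale-volume problem just mentioned:

# Sketch: size in bytes plus thin-pool data_percent per guest volume,
# assuming the default vm-<vmid>-disk-N naming on volume group "pve".
lvs --reportformat json --units b --nosuffix pve \
  | jq -r '.report[0].lv[]
      | select(.lv_name | test("^vm-[0-9]+-disk"))
      | "vmid=\(.lv_name | capture("^vm-(?<id>[0-9]+)-").id) \(.lv_name) \(.lv_size)B \(.data_percent)%"'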
The first solution seems the easiest path, since it should provide the data from the agent as displayed on the PVE GUI. This only works if the qemu-guest-agent is installed in the VM.
In the meantime, and with some output adaptation, there should be a way to schedule the first solution with cron to feed a local Prometheus 'textfile' exporter (/var/lib/prometheus/node-exporter/README.textfile).
This line should be a good start once we find a way to change the 'null' into 'qemu' (see the sketch after the sample output below):
root@pve1:~# for node in $(pvesh get /nodes --output-format json | jq -r '.[].node'); do (pvesh get /nodes/${node}/qemu --output-format json; pvesh get /nodes/${node}/lxc --output-format json) | jq -r '.[] | "pve_disk_usage_bytes{id=\"\(.type)/\(.vmid)\"} \(.maxdisk) "'; done;
pve_disk_usage_bytes{id="null/113"} 8589934592
pve_disk_usage_bytes{id="null/139"} 16106127360
pve_disk_usage_bytes{id="null/109"} 10737418240
pve_disk_usage_bytes{id="null/129"} 34359738368
pve_disk_usage_bytes{id="lxc/203"} 8350298112
pve_disk_usage_bytes{id="lxc/107"} 12884901888
pve_disk_usage_bytes{id="lxc/122"} 4143677440
pve_disk_usage_bytes{id="lxc/112"} 10464022528
pve_disk_usage_bytes{id="null/100"} 34359738368
pve_disk_usage_bytes{id="null/111"} 8589934592
pve_disk_usage_bytes{id="lxc/110"} 10464022528
pve_disk_usage_bytes{id="lxc/134"} 41956900864
I did not check the exact meaning and units of maxdisk, but that's the idea.
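For what it is worth, maxdisk should be the configured disk size in bytes rather than actual usage, so the sketch below uses its own metric name to avoid clashing with the exporter's pve_disk_usage_bytes. It relies on jq's // operator to substitute "qemu" when the type field is missing (which is what produces the null above) and writes the result atomically into the node-exporter textfile directory mentioned earlier; treat the metric name and output path as assumptions, not established conventions:

#!/bin/sh
# Sketch: dump per-guest maxdisk as a Prometheus metric for the node-exporter
# textfile collector. Metric name and output path are assumptions.
OUT=/var/lib/prometheus/node-exporter/pve_guest_disks.prom
TMP="${OUT}.tmp"
{
  echo '# HELP pve_guest_maxdisk_bytes Configured guest disk size in bytes'
  echo '# TYPE pve_guest_maxdisk_bytes gauge'
  for node in $(pvesh get /nodes --output-format json | jq -r '.[].node'); do
    (pvesh get /nodes/${node}/qemu --output-format json; \
     pvesh get /nodes/${node}/lxc --output-format json) \
      | jq -r '.[] | "pve_guest_maxdisk_bytes{id=\"\(.type // "qemu")/\(.vmid)\"} \(.maxdisk)"'
  done
} > "$TMP" && mv "$TMP" "$OUT"   # atomic rename so node-exporter never reads a partial file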
Looping through nodes has led to issues in the past.
I guess that in order to resolve the issue properly, we'd need to fix this directly in the pve api (i.e., server side). For Perl/Raku hackers, this is the place to start looking.
Were you able to find a solution?