
pve_disk_usage_bytes on all qemu guests is null

Open SaymonDzen opened this issue 3 years ago • 8 comments

On all qemu VMs the value is null; lxc is OK.

Happens on PVE 6 and 7, on a cluster as well as a single node.

pip list --local
Package           Version
----------------- -------
prometheus-client 0.11.0
proxmoxer         1.1.1
Werkzeug          2.0.1
# HELP pve_disk_usage_bytes Disk usage in bytes
# TYPE pve_disk_usage_bytes gauge
pve_disk_usage_bytes{id="qemu/136"} 0.0
pve_disk_usage_bytes{id="lxc/128"} 2.1141504e+09
pve_disk_usage_bytes{id="lxc/119"} 0.0
pve_disk_usage_bytes{id="lxc/120"} 4.493533184e+09
pve_disk_usage_bytes{id="qemu/126"} 0.0
pve_disk_usage_bytes{id="lxc/100"} 6.127427584e+09
pve_disk_usage_bytes{id="lxc/106"} 6.14606848e+09
pve_disk_usage_bytes{id="qemu/114"} 0.0
pve_disk_usage_bytes{id="lxc/108"} 3.995119616e+09
pve_disk_usage_bytes{id="qemu/143"} 0.0
pve_disk_usage_bytes{id="lxc/103"} 4.350595072e+09
pve_disk_usage_bytes{id="qemu/111"} 0.0
pve_disk_usage_bytes{id="lxc/115"} 4.468477952e+0

SaymonDzen avatar Aug 23 '21 15:08 SaymonDzen

Thanks for taking the time to report this issue. Regrettably, this is a limitation in Proxmox VE itself. This information would only be available if PVE queried the guest agent running inside the VM, and that is not implemented in PVE.

https://github.com/proxmox/qemu-server/blob/104f47a9f8d584d93a238fa27f56749b9b31ea5a/PVE/QemuServer.pm#L2697-L2704

So regrettably we cannot do anything about that. Also, this is a duplicate of #45.

znerol avatar Aug 23 '21 18:08 znerol

@znerol Maybe this can help?

While the guest agent can show information about the disks now, it is non-trivial to map that to a specific disk in the config (RAID/partitions/etc.).

What you can do is use the guest agent API yourself: https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu/{vmid}/agent/get-fsinfo. This will return JSON from the guest agent (if it is installed and the version is new enough).
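For example, a minimal sketch of querying that endpoint with pvesh on a PVE node (the node name and vmid are placeholders, the VM must run a recent qemu-guest-agent, and the exact response shape may vary between versions):

# Sketch only: node name and vmid are placeholders; requires a recent
# qemu-guest-agent inside the VM. used-bytes/total-bytes are optional
# fields in the agent response, so they may be missing for some mounts.
pvesh get /nodes/pve1/qemu/136/agent/get-fsinfo --output-format json \
  | jq -r '.result[] | "\(.mountpoint) \(.["used-bytes"]) \(.["total-bytes"])"'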

SaymonDzen avatar Dec 03 '21 14:12 SaymonDzen

Thanks for the info. I tend to think that if you are in a position to run the guest agent, then you might just as well run prometheus-node-exporter directly in the guest. But maybe I am missing a use case here.

znerol avatar Dec 03 '21 15:12 znerol

The use case is that the guest agent is already installed in the template of the VMs I deploy for our developers. When a new VM is deployed, I do not need to configure Prometheus for it separately in order to be warned about low disk space. In my opinion this is very convenient: looking at a single dashboard, you can immediately spot the problem VMs. With LXC this already works that way.

SaymonDzen avatar Dec 03 '21 15:12 SaymonDzen

If there is a clean and straightforward way to implement this, then I do welcome a PR. Leaving this open so people will find the issue.

znerol avatar Dec 03 '21 15:12 znerol

Hi,

There are two ways to get the data.

From the PVE API:

for node in $(pvesh get /nodes --output-format json | jq -r '.[].node'); do (pvesh get /nodes/${node}/qemu --output-format json; pvesh get /nodes/${node}/lxc --output-format json) | jq -r '.[]'; done;

From the LVs:

lvs --reportformat json pve | jq '.report[0].lv[] | "\(.lv_name), \(.lv_size), \(.data_percent)"'

Taking the info from the LVs is way faster, but has several drawbacks:

  • It only shows the "high water mark" of the volume, and does not reflect the real qemu disk usage.
  • It only works with LVM thin-provisioned VMs.
  • There is no mapping to the actual vmid, and especially if the VM uses local storage and has been migrated to another host, it might show an old, unused volume.

The first solution seems the easiest path, since it should provide the data from the agent as displayed on the PVE GUI. This only works if the qemu-guest-agent is installed in the VM.

In the meantime, and with some output adaptation, the first solution could be scheduled with cron to feed the local prometheus node-exporter 'textfile' collector (/var/lib/prometheus/node-exporter/README.textfile).

This line should be a good start, once you find a way to change the 'null' to 'qemu' (a possible fix is sketched after the output below):

root@pve1:~# for node in $(pvesh get /nodes --output-format json | jq -r '.[].node'); do (pvesh get /nodes/${node}/qemu --output-format json; pvesh get /nodes/${node}/lxc --output-format json) | jq -r '.[] | "pve_disk_usage_bytes{id=\"\(.type)/\(.vmid)\"} \(.maxdisk) "'; done;
pve_disk_usage_bytes{id="null/113"} 8589934592 
pve_disk_usage_bytes{id="null/139"} 16106127360 
pve_disk_usage_bytes{id="null/109"} 10737418240 
pve_disk_usage_bytes{id="null/129"} 34359738368 
pve_disk_usage_bytes{id="lxc/203"} 8350298112 
pve_disk_usage_bytes{id="lxc/107"} 12884901888 
pve_disk_usage_bytes{id="lxc/122"} 4143677440 
pve_disk_usage_bytes{id="lxc/112"} 10464022528 
pve_disk_usage_bytes{id="null/100"} 34359738368 
pve_disk_usage_bytes{id="null/111"} 8589934592 
pve_disk_usage_bytes{id="lxc/110"} 10464022528 
pve_disk_usage_bytes{id="lxc/134"} 41956900864

I did not check the exact meaning and units of maxdisk, but that's the idea.
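As a hedged sketch of that cron/textfile idea: the missing qemu "type" can be defaulted with jq's // operator, and the output written into the node-exporter textfile collector directory. The output path and file name below are assumptions that depend on how node-exporter is set up, and maxdisk appears to be the provisioned disk size rather than the space actually used, so the metric name may deserve adjusting.

# Sketch only: defaults the missing "type" field per endpoint and writes the
# result atomically into the node-exporter textfile collector directory
# (the path is an assumption; adjust it to your node-exporter setup).
# Note: maxdisk is the provisioned size, not actual usage.
OUT=/var/lib/prometheus/node-exporter/pve_disk.prom
for node in $(pvesh get /nodes --output-format json | jq -r '.[].node'); do
  pvesh get /nodes/${node}/qemu --output-format json \
    | jq -r '.[] | (.type // "qemu") as $t | "pve_disk_usage_bytes{id=\"\($t)/\(.vmid)\"} \(.maxdisk)"'
  pvesh get /nodes/${node}/lxc --output-format json \
    | jq -r '.[] | (.type // "lxc") as $t | "pve_disk_usage_bytes{id=\"\($t)/\(.vmid)\"} \(.maxdisk)"'
done > "${OUT}.tmp" && mv "${OUT}.tmp" "${OUT}"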

Pivert avatar Mar 16 '22 11:03 Pivert

Looping through nodes has led to issues in the past.

I guess that in order to resolve the issue properly, we'd need to fix this directly in the pve api (i.e., server side). For Perl/Raku hackers, this is the place to start looking.

znerol avatar Mar 19 '22 08:03 znerol

Were you able to find a solution?

kingp0dd avatar Mar 26 '24 13:03 kingp0dd