
`StalePersistentVolumeClaim` duplicate time series in Victoria Metrics

Open: dhess opened this issue 1 year ago.

Hi,

The StalePersistentVolumeClaim query here breaks Victoria Metrics in some configurations:

https://github.com/openebs/monitoring/blob/161b5af7f7525c223b9165a13db2d6b667d08aad/deploy/charts/rules/volume/volume-rules.json#L12

422: error when executing query="kube_persistentvolumeclaim_info unless (kube_persistentvolumeclaim_info * on(persistentvolumeclaim) group_right kube_pod_spec_volumes_persistentvolumeclaims_info) == 1" on the time range (start=1700931915000, end=1700932215000, step=15000): cannot execute "kube_persistentvolumeclaim_info unless ((kube_persistentvolumeclaim_info * on(persistentvolumeclaim) group_right() kube_pod_spec_volumes_persistentvolumeclaims_info) == 1)": cannot execute "(kube_persistentvolumeclaim_info{persistentvolumeclaim=~\"audit-vault-0|audit-vault-1|audit-vault-2|... duplicate time series on the left side of `* on(persistentvolumeclaim) group_right()`: ...

In our particular case, this happens with any ReadWriteMany PVC that is mounted by more than one pod in the same namespace.
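The error comes from the one-to-many join inside the rule. The sketch below models Prometheus's documented rule for `group_right()`: the left operand is the "one" side and may hold at most one series per value of the `on(...)` labels, while the right operand may hold many. The label sets are hypothetical, the duplicate left-hand series are an assumed example, and VictoriaMetrics's MetricsQL engine may differ in details; this sketches the matching rule, not either engine's code.

```python
from typing import Dict, List, Sequence, Tuple

# A series is modeled as (labels, value); *_info metrics always have value 1.
Series = Tuple[Dict[str, str], float]


def match_key(labels: Dict[str, str], on: Sequence[str]) -> Tuple[str, ...]:
    """Project a label set onto the on(...) labels; a missing label counts as ""."""
    return tuple(labels.get(name, "") for name in on)


def mul_on_group_right(lhs: List[Series], rhs: List[Series],
                       on: Sequence[str]) -> List[Series]:
    """Model `lhs * on(<on>) group_right() rhs`.

    The left operand is the "one" side: two left-hand series with the same
    match key make the join ambiguous, so the query fails.
    """
    one_side: Dict[Tuple[str, ...], Series] = {}
    for labels, value in lhs:
        key = match_key(labels, on)
        if key in one_side:
            raise ValueError(
                f"duplicate time series on the left side: "
                f"{one_side[key][0]} and {labels}")
        one_side[key] = (labels, value)

    result: List[Series] = []
    for labels, value in rhs:
        key = match_key(labels, on)
        if key in one_side:
            # Each right-hand ("many") series yields one output series
            # that keeps the right-hand labels.
            result.append((dict(labels), one_side[key][1] * value))
    return result


if __name__ == "__main__":
    on = ["persistentvolumeclaim"]
    # Hypothetical series: two claims named "data" in different namespaces ...
    pvcs = [({"namespace": "a", "persistentvolumeclaim": "data"}, 1.0),
            ({"namespace": "b", "persistentvolumeclaim": "data"}, 1.0)]
    # ... and two pods in namespace "a" mounting the same RWX claim.
    pods = [({"namespace": "a", "pod": "web-0", "persistentvolumeclaim": "data"}, 1.0),
            ({"namespace": "a", "pod": "web-1", "persistentvolumeclaim": "data"}, 1.0)]

    print(len(mul_on_group_right(pvcs[:1], pods, on)))  # 2: one result per pod
    try:
        mul_on_group_right(pvcs, pods, on)
    except ValueError as err:
        print(err)
```

In this model, many pods on the right are fine as long as the left side has one series per claim name; the error appears as soon as two left-hand series share a `persistentvolumeclaim` value.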

dhess, Nov 25 '23 19:11

Requesting help from the community here.

avishnu, Sep 13 '24 13:09

@avishnu I think I'm seeing exactly the same issue when using Renovate with a persistent cache. I think the problem is that Renovate's CronJob produces several pods, each referencing the same PVC, which leads to a many-to-many matching situation that Prometheus doesn't support.
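To see which matching case a pair of operands falls into, the sketch below counts series per match key on each side, following Prometheus's vector-matching rules: duplicates on only one side can be handled with `group_left` or `group_right`, while duplicates on both sides (many-to-many) cannot. The series, pod names, and claim names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Series = Tuple[Dict[str, str], float]


def max_series_per_key(side: List[Series], on: Sequence[str]) -> int:
    """Largest number of series sharing one on(...) label combination."""
    counts = Counter(tuple(labels.get(n, "") for n in on) for labels, _ in side)
    return max(counts.values(), default=0)


def matching_cardinality(lhs: List[Series], rhs: List[Series],
                         on: Sequence[str]) -> str:
    """Classify `lhs <op> on(<on>) rhs` by how many series share a match key."""
    left_many = max_series_per_key(lhs, on) > 1
    right_many = max_series_per_key(rhs, on) > 1
    if left_many and right_many:
        return "many-to-many"   # no group modifier can resolve this
    if left_many:
        return "many-to-one"    # needs group_left
    if right_many:
        return "one-to-many"    # needs group_right
    return "one-to-one"


if __name__ == "__main__":
    on = ["persistentvolumeclaim"]
    # Hypothetical: two CronJob pods mounting the same cache claim.
    pods = [({"namespace": "renovate", "pod": "renovate-1", "persistentvolumeclaim": "cache"}, 1.0),
            ({"namespace": "renovate", "pod": "renovate-2", "persistentvolumeclaim": "cache"}, 1.0)]
    one_pvc = [({"namespace": "renovate", "persistentvolumeclaim": "cache"}, 1.0)]
    two_pvcs = one_pvc + [({"namespace": "other", "persistentvolumeclaim": "cache"}, 1.0)]
    print(matching_cardinality(one_pvc, pods, on))   # one-to-many
    print(matching_cardinality(two_pvcs, pods, on))  # many-to-many
```

Under these rules the pods alone make the right side "many"; the combination becomes many-to-many once the left side also has more than one series for the same claim name.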

I've changed the rule to this:

kube_persistentvolumeclaim_info{namespace!="renovate"} unless (kube_persistentvolumeclaim_info * on (persistentvolumeclaim) group_left () (max by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info))) == 1

The main change is that I replaced `group_right kube_pod_spec_volumes_persistentvolumeclaims_info` with `group_left () (max by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info))`. The `max by (persistentvolumeclaim)` collapses all pod series into one series per PVC. I'm not sure that's the best way to do it, but it works.
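The effect of the rewrite can be modeled in a few lines. In the sketch below, `max by (persistentvolumeclaim)` collapses the pod-side series to one per claim name, after which `group_left()` only needs the right side to be unique, so the left side (the PVC info series) may contain several series per claim. The claim and pod names are hypothetical, and this models Prometheus-style semantics rather than either engine's implementation.

```python
from typing import Dict, List, Sequence, Tuple

Series = Tuple[Dict[str, str], float]


def key_of(labels: Dict[str, str], on: Sequence[str]) -> Tuple[str, ...]:
    return tuple(labels.get(name, "") for name in on)


def max_by(series: List[Series], by: Sequence[str]) -> List[Series]:
    """Model `max by (<by>) (series)`: one output series per `by` label set."""
    best: Dict[Tuple[str, ...], float] = {}
    for labels, value in series:
        key = key_of(labels, by)
        best[key] = value if key not in best else max(best[key], value)
    # Only the `by` labels survive; empty values are treated as absent.
    return [({n: v for n, v in zip(by, key) if v}, value)
            for key, value in best.items()]


def mul_on_group_left(lhs: List[Series], rhs: List[Series],
                      on: Sequence[str]) -> List[Series]:
    """Model `lhs * on(<on>) group_left() rhs`: the right side is the "one" side."""
    one_side: Dict[Tuple[str, ...], float] = {}
    for labels, value in rhs:
        key = key_of(labels, on)
        if key in one_side:
            raise ValueError(f"duplicate time series on the right side for {key}")
        one_side[key] = value
    # Output keeps the left-hand (PVC) labels, one series per matching left series.
    return [(dict(labels), value * one_side[key_of(labels, on)])
            for labels, value in lhs if key_of(labels, on) in one_side]


if __name__ == "__main__":
    on = ["persistentvolumeclaim"]
    pvcs = [({"namespace": "renovate", "persistentvolumeclaim": "cache"}, 1.0)]
    pods = [({"namespace": "renovate", "pod": "job-1", "persistentvolumeclaim": "cache"}, 1.0),
            ({"namespace": "renovate", "pod": "job-2", "persistentvolumeclaim": "cache"}, 1.0)]
    collapsed = max_by(pods, on)
    print(collapsed)                               # [({'persistentvolumeclaim': 'cache'}, 1.0)]
    print(mul_on_group_left(pvcs, collapsed, on))  # keeps the PVC's namespace label
```

Because the output keeps the PVC-side labels, it should line up with the left operand of the outer `unless`. One consequence of aggregating by `persistentvolumeclaim` alone is that claims with the same name in different namespaces share one collapsed series.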

@dhess, can you confirm that you have multiple pods referencing the same PVC? You can check with: count by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info) > 1
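A minimal model of that check, assuming the pod-volume series are available as label dictionaries (the sample data is hypothetical): count series per claim name and keep only the counts above 1, as `count by (persistentvolumeclaim) (...) > 1` does.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Series = Tuple[Dict[str, str], float]


def count_by_over_one(series: List[Series],
                      by: Sequence[str] = ("persistentvolumeclaim",)) -> Dict[Tuple[str, ...], int]:
    """Model `count by (<by>) (series) > 1`: label sets shared by several series."""
    counts = Counter(tuple(labels.get(n, "") for n in by) for labels, _ in series)
    # The `> 1` comparison filters out claims referenced by a single pod.
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    pod_volumes = [
        ({"namespace": "apps", "pod": "web-0", "persistentvolumeclaim": "shared"}, 1.0),
        ({"namespace": "apps", "pod": "web-1", "persistentvolumeclaim": "shared"}, 1.0),
        ({"namespace": "apps", "pod": "db-0", "persistentvolumeclaim": "db-data"}, 1.0),
    ]
    print(count_by_over_one(pod_volumes))  # {('shared',): 2}
```

Grouping only by `persistentvolumeclaim` also counts same-named claims from different namespaces together; passing `by=("namespace", "persistentvolumeclaim")` keeps them apart.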

This also breaks when using RWX volumes with scaled-up applications.

pschichtel, Dec 09 '24 22:12

I created #124 with the change I suggested in my previous comment.

pschichtel, Dec 14 '24 13:12