`StalePersistentVolumeClaim` duplicate time series in Victoria Metrics
Hi,
The StalePersistentVolumeClaim query here breaks Victoria Metrics in some configurations:
https://github.com/openebs/monitoring/blob/161b5af7f7525c223b9165a13db2d6b667d08aad/deploy/charts/rules/volume/volume-rules.json#L12
422: error when executing query="kube_persistentvolumeclaim_info unless (kube_persistentvolumeclaim_info * on(persistentvolumeclaim) group_right kube_pod_spec_volumes_persistentvolumeclaims_info) == 1" on the time range (start=1700931915000, end=1700932215000, step=15000): cannot execute "kube_persistentvolumeclaim_info unless ((kube_persistentvolumeclaim_info * on(persistentvolumeclaim) group_right() kube_pod_spec_volumes_persistentvolumeclaims_info) == 1)": cannot execute "(kube_persistentvolumeclaim_info{persistentvolumeclaim=~\"audit-vault-0|audit-vault-1|audit-vault-2|... duplicate time series on the left side of `* on(persistentvolumeclaim) group_right()`: ...
In our particular case, this happens with any ReadWriteMany PVC that is used by more than one pod in the same namespace.
Requesting help from the community here.
@avishnu I think I'm seeing the exact same issue when using Renovate with a persistent cache. I think the problem is that Renovate's CronJob produces several pods that each reference the same PVC, which leads to a many-to-many match that Prometheus doesn't support.
I've changed the rule to this:
kube_persistentvolumeclaim_info{namespace!="renovate"} unless (kube_persistentvolumeclaim_info * on (persistentvolumeclaim) group_left () (max by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info))) == 1
The main change is that I replaced group_left () kube_pod_spec_volumes_persistentvolumeclaims_info) with group_left () (max by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info))). This max by (persistentvolumeclaim) collapses all series into one per PVC, so the right-hand side of the match has at most one series for each PVC. Not sure if that's a good way to do it, but it does work.
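For anyone following along, here is a rough Python sketch of why the `max by` helps. It is a simplified model of many-to-one vector matching, not how Prometheus or Victoria Metrics implement it. The helper names (`max_by`, `multiply_group_left`, `stale_pvcs`) and the sample series are made up for illustration.

```python
def max_by(series, label):
    """Collapse series to one per value of `label`, keeping the largest sample."""
    best = {}
    for labels, value in series:
        key = labels[label]
        best[key] = max(value, best.get(key, float("-inf")))
    return [({label: key}, value) for key, value in best.items()]


def multiply_group_left(many, one, on):
    """Simplified `many * on(<on>) group_left() one`.

    The "one" side must have at most one series per match key,
    otherwise the match is many-to-many and is rejected.
    """
    index = {}
    for labels, value in one:
        key = labels[on]
        if key in index:
            raise ValueError(f"duplicate time series on the 'one' side for {on}={key!r}")
        index[key] = value
    # The result keeps the labels of the "many" side.
    return [(labels, value * index[labels[on]])
            for labels, value in many if labels[on] in index]


def stale_pvcs(pvc_info, pod_pvc_info):
    """Simplified version of the adjusted rule: PVCs that no pod references."""
    used = multiply_group_left(
        pvc_info, max_by(pod_pvc_info, "persistentvolumeclaim"), "persistentvolumeclaim"
    )
    used_keys = {frozenset(labels.items()) for labels, value in used if value == 1}
    # `unless` drops every left-hand series that has a match on the right.
    return [labels for labels, _ in pvc_info
            if frozenset(labels.items()) not in used_keys]


# Hypothetical sample data: two pods share the PVC "cache", nothing uses "unused".
pvc_info = [
    ({"namespace": "renovate", "persistentvolumeclaim": "cache"}, 1.0),
    ({"namespace": "default", "persistentvolumeclaim": "unused"}, 1.0),
]
pod_pvc_info = [
    ({"pod": "job-a", "persistentvolumeclaim": "cache"}, 1.0),
    ({"pod": "job-b", "persistentvolumeclaim": "cache"}, 1.0),
]
```

In this model, calling `multiply_group_left(pvc_info, pod_pvc_info, "persistentvolumeclaim")` directly raises because `cache` appears twice on the "one" side, while `stale_pvcs(pvc_info, pod_pvc_info)` goes through `max_by` first and only reports the `unused` PVC.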
@dhess can you confirm that you have multiple pods referencing the same PVC? (count by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info) > 1)
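To make clear what that check is looking for, here is a small Python sketch of the same idea: count series per PVC and keep only the PVCs referenced by more than one pod. The function name and the sample label sets are hypothetical.

```python
from collections import Counter


def pvcs_with_multiple_pods(pod_pvc_labels):
    """Rough equivalent of `count by (persistentvolumeclaim) (...) > 1`."""
    counts = Counter(labels["persistentvolumeclaim"] for labels in pod_pvc_labels)
    return {pvc: n for pvc, n in counts.items() if n > 1}


# Hypothetical label sets of kube_pod_spec_volumes_persistentvolumeclaims_info series.
sample = [
    {"pod": "web-0", "persistentvolumeclaim": "shared-data"},
    {"pod": "web-1", "persistentvolumeclaim": "shared-data"},
    {"pod": "db-0", "persistentvolumeclaim": "db-data"},
]
```

With this sample, only `shared-data` is returned, which is the kind of PVC that would trigger the duplicate series error.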
This also breaks when using RWX volumes with scaled-up applications.
I created #124 with the change I suggested in my previous comment.