v0.8.1 - VolSync source alerts not clearing after multiple trigger runs
Describe the bug
While not frequent, I have had the VolSyncVolumeOutOfSync alert raised with role="source" that does not clear on its own. I have to restart the volsync application pod to clear the alert.
Steps to reproduce
I have a Prometheus alert defined as:
- alert: VolSyncVolumeOutOfSync
  annotations:
    summary: >-
      {{ $labels.obj_namespace }}/{{ $labels.obj_name }} volume
      is out of sync.
  expr: |
    volsync_volume_out_of_sync == 1
  for: 15m
  labels:
    severity: critical
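(As an aside, the rule can be syntax-checked with promtool once it is wrapped in a standard groups:/rules: rule file; the filename here is hypothetical.)
$ promtool check rules volsync-rules.yaml  # the rule must sit under groups[].rules[] in this file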
I didn't notice exactly when the alert was raised. I suspect there was a delay with the initial run due to a Restic repository secret issue, but it was definitely raised before the job's second run:
volsync_volume_out_of_sync{container="kube-rbac-proxy", endpoint="https", instance="10.42.0.186:8443", job="volsync-metrics", method="restic", namespace="volsync-system", obj_name="unifi", obj_namespace="unifi", pod="volsync-6b546cdf59-5knxk", role="source", service="volsync-metrics"} 1
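If it helps, the stale series can be listed with a query like this against the Prometheus HTTP API (a sketch; prometheus:9090 stands in for your actual Prometheus endpoint):
$ curl -sG 'http://prometheus:9090/api/v1/query' \
    --data-urlencode 'query=volsync_volume_out_of_sync{role="source"} == 1'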
Upon noticing the alert, I checked the replicationsource, and it looks like the initial run was fine:
Last Sync Duration: 24.08989161s
Last Sync Time:     2024-03-12T14:14:05Z
Latest Mover Status:
  Logs:   no parent snapshot found, will read all files
          Added to the repository: 390.739 MiB (282.820 MiB stored)
          processed 697 files, 944.763 MiB in 0:09
          snapshot c3b827c6 saved
          Restic completed in 13s
  Result: Successful
Next Sync Time: 2024-03-12T16:00:00Z
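(The status above can be viewed with something like the following; the ReplicationSource name unifi-controller is taken from the controller logs further down, so adjust it to your own object.)
$ kubectl -n unifi describe replicationsource unifi-controller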
And the next sync time has not been reached yet:
$ date -u +"%Y-%m-%dT%H-%M-%SZ"
2024-03-12T15-38-06Z
I expected the alert to clear after the next run. I waited for the next run, which was also successful, but the alert did not clear:
Last Sync Duration: 52.147254329s
Last Sync Time:     2024-03-12T16:00:52Z
Latest Mover Status:
  Logs:   using parent snapshot c3b827c6
          Added to the repository: 31.496 MiB (9.310 MiB stored)
          processed 697 files, 935.255 MiB in 0:04
          snapshot ee5e4fe3 saved
          Restic completed in 5s
  Result: Successful
Next Sync Time: 2024-03-12T20:00:00Z
Expected behavior
I was expecting the VolSyncVolumeOutOfSync alert to clear after the next trigger run.
Actual results
The alert did not clear until I manually restarted the volsync application pod. After the restart, the alert immediately cleared and stayed cleared.
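Restarting the controller amounts to something like this (a sketch; the deployment name volsync in namespace volsync-system is assumed from the pod name in the metric labels):
$ kubectl -n volsync-system rollout restart deployment volsync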
Additional context
Not sure what is relevant in the volsync pod log. These are the logs filtered on the keyword unifi (the namespace with the raised alert), before the pod was restarted:
2024-03-12T16:00:52.121Z INFO controllers.ReplicationSource job completed {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "job": {"name":"volsync-src-unifi-controller","namespace":"unifi"}}
2024-03-12T16:00:52.126Z INFO controllers.ReplicationSource Getting logs for pod {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "jobName": "volsync-src-unifi-controller", "podName": "volsync-src-unifi-controller-q6x49", "pod": {"namespace": "unifi", "name": "volsync-src-unifi-controller-q6x49"}}
2024-03-12T16:00:52.147Z DEBUG controllers.ReplicationSource transitioning to cleanup state {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}}
2024-03-12T16:00:52.168Z INFO controllers.ReplicationSource Namespace allows volsync privileged movers {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "namespace": "unifi", "Annotation": "volsync.backube/privileged-movers", "Annotation value": "true"}
2024-03-12T16:00:52.168Z INFO controllers.ReplicationSource deleting temporary objects {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "owned-by": "ada06fe5-dbdf-4b17-a5d1-52defa9e9cd7"}
2024-03-12T16:00:52.264Z INFO controllers.ReplicationSource Namespace allows volsync privileged movers {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "namespace": "unifi", "Annotation": "volsync.backube/privileged-movers", "Annotation value": "true"}
2024-03-12T16:00:52.264Z INFO controllers.ReplicationSource deleting temporary objects {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "owned-by": "ada06fe5-dbdf-4b17-a5d1-52defa9e9cd7"}
2024-03-12T16:00:52.305Z INFO controllers.ReplicationSource Namespace allows volsync privileged movers {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "namespace": "unifi", "Annotation": "volsync.backube/privileged-movers", "Annotation value": "true"}
2024-03-12T16:00:52.305Z INFO controllers.ReplicationSource deleting temporary objects {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "owned-by": "ada06fe5-dbdf-4b17-a5d1-52defa9e9cd7"}
2024-03-12T16:00:52.565Z INFO controllers.ReplicationSource Namespace allows volsync privileged movers {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "namespace": "unifi", "Annotation": "volsync.backube/privileged-movers", "Annotation value": "true"}
2024-03-12T16:00:52.565Z INFO controllers.ReplicationSource deleting temporary objects {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "owned-by": "ada06fe5-dbdf-4b17-a5d1-52defa9e9cd7"}
2024-03-12T16:01:00.220Z INFO controllers.ReplicationSource Namespace allows volsync privileged movers {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "namespace": "unifi", "Annotation": "volsync.backube/privileged-movers", "Annotation value": "true"}
2024-03-12T16:01:00.220Z INFO controllers.ReplicationSource deleting temporary objects {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "owned-by": "ada06fe5-dbdf-4b17-a5d1-52defa9e9cd7"}
2024-03-12T16:14:29.403Z INFO controllers.ReplicationSource Namespace allows volsync privileged movers {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "namespace": "unifi", "Annotation": "volsync.backube/privileged-movers", "Annotation value": "true"}
2024-03-12T16:14:29.403Z INFO controllers.ReplicationSource deleting temporary objects {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "owned-by": "ada06fe5-dbdf-4b17-a5d1-52defa9e9cd7"}
2024-03-12T16:14:29.403Z DEBUG events Populator finished {"type": "Normal", "object": {"kind":"PersistentVolumeClaim","namespace":"unifi","name":"unifi-controller","uid":"ed9f9bce-ee0c-48d9-a6f3-45ee74a4978a","apiVersion":"v1","resourceVersion":"535052418"}, "reason": "VolSyncPopulatorFinished"}
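(The log excerpts above and below can be gathered with something roughly like this; again, the deployment name is an assumption based on the pod name in the metric labels.)
$ kubectl -n volsync-system logs deploy/volsync | grep unifi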
After I restarted the volsync application pod, the alert immediately cleared and has not come back. These logs, again filtered on unifi, are from after the restart:
2024-03-12T17:29:05.990Z DEBUG events Populator finished {"type": "Normal", "object": {"kind":"PersistentVolumeClaim","namespace":"unifi","name":"unifi-controller","uid":"ed9f9bce-ee0c-48d9-a6f3-45ee74a4978a","apiVersion":"v1","resourceVersion":"535052418"}, "reason": "VolSyncPopulatorFinished"}
2024-03-12T17:29:06.099Z INFO controllers.ReplicationSource Namespace allows volsync privileged movers {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "namespace": "unifi", "Annotation": "volsync.backube/privileged-movers", "Annotation value": "true"}
2024-03-12T17:29:06.099Z INFO controllers.ReplicationDestination Namespace allows volsync privileged movers {"replicationdestination": {"name":"unifi-controller-dst","namespace":"unifi"}, "namespace": "unifi", "Annotation": "volsync.backube/privileged-movers", "Annotation value": "true"}
2024-03-12T17:29:06.100Z INFO controllers.ReplicationSource deleting temporary objects {"replicationsource": {"name":"unifi-controller","namespace":"unifi"}, "method": "Restic", "owned-by": "ada06fe5-dbdf-4b17-a5d1-52defa9e9cd7"}
2024-03-12T17:29:06.104Z DEBUG controllers.ReplicationDestination removing snapshot annotations from pvc {"replicationdestination": {"name":"unifi-controller-dst","namespace":"unifi"}, "method": "Restic"}
2024-03-12T17:29:06.105Z INFO controllers.ReplicationDestination deleting temporary objects {"replicationdestination": {"name":"unifi-controller-dst","namespace":"unifi"}, "method": "Restic", "owned-by": "ad5c7686-eeb5-4f9b-be49-a614a3817cb2"}