Fix VMRule validation webhook
Hey
A misspelled variable in a rule's annotations is a corner case that VMRule validation currently misses. It causes a configuration parse failure, a fatal log, and CrashLoopBackOff in the vmalert pod. Here is an invalid VMRule that passes the operator's sanity check:
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
spec:
  groups:
    - name: invalid-group
      rules:
        - alert: invalid-rule
          annotations:
            summary: "Hey This works! {{$lables.namespace}}"
            description: This a dummy for test pupose in case of emergency.
          expr: delta(vmalert_config_last_reload_errors_total[5m]) > 0
          for: 10s
Note lables instead of labels. When the rule gets mounted into the vmalert config, vmalert fails with:
{"ts":"2023-12-10T12:07:55.288Z","level":"fatal","caller":"VictoriaMetrics/app/vmalert/main.go:171","msg":"cannot parse configuration file: failed to parse [/etc/vmalert/config/vm-vmalert-test-rulefiles-0/*.yaml]: errors(1): invalid group \"invalid-group\" in file \"/etc/vmalert/config/vm-vmalert-test-rulefiles-0/monitoring-system-invalid-vmrule.yaml\": invalid annotations for rule \"invalid-group\".\"invalid-rule\": errors(1): key \"summary\", template \"Hey This works! {{$lables.job}}\": error parsing annotation template: template: :1: undefined variable \"$lables\""}
From what I have checked in the operator webhook and in vmalert, vmalert uses an extra validator: https://github.com/VictoriaMetrics/VictoriaMetrics/blob/6d037798705786ed7a3b697575db7489c4f9bb36/app/vmalert/main.go#L144C45-L144C58
I think the purpose of the validation webhook is to guarantee that if a rule can be applied, it won't cause trouble in vmalert. Otherwise, as in this case, vmalert stays on its last valid configuration, and new rules and updates won't change the vmalert configuration.
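To illustrate the kind of check the webhook is missing, here is a minimal sketch using only Go's standard text/template package. It is not the operator's or vmalert's actual code: validateAnnotations and templateHeader are hypothetical names, and the set of predeclared variables ($labels, $value, $expr) is an assumption about what vmalert exposes to annotation templates. Templates that call vmalert's custom template functions would need a matching FuncMap, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// templateHeader predeclares the variables an annotation may reference.
// Assumption: the variable list here is illustrative, not taken from vmalert.
const templateHeader = `{{ $labels := .Labels }}{{ $value := .Value }}{{ $expr := .Expr }}`

// validateAnnotations (hypothetical helper) parses every annotation as a
// Go template and collects the parse errors. A reference to an undeclared
// variable such as $lables is rejected by text/template at parse time.
func validateAnnotations(annotations map[string]string) error {
	var errs []string
	for key, text := range annotations {
		if _, err := template.New("").Parse(templateHeader + text); err != nil {
			errs = append(errs, fmt.Sprintf("key %q, template %q: %v", key, text, err))
		}
	}
	if len(errs) > 0 {
		return fmt.Errorf("invalid annotations: %s", strings.Join(errs, "; "))
	}
	return nil
}

func main() {
	// The annotation from the invalid VMRule above (typo: lables).
	bad := map[string]string{"summary": "Hey This works! {{$lables.namespace}}"}
	// The same annotation with the variable spelled correctly.
	good := map[string]string{"summary": "Hey This works! {{$labels.namespace}}"}

	fmt.Println("bad: ", validateAnnotations(bad))
	fmt.Println("good:", validateAnnotations(good))
}
```

Running a check like this in the webhook would reject the rule at admission time instead of letting it reach the vmalert pod.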
@Haleygo Can you take a look? I guess we should improve our validation webhook for VMRule.