Adding an invalid label causes the whole cluster to be removed
Operator version: 0.23.5
Adding an invalid label to a pod template eventually causes the operator to delete all StatefulSets during reconciliation, regardless of settings.
I have the following settings:
runtime:
  reconcileCHIsThreadsNumber: 10
  reconcileShardsThreadsNumber: 5
  reconcileShardsMaxConcurrencyPercent: 50
  threadsNumber: 0
statefulSet:
  create:
    onFailure: abort
  update:
    timeout: 300
    pollInterval: 5
    onFailure: rollback
host:
  wait:
    exclude: "true"
    queries: "true"
    include: "false"
After adding an invalid label to spec.templates.podTemplates[0].metadata.labels, e.g. some_bad_label: '/metrics', the operator tries to recreate the StatefulSets but encounters the following error:
E0508 13:36:35.377188 1 creator.go:46] createStatefulSet():StatefulSet create failed. err: StatefulSet.apps "chi-clickhouse-store-0-0" is invalid: spec.template.labels: Invalid value: "/metrics": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
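For anyone who wants to catch this before applying the manifest, here is a minimal sketch of a pre-check for label values. The regex is copied from the error message above; the 63-character cap is the documented Kubernetes limit for label values. The function names are my own and are not part of the operator, and the check covers label values only, not label keys.

```python
import re

# Regex copied verbatim from the validation error quoted above.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
# Kubernetes limits label values to 63 characters.
MAX_LABEL_VALUE_LEN = 63


def is_valid_label_value(value: str) -> bool:
    """Return True if value would pass the label-value check from the error."""
    if len(value) > MAX_LABEL_VALUE_LEN:
        return False
    # fullmatch: the whole string must match, as in the API server's validation.
    return LABEL_VALUE_RE.fullmatch(value) is not None


def invalid_labels(labels: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: return only the labels whose values are invalid."""
    return {k: v for k, v in labels.items() if not is_valid_label_value(v)}


if __name__ == "__main__":
    labels = {"some_bad_label": "/metrics", "app": "clickhouse"}
    print(invalid_labels(labels))
```

Running this against the labels of a pod template before applying the CHI would flag some_bad_label without the operator ever touching a StatefulSet.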
Expected behavior: after failing to create the StatefulSet, the operator either aborts or rolls back.
Actual behavior: after some time, the operator moves on to the next StatefulSet until all of them are deleted (and not recreated, because of the error).
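To make the expectation concrete, here is a toy model of what I understand "abort" to mean. This is not the operator's code: reconcile_with_abort, the recreate callback, and the second StatefulSet name are all hypothetical and exist only to illustrate the semantics.

```python
from typing import Callable, Iterable, Optional


def reconcile_with_abort(
    names: Iterable[str],
    recreate: Callable[[str], None],
) -> tuple[list[str], Optional[str]]:
    """Toy loop: recreate each StatefulSet in order, stop at the first failure.

    Returns the names that were recreated successfully and, if the loop
    aborted, a description of the failure (otherwise None).
    """
    done: list[str] = []
    for name in names:
        try:
            recreate(name)
        except Exception as err:
            # Abort: do not touch any of the remaining StatefulSets.
            return done, f"{name}: {err}"
        done.append(name)
    return done, None


if __name__ == "__main__":
    touched: list[str] = []

    def failing_recreate(name: str) -> None:
        # Stand-in for a create call rejected by the API server.
        touched.append(name)
        raise ValueError("spec.template.labels: Invalid value")

    # The second name is made up for the example.
    names = ["chi-clickhouse-store-0-0", "chi-clickhouse-store-0-1"]
    print(reconcile_with_abort(names, failing_recreate))
    print(touched)
```

With these semantics only the first StatefulSet is ever touched, which is the opposite of what I observe.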
Please check these behaviors: https://github.com/Altinity/clickhouse-operator/blob/bbbf66a8e0fbbcf36b787a63eceeaca37e0ec272/config/config.yaml#L256
Try changing
update:
  onFailure: rollback
to
update:
  onFailure: abort
The rollback behavior needs to be checked.
@sunsingerus
I checked this with
update:
  onFailure: abort
and I still get exactly the same behavior. After ~15 minutes, it continues.