scylla-operator
Scylla Manager controller will update tasks despite no changes in spec
What happened?
Scylla Manager controller decides to update the tasks defined in ScyllaCluster's spec by checking deep equality between the definition and the task obtained from the Manager's state.
https://github.com/scylladb/scylla-operator/blob/f2336ee228b4132081a179c5b8b9976a6d725c7e/pkg/controller/manager/sync_action.go#L157
Some fields are converted when the spec is translated into a request to Scylla Manager, but the conversion is not reversed when the Manager's task is translated back. For tasks that use such fields, the deep equality check always fails, so the controller keeps updating them in a loop even though their specification has not changed. This puts unnecessary load on both Scylla Manager and the controller.
The same situation can also be caused by the Manager defaulting some fields or not returning their values in API call responses.
Example logs:
I0313 12:00:21.129683 1 manager/sync.go:134] "Executing action" action="add task &{ClusterID: Enabled:true ID: Name:weekly Properties:map[intensity:1 parallel:1 small_table_threshold:1073741824] Schedule:0xc00069d5e0 Tags:[] Type:repair}"
...
I0313 12:00:21.291902 1 manager/sync.go:93] "Started syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-vkzsg-nwz28/basic-55h26" startTime="2024-03-13 12:00:21.291882268 +0000 UTC m=+1712.897695871"
I0313 12:00:21.306972 1 manager/sync.go:134] "Executing action" action="update task &{ClusterID: Enabled:true ID:c0aa282b-63ef-4dc5-87c3-475e3dcec9e0 Name:weekly Properties:map[intensity:1 parallel:1 small_table_threshold:1073741824] Schedule:0xc0000b20e0 Tags:[] Type:repair}"
I0313 12:00:21.483593 1 manager/sync.go:95] "Finished syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-vkzsg-nwz28/basic-55h26" duration="191.693011ms"
...
I0313 12:03:22.862395 1 manager/sync.go:93] "Started syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-vkzsg-nwz28/basic-55h26" startTime="2024-03-13 12:03:22.862358661 +0000 UTC m=+1894.468172253"
I0313 12:03:22.885635 1 manager/sync.go:134] "Executing action" action="update task &{ClusterID: Enabled:true ID:c0aa282b-63ef-4dc5-87c3-475e3dcec9e0 Name:weekly Properties:map[intensity:1 parallel:1 small_table_threshold:1073741824] Schedule:0xc0002448c0 Tags:[] Type:repair}"
...
I0313 12:04:41.037507 1 manager/sync.go:93] "Started syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-vkzsg-nwz28/basic-55h26" startTime="2024-03-13 12:04:41.037464223 +0000 UTC m=+1972.643277820"
I0313 12:04:41.058417 1 manager/sync.go:134] "Executing action" action="update task &{ClusterID: Enabled:true ID:c0aa282b-63ef-4dc5-87c3-475e3dcec9e0 Name:weekly Properties:map[intensity:1 parallel:1 small_table_threshold:1073741824] Schedule:0xc000244070 Tags:[] Type:repair}"
I0313 12:04:41.201111 1 manager/sync.go:95] "Finished syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-vkzsg-nwz28/basic-55h26" duration="163.634093ms"
I0313 12:04:41.201153 1 manager/sync.go:93] "Started syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-vkzsg-nwz28/basic-55h26" startTime="2024-03-13 12:04:41.201142592 +0000 UTC m=+1972.806956179"
I0313 12:04:41.223481 1 manager/sync.go:134] "Executing action" action="update task &{ClusterID: Enabled:true ID:c0aa282b-63ef-4dc5-87c3-475e3dcec9e0 Name:weekly Properties:map[intensity:1 parallel:1 small_table_threshold:1073741824] Schedule:0xc0002444d0 Tags:[] Type:repair}"
I0313 12:04:41.367677 1 manager/sync.go:95] "Finished syncing ScyllaCluster" ScyllaCluster="e2e-test-scyllacluster-vkzsg-nwz28/basic-55h26" duration="166.520159ms"
In the above scenario, the endless updates come from a mismatch in the small_table_threshold value between ScyllaCluster's spec and the Manager's state: the value is converted before the request is sent, so the two never compare equal.
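To illustrate the mechanism, here is a minimal Go sketch of a one-way conversion defeating a deep equality check. It is not the operator's code: `specTask`, `managerTask`, `toManager`, `fromManager` and `needsUpdate` are hypothetical names, and the "1GiB" spec value is an assumption chosen to match the 1073741824 seen in the logs.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// specTask is a hypothetical stand-in for a repair task in ScyllaCluster's spec.
type specTask struct {
	Name                string
	SmallTableThreshold string // human-readable form, e.g. "1GiB" (assumed)
}

// managerTask is a hypothetical stand-in for the task as held by Scylla Manager.
type managerTask struct {
	Name       string
	Properties map[string]int64
}

// toManager converts the threshold to a byte count on the way out.
func toManager(s specTask) managerTask {
	var bytes int64
	if s.SmallTableThreshold == "1GiB" { // toy parser: handles only this one value
		bytes = 1 << 30
	}
	return managerTask{
		Name:       s.Name,
		Properties: map[string]int64{"small_table_threshold": bytes},
	}
}

// fromManager does not reverse the conversion; it only stringifies the byte count.
func fromManager(m managerTask) specTask {
	return specTask{
		Name:                m.Name,
		SmallTableThreshold: strconv.FormatInt(m.Properties["small_table_threshold"], 10),
	}
}

// needsUpdate mimics a deep equality check between the spec and the
// task reconstructed from the Manager's state.
func needsUpdate(desired specTask) bool {
	current := fromManager(toManager(desired))
	return !reflect.DeepEqual(desired, current)
}

func main() {
	desired := specTask{Name: "weekly", SmallTableThreshold: "1GiB"}
	// "1GiB" != "1073741824", so an update is requested on every sync.
	fmt.Println(needsUpdate(desired)) // prints "true"
}
```

In this sketch the spec never changes, yet `needsUpdate` reports a difference every time, which corresponds to the repeated "update task" log entries above.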
What did you expect to happen?
The tasks should not be updated when there are no changes in their spec.
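One way to get this behaviour, sketched here under assumptions rather than proposed as the actual fix, is to compare both sides in the same converted form, so that a one-way conversion cannot produce a false difference. The `managerTask` type and `needsUpdate` function are hypothetical, and the sketch does not cover fields the Manager defaults or omits in its responses.

```go
package main

import (
	"fmt"
	"reflect"
)

// managerTask is a hypothetical stand-in for a task in the form sent to,
// and returned by, Scylla Manager.
type managerTask struct {
	Name       string
	Properties map[string]int64
}

// needsUpdate compares the already-converted desired task with the task
// read from the Manager's state. Both sides are in the same form.
func needsUpdate(desired, current managerTask) bool {
	return !reflect.DeepEqual(desired, current)
}

func main() {
	desired := managerTask{
		Name:       "weekly",
		Properties: map[string]int64{"small_table_threshold": 1 << 30, "intensity": 1},
	}
	unchanged := managerTask{
		Name:       "weekly",
		Properties: map[string]int64{"small_table_threshold": 1 << 30, "intensity": 1},
	}
	changed := managerTask{
		Name:       "weekly",
		Properties: map[string]int64{"small_table_threshold": 1 << 30, "intensity": 2},
	}

	fmt.Println(needsUpdate(desired, unchanged)) // prints "false"
	fmt.Println(needsUpdate(desired, changed))   // prints "true"
}
```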
How can we reproduce it (as minimally and precisely as possible)?
Schedule any task using ScyllaCluster's API.
Scylla Operator version
master
Kubernetes platform name and version
n/a
Please attach the must-gather archive.
n/a
Anything else we need to know?
No response