strimzi-kafka-operator
[Bug]: Do reconciliation and rolling update on pod (Cruise Control) even if only secretKeyRef changed
Bug Description
We can deploy Kafka with Cruise Control e.g. from the following example: https://github.com/strimzi/strimzi-kafka-operator/blob/main/examples/cruise-control/kafka-cruise-control-with-goals.yaml
To enable Basic auth for Cruise Control, the following section should be added to the YAML descriptor:
#...
kind: Kafka
spec:
  # ...
  cruiseControl:
    # ...
    config:
      webserver.security.enable: true
      # ...
    apiUsers:
      type: hashLoginService
      valueFrom:
        secretKeyRef:
          name: cruise-control-api-users-secret
          key: cruise-control-auth.txt
    # ...
When I modified the name or the key of the secret in the secretKeyRef, e.g.:
#...
secretKeyRef:
  name: cruise-control-api-users-secret-2
  key: cruise-control-auth-2.txt
# ...
then the reconciliation loop did not recognize the change and the Cruise Control pod was NOT rolled, so it kept running with the old secret.
When I modified one of the goals in the 'spec.cruiseControl.config' section, reconciliation happened and Cruise Control was rolled, which is the expected behaviour.
Steps to reproduce
- Create a cluster-operator
- Create secret for Cruise Control auth (e.g.: userOne: passwordOne, USER)
- Deploy Kafka with CC: https://github.com/strimzi/strimzi-kafka-operator/blob/main/examples/cruise-control/kafka-cruise-control-with-goals.yaml, but with apiUsers section and with 'hashLoginService'
- Wait for the cluster to be ready
- Create a new secret (name or key or both of them should be different from the secret of the 2nd step)
- Edit kafka-cruise-control-with-goals.yaml to use the new secret, but nothing else
- Wait for reconciliation and CC rolling update (This will not happen!)
Expected behavior
The Cruise Control pod should be reconciled even when only a secret ref is changed. Furthermore, we should check this behaviour for other types of resources as well (not only Cruise Control).
Strimzi version
main, 0.43.0
Kubernetes version
1.30.1
Installation method
yaml files + helm chart
Infrastructure
No response
Configuration files and logs
No response
Additional context
No response
The Secret is not used directly, so I think there is no expectation that it would roll Cruise Control just because you changed the Secret name or key. I would expect it to roll only if you actually change the content - e.g. add new user.
I tried it with different use-cases:
- edited the original secret content -> operator was able to apply the change
- created a new secret with modified content, furthermore renamed the secretKeyRef to this new secret -> operator was NOT able to apply the change it was still looking for the old secret
> created a new secret with modified content, furthermore renamed the secretKeyRef to this new secret -> operator was NOT able to apply the change it was still looking for the old secret
That sounds strange since it has no way to know about the old Secret if you change it in the Kafka CR. I guess @kyguy can try to reproduce it and fix it. But the code looks good to me. So not sure what exactly would go wrong there.
Thanks in advance! @scholzj and @kyguy!
Triaged on 03.10.2024: @kyguy could take care of this and try to double check that the code looks good and it's not a bug, please?
Triaged on 17/10/2024: @kyguy any (good) news about this?
> The Secret is not used directly, so I think there is no expectation that it would roll Cruise Control just because you changed the Secret name or key. I would expect it to roll only if you actually change the content - e.g. add new user.
+1. If the contents of the new secret are not different from the previous secret, the Cruise Control pod does not need to roll. A hash of the secret's contents is maintained in an annotation of the Cruise Control deployment, and this hash is used to determine whether the Cruise Control pod needs to be rolled. Therefore, if the contents of the new secret are identical to those of the previous secret, the hash will not change and the Cruise Control pod will not roll, regardless of whether the name and key of the secret in the secretKeyRef section of the Kafka resource change.
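The hash-based decision described above can be illustrated with a minimal sketch. This is not Strimzi's code: the annotation key, the function names, and the choice of SHA-256 are assumptions made only to show why a renamed secret with identical contents does not trigger a roll.

```python
import hashlib

# Hypothetical annotation key; the real key used by the operator is not given in this thread.
HASH_ANNOTATION = "example.io/api-users-hash"


def content_hash(secret_data: bytes) -> str:
    """Return a short hex digest of the secret's contents (SHA-256 is an assumption)."""
    return hashlib.sha256(secret_data).hexdigest()[:8]


def needs_roll(deployment_annotations: dict, secret_data: bytes) -> bool:
    """Roll only when the stored hash differs from the hash of the current contents."""
    return deployment_annotations.get(HASH_ANNOTATION) != content_hash(secret_data)


old_contents = b"userOne: passwordOne, USER\n"
annotations = {HASH_ANNOTATION: content_hash(old_contents)}

# Same contents under a different secret name or key: the hash matches, so no roll.
print(needs_roll(annotations, b"userOne: passwordOne, USER\n"))    # False
# Changed contents: the hash differs, so the pod should roll.
print(needs_roll(annotations, b"userOne: passwordOne, VIEWER\n"))  # True
```

Note that the secret's name and key never enter the comparison in this sketch; only the bytes stored under the referenced key do.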
It is worth noting that even though the Cruise Control pod did not roll, the secret referenced in the secretKeyRef section is still being used. To confirm this, try updating the secretKeyRef with a secret that does not exist and check the errors in the log of the Cluster Operator pod.
> created a new secret with modified content, furthermore renamed the secretKeyRef to this new secret -> operator was NOT able to apply the change it was still looking for the old secret
I was not able to reproduce this.
After creating a new secret with modified content and referencing that new secret in the Kafka resource, the Cruise Control pod was rolled to pick up the new content.
@egyedt Are you sure the new secret referenced was using new content? If so, do you have specific steps and configurations you could share to trigger the issue?
@kyguy In my example, I experienced the issue in the following case:
- Create a cluster-operator
- Create Cruise Control auth file with the following content (name: 'cruise-control-auth.txt'): userOne: passwordOne, USER
- Create secret for Cruise Control auth based on the previously created file:
  kubectl create secret generic <secret name> --from-file=cruise-control-auth.txt=cruise-control-auth.txt -n <namespace>
- Deploy Kafka with CC: https://github.com/strimzi/strimzi-kafka-operator/blob/main/examples/cruise-control/kafka-cruise-control-with-goals.yaml, but with apiUsers section and with 'hashLoginService'
- Wait for the cluster to be ready
- Edit Cruise Control auth file with the following content (name: 'cruise-control-auth.txt'): userOne: passwordOne, VIEWER
- Create new secret for Cruise Control auth based on the previously modified file
- Edit kafka-cruise-control-with-goals.yaml to use the new secret, but nothing else
- Wait for reconciliation and CC rolling update (This will not happen!)
So the problem is that the same user now has a different role, but this has no effect in Cruise Control because the reconciliation is missing.
Thank you for the reproducer @egyedt, using your directions I was able to reproduce the issue and find the problem. There is a bug in the code that prevents API users from old user-managed secrets from being deleted. The bug makes it impossible to update the passwords and roles of usernames that had been used by past user-managed secrets without manually deleting the Cruise Control API secret.
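One way to picture the failure mode described above is the following minimal sketch. It is not the operator's actual code: the function names, the dictionary model of the Cruise Control API secret, the `internal-admin` user, and the merge order are all assumptions chosen for illustration. It models a user store that keeps entries from an earlier user-managed secret, so a changed role for an existing username never takes effect.

```python
def parse_users(text: str) -> dict:
    """Parse lines of the form 'name: password, ROLE' into {name: (password, role)}."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, rest = line.split(":", 1)
        password, role = (part.strip() for part in rest.split(",", 1))
        users[name.strip()] = (password, role)
    return users


# Hypothetical operator-managed entry that must always be present in the store.
INTERNAL_USERS = {"internal-admin": ("generated", "ADMIN")}


def reconcile_buggy(store: dict, new_users: dict) -> dict:
    """Assumed buggy behaviour: entries already in the store are never dropped and win on conflict."""
    result = dict(new_users)
    result.update(store)  # stale user-managed entries overwrite the new ones
    return result


def reconcile_fixed(internal: dict, new_users: dict) -> dict:
    """Assumed fix: rebuild the store from internal users plus the current secret only."""
    result = dict(new_users)
    result.update(internal)
    return result


# State after the first deployment with 'userOne: passwordOne, USER'.
store = reconcile_fixed(INTERNAL_USERS, parse_users("userOne: passwordOne, USER"))
# The Kafka resource now points at a new secret with a different role for the same user.
new_users = parse_users("userOne: passwordOne, VIEWER")

print(reconcile_buggy(store, new_users)["userOne"][1])           # USER
print(reconcile_fixed(INTERNAL_USERS, new_users)["userOne"][1])  # VIEWER
```

In this model, the buggy path only picks up the new role after the store is cleared by hand, which mirrors the manual deletion of the Cruise Control API secret mentioned above.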
I'll submit a patch for this issue shortly!
@kyguy thank you!