Updates to cassandraImage are ignored
When updating the cassandraImage in a CDC spec, the change is not propagated to the corresponding StatefulSets.
The operator correctly receives an event and starts reconciling the CDC, but somewhere in the code it then fails to update the StatefulSet spec. There is no error in the logs.
I also tested this with another field, flipping optimizeKernelParams, which should likewise change the StatefulSet spec.
I think there's currently a bug with detecting whether a StatefulSet has changed and needs to be updated.
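For context, operators commonly detect this kind of drift by hashing the desired spec into an annotation on the object and comparing it on every reconcile. Below is a minimal sketch of that idea with made-up names; it is not this operator's actual code:

```go
package reconcile

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"

	appsv1 "k8s.io/api/apps/v1"
)

// specHashAnnotation is an illustrative key, not one this operator uses.
const specHashAnnotation = "example.com/spec-hash"

// specHash returns a stable hash of the desired StatefulSet spec.
func specHash(spec appsv1.StatefulSetSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// needsUpdate compares the hash stored on the live object against the
// freshly built desired spec. If a reconciler instead only compares a few
// fields (e.g. replicas), changes such as a new cassandraImage slip
// through silently, which would match the behavior described above.
func needsUpdate(live *appsv1.StatefulSet, desired appsv1.StatefulSetSpec) (bool, error) {
	h, err := specHash(desired)
	if err != nil {
		return false, err
	}
	return live.Annotations[specHashAnnotation] != h, nil
}
```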
Tested with:
- operator v5.0.0
- sidecar v3.1.1 and v5.0.0
- topology-aware setup
- Kubernetes v1.16.7
Interesting, I'll debug this one. Thanks for reaching out.
@linki, actually, what would you like to see as a result here? The StatefulSet is changed, and now what? If you want to effectively run on a different image, the container itself has to be restarted.
Could you check whether the image is changed in the StatefulSet after your edit? In other words, do you see your new image in the StatefulSet but no action taken?
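One quick way to verify that is to read the image straight off the live object, e.g. with client-go (the namespace and StatefulSet name below are placeholders for your deployment):

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// "default" and "cassandra-test-dc1" are placeholder names.
	sts, err := client.AppsV1().StatefulSets("default").Get(
		context.TODO(), "cassandra-test-dc1", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range sts.Spec.Template.Spec.Containers {
		fmt.Printf("%s -> %s\n", c.Name, c.Image)
	}
}
```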
As a workaround, you might restart that pod/container. Restarting a container should pick up the latest StatefulSet spec, including your new image. Restarting just the Cassandra container effectively means killing PID 1. This can be done from the Sidecar by calling its restart endpoint. Please read the section in the auth doc (1) related to restarting a pod (I might move this documentation somewhere else in the future).
(1) https://github.com/instaclustr/cassandra-operator/blob/master/doc/auth.md#switching-between-allowallauthenticator-to-passwordauthenticator
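For illustration only, a restart request to the Sidecar could look roughly like the snippet below; the address, port, path, and payload are assumptions on my part, so please check the linked doc for the actual API:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical Sidecar restart endpoint; the real path and payload
	// may differ, see the operator documentation linked above.
	url := "http://10.0.0.12:4567/operations" // assumed Sidecar address/port
	body := bytes.NewBufferString(`{"type": "restart"}`)

	resp, err := http.Post(url, "application/json", body)
	if err != nil {
		log.Fatalf("restart request failed: %v", err)
	}
	defer resp.Body.Close()
	fmt.Println("sidecar responded:", resp.Status)
}
```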
> Could you check whether the image is changed in the StatefulSet after your edit? In other words, do you see your new image in the StatefulSet but no action taken?
It doesn't even update the StatefulSet itself, so Kubernetes isn't doing anything. When the CDC changes, the operator should propagate those changes down to the StatefulSets.
After thinking about it more, it's probably not a good idea to blindly apply the new spec to the StatefulSet, since some updates might break the cluster.
It's just that currently there's no way of updating the Cassandra version or other fields without directly editing the operator-managed StatefulSets.
Yeah, maybe I can just do something about that one.
The only time it "reacts" is when the number of replicas does not match: based on what you entered into the new spec (a higher or lower number), it will scale up or down. So this "functionality" already exists; we just have to expand it to the image name and somehow trigger a restart.
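Expanding that existing check to cover the image could look something like this sketch (field names are the upstream Kubernetes types; the surrounding operator code is assumed, not quoted):

```go
package reconcile

import appsv1 "k8s.io/api/apps/v1"

// statefulSetDiffers sketches extending the current "replicas differ ->
// scale" logic so that an image change is detected as well.
func statefulSetDiffers(live, desired *appsv1.StatefulSet) bool {
	if live.Spec.Replicas != nil && desired.Spec.Replicas != nil &&
		*live.Spec.Replicas != *desired.Spec.Replicas {
		return true // already handled today: triggers a scale up/down
	}
	// New: compare the Cassandra container image as well.
	liveC := live.Spec.Template.Spec.Containers
	desiredC := desired.Spec.Template.Spec.Containers
	if len(liveC) > 0 && len(desiredC) > 0 && liveC[0].Image != desiredC[0].Image {
		return true // would need to trigger an update plus a rolling restart
	}
	return false
}
```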
We just started using Cassandra with the help of the Instaclustr operator, so I am quite a newb to both, but I would like to add my thoughts here, based on what little info I have found on the topic so far. Sorry if some things seem a bit naive. ;-)
I am currently looking into ways to update a Cassandra cluster, too, ideally while keeping the cluster operational during the update process, albeit with reduced capacity and redundancy. According to some documentation, a rolling update is possible for non-containerized Cassandra, as described, e.g., here: https://stackoverflow.com/questions/44024170/upgrading-cassandra-without-losing-the-current-data

Alas, it seems not to be as simple as patching the version in the StatefulSet and letting the StatefulSet do a rolling update: in the description linked above, after a node is shut down and the binaries are updated, nodetool upgradesstables additionally has to be run on that node. I am not sure whether that is required for every update (major, minor, or patch version change); maybe there are version jumps (patches, perhaps?) where it is safe to just replace the binary with the new one and start the node again.

Another option would be to have the start script check whether a version update has happened and run upgradesstables on demand. The pod would not become ready until this completes, and only then would the StatefulSet proceed with stopping and updating the next pod.
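To make the start-script idea concrete, here is a rough sketch of a small helper that could run once Cassandra is up (e.g. from a post-start hook), before the pod reports ready; the marker-file path and the version parsing are assumptions of mine:

```go
package main

import (
	"bytes"
	"log"
	"os"
	"os/exec"
	"strings"
)

// Assumed location for remembering which version last ran on this node.
const versionMarker = "/var/lib/cassandra/.last-version"

// currentVersion asks the (already running) node for its release version;
// `nodetool version` prints a line like "ReleaseVersion: 3.11.6".
func currentVersion() string {
	out, err := exec.Command("nodetool", "version").Output()
	if err != nil {
		log.Fatalf("nodetool version failed: %v", err)
	}
	return strings.TrimSpace(string(bytes.TrimPrefix(out, []byte("ReleaseVersion:"))))
}

func main() {
	last, _ := os.ReadFile(versionMarker) // empty on first start
	cur := currentVersion()
	if len(last) > 0 && string(last) != cur {
		// The binary changed since the last start: rewrite SSTables before
		// the pod becomes ready, so the StatefulSet waits for this node.
		if err := exec.Command("nodetool", "upgradesstables").Run(); err != nil {
			log.Fatalf("upgradesstables failed: %v", err)
		}
	}
	if err := os.WriteFile(versionMarker, []byte(cur), 0o644); err != nil {
		log.Fatalf("writing version marker: %v", err)
	}
}
```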
Naive, maybe, but that is what I would hope an operator could do for me. :-)
Hello @jgoeres,
yeah, this one is very complex. I would personally investigate updating a node by decommissioning it, so the cluster would shrink down, and then I would try to scale it back up, presumably with the new image. This could maybe be done by restarting a node with the new bits... However, this approach is rather... strange.
Anyway, the whole approach should be driven by the operator, with no custom scripts; that gets complicated very quickly.
There is a sidecar running alongside each node, and the best thing would be for the operator to send an "upgrade request" to the Sidecar (via its REST API), with the Sidecar restarting each node one by one. Restarting a node is already done. A rolling restart of a cluster and its orchestration is not hard here; what needs to happen is that the Cassandra image in the spec gets changed so a node is restarted with the new values... If you can do this manually, triggering rolling restarts with orchestrated SSTable upgrades should be easy.
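To sketch what that orchestration might look like from the operator's side (pod names follow StatefulSet ordinals; the Sidecar call is stubbed, and the namespace, name, and replica count are made up):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// restartViaSidecar would call the Sidecar's restart endpoint for the pod;
// stubbed here because the exact API lives in the operator docs.
func restartViaSidecar(pod string) error {
	fmt.Println("asking sidecar to restart", pod)
	return nil
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	const ns, name, replicas = "default", "cassandra-test-dc1", 3 // assumed

	// Restart one node at a time, waiting until the pod is Ready again
	// before touching the next one, so the cluster stays available.
	for i := 0; i < replicas; i++ {
		pod := fmt.Sprintf("%s-%d", name, i)
		if err := restartViaSidecar(pod); err != nil {
			log.Fatal(err)
		}
		for {
			p, err := client.CoreV1().Pods(ns).Get(context.TODO(), pod, metav1.GetOptions{})
			if err == nil {
				ready := false
				for _, c := range p.Status.Conditions {
					if c.Type == "Ready" && c.Status == "True" {
						ready = true
					}
				}
				if ready {
					break
				}
			}
			time.Sleep(5 * time.Second)
		}
	}
}
```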