failed to cause an issue by attempting to upgrade from version v19.2.12 to v20.2.3 - unclear how to upgrade CRDB version

Open theodore-hyman opened this issue 4 years ago • 5 comments

EXPECTED RESULT: When the CockroachDB version specified in the OpenShift operator is newer than the version the DB is currently running, the cluster should either upgrade or fail. There should be a notification somewhere, or at least some indication that something is happening.

ACTUAL RESULT: nothing happened and I don't know why. I changed the operator config and reloaded my containers and it seems to have had no impact.

My write-up implies that something is wrong with CRDB or with OpenShift. However, it is entirely possible that I had no idea what I was doing, because there was no documentation on how to update the version of CRDB in this environment. Updating the operator config and reloading may have been the wrong step. Perhaps someone with expertise needs to document the specific steps to update the version.

detailed replication steps: [referring to this documentation https://www.cockroachlabs.com/docs/v21.1/deploy-cockroachdb-with-kubernetes-openshift.html]

  • specify in the operator crdb version v19.2.12
  • then change operator yaml to specify "v20.2.3" and then save and reload [expected to fail]
  • nothing actually happened after save and reload, so I manually went into pods and deleted the operator pod and it created a new operator pod
  • the existing CRDB pods were still using the previous version and did not self-delete, so I deleted one of the CRDB pods, and it created a new one. So now I have 2 pods using v19.2.12 and one pod running v20.2.3

➜ ~ oc get pods
NAME                                     READY   STATUS      RESTARTS   AGE
cockroach-operator-68977698f7-fcfxh      1/1     Running     0          119s
crdb-client-secure                       1/1     Running     0          26m
crdb-tls-example-0                       1/1     Running     0          30m
crdb-tls-example-1                       1/1     Running     0          29m
crdb-tls-example-2                       1/1     Running     0          57s
crdb-tls-example-vcheck-27021277-v78jq   0/1     Completed   0          30m

crdb-tls-example-2 is the one that should be running v20.2.3, but I don't think it is.

  • now I update my client secure pod (from step 4) - however that didn't work, so I deleted it and created a new one using my updated container image data from crdb-tls-example-2
  • when I go into the pod and try the "cockroach version" command it still shows v19.2.12
  • I tried deleting all of the pods and re-creating them, but the pods still do not show v20.2.3 even though the operator is configured with "cockroachDBVersion": "v20.2.3"

Hope this is helpful

theodore-hyman avatar May 17 '21 19:05 theodore-hyman

I agree that the docs could be better here - but "nothing happened" is exactly what should have happened. You cannot upgrade directly from 19.2 to 20.2; you have to upgrade through 20.1 first. This is a database limitation and the operator simply enforces it.
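
To make the rule concrete, here is a minimal Python sketch of the "one major release at a time" check. It is an illustration only, not the operator's actual code: the `RELEASE_LINES` list and both function names are hypothetical, and the list covers only the release lines mentioned in this thread.

```python
import re
from typing import List

# Release lines in order. Assumption for illustration: only the
# lines discussed in this thread are listed.
RELEASE_LINES = ["19.2", "20.1", "20.2", "21.1"]


def release_line(version: str) -> str:
    """Extract the release line from a version, e.g. 'v20.2.3' -> '20.2'."""
    match = re.fullmatch(r"v?(\d+\.\d+)\.\d+(?:-.+)?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group(1)


def can_upgrade_directly(current: str, target: str) -> bool:
    """True if target is in the same release line as current or the next one."""
    cur = RELEASE_LINES.index(release_line(current))
    tgt = RELEASE_LINES.index(release_line(target))
    return 0 <= tgt - cur <= 1


def release_lines_to_pass_through(current: str, target: str) -> List[str]:
    """Release lines that must be visited, in order, to get from current to target."""
    cur = RELEASE_LINES.index(release_line(current))
    tgt = RELEASE_LINES.index(release_line(target))
    return RELEASE_LINES[cur + 1 : tgt + 1]


if __name__ == "__main__":
    # The jump attempted in this issue skips the 20.1 release line.
    print(can_upgrade_directly("v19.2.12", "v20.2.3"))
    print(release_lines_to_pass_through("v19.2.12", "v20.2.3"))
```

For the versions in this issue, the check rejects the direct jump and lists 20.1 as the release line to upgrade through before 20.2.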

@taroface are we explicitly covering upgrades in your docs enhancements for the kubernetes operator? If not can we add a section on this topic?

keith-mcclellan avatar May 18 '21 16:05 keith-mcclellan

@theodore-hyman did you see the pod with vcheck in its name? That's where you'd have found your error.

keith-mcclellan avatar May 18 '21 16:05 keith-mcclellan

@taroface are we explicitly covering upgrades in your docs enhancements for the kubernetes operator? If not can we add a section on this topic?

Yes, upgrades are part of these docs, though the instructions are basically intact from the existing upgrade steps here: https://www.cockroachlabs.com/docs/v21.1/orchestrate-cockroachdb-with-kubernetes.html#upgrade-the-cluster

These steps call out that

To upgrade to a new version, you must first be on a production release of the previous version. The release does not need to be the latest production release of the previous version, but it must be a production release rather than a testing release (alpha/beta).
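
As a rough illustration of the production-versus-testing distinction, here is a small Python sketch. It assumes that testing releases carry a pre-release suffix such as `-alpha.1` or `-beta.2` after the version number and that production releases have none; the function name and the sample version strings are hypothetical.

```python
import re

# Assumption: a version looks like 'v20.2.3', optionally followed by a
# pre-release suffix such as '-beta.1' for testing releases.
_VERSION = re.compile(r"v?\d+\.\d+\.\d+(-[0-9A-Za-z.]+)?")


def is_production_release(version: str) -> bool:
    """True if the version string has no pre-release suffix."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group(1) is None


if __name__ == "__main__":
    for v in ("v19.2.12", "v20.2.0-beta.1"):
        print(v, is_production_release(v))
```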

I can try to add emphasis here.

EDIT: I just saw that these are steps @jseldess added as part of the v21.1 docs update! They weren't there previously, my apologies. (They were in the regular CRDB upgrade docs and not the K8s version.) These are very clarifying and I'll add them to the WIP K8s docs update.

taroface avatar May 18 '21 16:05 taroface

@keith-mcclellan yes, I agree this was overall expected behavior. I submitted this issue as part of the CRL OpenShift bug bash, where I was testing a "negative testing scenario" in which the upgrade was expected not to work. I did not check the 'vcheck' logs. If that's the expected place to see this type of error, good to know - maybe something to document? Not sure. The cluster has since been decommissioned because clusters are expensive, so I have no way to test this further on OpenShift.

@taroface The steps I was following are these:

https://www.cockroachlabs.com/docs/v21.1/deploy-cockroachdb-with-kubernetes-openshift.html

I understand that the upgrade steps are in the link below [which is new content!], but your comment implicitly assumes that a customer has to read through not only the OpenShift docs but also the Kubernetes docs in order to properly manage their cluster. That is not obvious to me. If I were a customer, I might be confused about whether the Kubernetes docs apply to my OpenShift deployment. Not sure. It might be worth making this clearer in the docs.

For example, at the bottom of the OpenShift doc linked above, it says "Note: For more information on managing secrets, see the Kubernetes documentation." Maybe this needs to cover more than just secrets; maybe it should include upgrades as well as other topics? Or maybe it should just say "for more information on managing your cluster, see the Kubernetes docs."

https://www.cockroachlabs.com/docs/v21.1/orchestrate-cockroachdb-with-kubernetes.html#upgrade-the-cluster

Basically what I'm trying to say here is that this issue was opened because I had some feedback from the bug bash that I participated in, but I am not an actual customer, and my comments and feedback may not be representative of a real customer using this documentation, so feel free to take with grains of salt.

theodore-hyman avatar May 18 '21 16:05 theodore-hyman

@theodore-hyman This totally makes sense, thank you for calling it out. I forgot that you were following the OpenShift docs in this case. Will be making a few updates to (hopefully) clarify :)

taroface avatar May 18 '21 16:05 taroface