unable to start new cluster with existing persistent volumes
After deleting a cluster with kubectl delete -f example.yaml, without deleting the PVCs and PVs, I start a new cluster with kubectl apply -f example.yaml (the full sequence is sketched below). The pod cockroachdb-0 remains unready, and no other pods are created:
```
cockroachdb-0   0/1   Running   0   2m17s
```
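For completeness, the reproduction sequence is roughly the following (a sketch; example.yaml stands in for the operator's example manifest):

```sh
# Delete the cluster resources but leave the volumes in place;
# "kubectl delete -f" removes only what the manifest defines.
kubectl delete -f example.yaml
kubectl get pvc,pv        # the old claims and volumes are still listed

# Recreate the cluster; the new StatefulSet picks the existing PVCs back up.
kubectl apply -f example.yaml
kubectl get pods          # cockroachdb-0 starts but never becomes ready
```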
The pod's events show:
```
Events:
  Type     Reason                  Age                From                                                 Message
  ----     ------                  ----               ----                                                 -------
  Normal   Scheduled               2m23s              default-scheduler                                    Successfully assigned default/cockroachdb-0 to gke-cockroachdb-default-pool-01b609ce-vkqv
  Normal   SuccessfulAttachVolume  2m17s              attachdetach-controller                              AttachVolume.Attach succeeded for volume "pvc-2c986ce6-96b8-42bf-80c9-10d4e39ace2c"
  Normal   Pulled                  2m14s              kubelet, gke-cockroachdb-default-pool-01b609ce-vkqv  Container image "cockroachdb/cockroach:v20.2.8" already present on machine
  Normal   Created                 2m14s              kubelet, gke-cockroachdb-default-pool-01b609ce-vkqv  Created container db
  Normal   Started                 2m14s              kubelet, gke-cockroachdb-default-pool-01b609ce-vkqv  Started container db
  Warning  Unhealthy               15s (x22 over 2m)  kubelet, gke-cockroachdb-default-pool-01b609ce-vkqv  Readiness probe failed: HTTP probe failed with statuscode: 500
```
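The 500 can be inspected directly. A minimal sketch, assuming the readiness probe targets CockroachDB's /health?ready=1 endpoint over HTTPS on the default HTTP port 8080:

```sh
# Forward the HTTP port and query the readiness endpoint the probe uses;
# -k skips verification of the node's self-signed certificate.
kubectl port-forward cockroachdb-0 8080:8080 &
curl -k "https://localhost:8080/health?ready=1"

# The db container's logs usually show why the node cannot rejoin the cluster.
kubectl logs cockroachdb-0 -c db
```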
I'm only able to create the new cluster if I first delete the volumes with kubectl delete pv,pvc --all.
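A narrower workaround is to delete only the CockroachDB data claims rather than every PV/PVC in the namespace. A sketch, assuming the default datadir- claim naming and a three-node cluster (confirm the names with kubectl get pvc first):

```sh
# List the claims to confirm their names before deleting.
kubectl get pvc

# Delete only the CockroachDB data claims; the bound PVs are released
# (and removed, if their reclaim policy is Delete).
kubectl delete pvc datadir-cockroachdb-0 datadir-cockroachdb-1 datadir-cockroachdb-2
```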
This is using the Operator manifest currently at https://github.com/cockroachdb/cockroach-operator/blob/master/manifests/operator.yaml, on GKE with CRDB v20.2.8.
I thought this behavior was expected. I think this person on the Community CockroachDB Slack may be experiencing this issue as well (Slack conversation).
@keith-mcclellan, can you verify?
This is a known issue and is fixed in https://github.com/cockroachdb/cockroach-operator/pull/477. It's set as a release blocker, so it should be resolved once that PR is merged.
The PR has changed since then, and I am not certain it will still fix this.