etcd-operator
etcd-operator copied to clipboard
Etcd Members don't have a persistent storage identity
Steps to reproduce:
- Create an EtcdCluster that uses PersistentVolumes
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
name: "example-etcd-cluster"
spec:
size: 3
pod:
persistentVolumeClaimSpec:
storageClassName: default
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
- Delete a member of the cluster:
kubectl delete <pod_name>
- Etcd operator creates a new Pod to replace the deleted one
Video Footage: https://www.youtube.com/watch?v=pYyqDs1piG8&feature=youtu.be
What I expect to happen: The new member should get the PersistentVolume of the deleted one.
What happens: The new member got a new, empty PersistentVolume.
Why this is a problem: Pod deletes happen in Kubernetes even without user error. For example, draining a node will cause Pods on the node to be deleted. New members of etcd should get the PV of an old member, if it exists. StatefulSet already handles this.
Same here, I'd like to be able to recover the cluster without restoring from a backup (sometimes not possible to restore valid data) even with manual intervention. I could achieve that with proper persistent storage.
Would it be ok to start a PR using Statefulsets or at least build reclaim logic from a PVC?
A PR that swaps the current custom controller with StatefulSets would probably require a major refactor of the codebase. However, StatefulSets have come a long way and I feel they are a good fit for running etcd.
This operator was developed during the very early Kubernetes days and had to implement a lot of things that are now included out-of-the-box with StatefulSets. Extending the current code to handle this scenario could be done, but it is something that StatefulSets already do successfully and is well-tested.
I believe it would be a better solution to start clean and leverage StatefulSets as well as a modern framework for writing operators, like kubebuilder
or operator-sdk
.
It would greatly simplify the current codebase and make it easier for new people to contribute.
I'd love to hear some opinions on this from contributors! :smile: @hexfusion @hongchaodeng @xiang90
I agree with @yanniszark. Also I see most of the organizations are anyways relying on StatefulSets features anyhow with custom bootstrapper and teardown scripts.
/cc @alaypatel07
I believe it would be a better solution to start clean and leverage StatefulSets as well as a modern framework for writing operators, like kubebuilder or operator-sdk. It would greatly simplify the current codebase and make it easier for new people to contribute.
I think that this is really the question, at which point do we move to SDK framework?
It's true that I'm mixing up 2 orthogonal things here:
- The issue is that deleting a Pod results in the loss of its PV, which should not happen.
- The proposed solution is to use StatefulSets, to solve the problem and simplify the code. This solution requires major refactoring.
- A new operator-framework would greatly simplify the code and offer new features (dynamic client, OpenAPI validation, admission webhooks). This requires major refactoring.
Since both require major refactoring and both simplify the code greatly, I'm kind of grouping them together, which may be kinda wrong.
@hexfusion I think now would be a good time move to SDK. Especially because, as @yanniszark pointed out, it would give us an opportunity to simplify the codebase.
@yanniszark @raoofm I have developed a proof of concept etcd operator using the operatorSDK's --type=Ansible
to check any issues around using StatefulSets for managing etcd. Please feel free to check it out, here, and share thoughts. Thanks
I'm not too sure about using Ansible. This would present an extra barrier for existing contributors since they would also have to learn how Ansible works. In addition, most Operators out there are written in Go, etcd is written in Go and because of that I suspect that the CoreOS team working on etcd is made of people working with Go primarily.
Because of the aforementioned reasons, I think an implementation in Go makes more sense for this project.
@yanniszark I am sorry if it sounded like I was suggesting to refactor this project in Ansible, it would be an insane thing to do. I prototyped the POC in Ansible, to figure out the corner cases of using the StatefulSet resource.
Refactoring this project with SDK and StatefulSet will be a lot of work. Just wanted to have a quick jab at the pitfalls of the approach for folks to try out before a similar implementation for this project can be planned.
@alaypatel07 Ahh makes much more sense now! That's great, please do share your findings! :smile: Did you encounter any major roadblocks?
-
For some reason, I cannot get
readinessProbes
to work. My understanding is that for a 3 pod cluster, the StatefulSet controller only rolls out pod-1 if pod-0 is ready and pod-0'sreadinessProbe
will not succeed until all the three pods are up and have formed a quorum, leading to a deadlock. I could be wrong, feel free to try it out. -
The startup script is a little unreadable IMO, but it can be easily taken out and implemented as an
initContainer
. The logic embedded(hopefully written in go) in theinitContainers
can be tested and hence, it is a predictable way of doing things. -
Initially, I used the
preStop
hook to remove the pod as an etcd member before deleting, but I noticed that onkubectl delete pods -l app=etcd
all the pods will be removed from the etcd cluster. Following that, even if the StatefulSet controller re-creates the pods, the cluster can be in an unpredictable state. Hence, I think removing the member from the operator is a better way than usingpreStop
hook. It will keep the cluster stable even on an accidental delete of pods as well as pods being rescheduled (because of node draining for instance).
Overall, I would say there isn't a real roadblock(up to now) that would hamper the use of StatefulSet. Having said that, the refactor is a huge task, we need to be careful and think this one through. I am still trying to figure out other major issues like deploying cluster using TLS certs, making Backup and Restore work, etc. Once I feel confident, I would go ahead and sum up the key learnings in a proposal.
related issue: https://github.com/coreos/etcd-operator/issues/1984
Make pod name stable like statefulset is the key point.