Helm chart - use external etcd

Open rimusz opened this issue 6 years ago • 39 comments

Is there any way to make the patroni chart use an external etcd, e.g. one created by etcd-operator, instead of the built-in etcd?

rimusz avatar Oct 04 '17 15:10 rimusz

I tried to pin it to the svc of the etcd-operator created cluster with Etcd.Host=etcd-cluster-client. That did not work: the patroni etcd was still created.

rimusz avatar Oct 04 '17 15:10 rimusz

I also tested the resilience of the patroni etcd, and it is not good. If an etcd pod gets restarted or moved to another node, it does not come up anymore:

kubectl logs patroni1-etcd-2
cat: can't open '/var/run/etcd/member_id': No such file or directory
Re-joining etcd member

rimusz avatar Oct 04 '17 15:10 rimusz

I think you need to set the value of Etcd.Host to the first Pod of the etcd cluster created by etcd-operator

https://github.com/coreos/etcd-operator#create-and-destroy-an-etcd-cluster

$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
example-etcd-cluster-0000       1/1       Running   0          1m
example-etcd-cluster-0001       1/1       Running   0          1m
example-etcd-cluster-0002       1/1       Running   0          1m

In this example it would be example-etcd-cluster-0000. Patroni will use it and discover all other nodes of etcd-cluster.

CyberDem0n avatar Oct 04 '17 16:10 CyberDem0n

but if that first etcd pod gets destroyed, then etcd-operator creates new pod with the new name. That is not really an HA setup; a svc is better to be used there

rimusz avatar Oct 04 '17 16:10 rimusz

> but if that first etcd pod gets destroyed, then etcd-operator creates new pod with the new name.

Will it? I thought it would preserve the original name and mimic, so to say, StatefulSet behaviour.

> svc is better to be used there

That could also work. You can create a Kubernetes Service with a label selector that matches all Pods of the etcd cluster and specify that Service in Etcd.Host.
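A minimal sketch of such a Service, built as a plain dictionary and printed as JSON. The Service name `etcd-client` and the label values are assumptions for illustration; the labels have to match whatever your etcd Pods actually carry.

```python
import json


def etcd_client_service(name, labels, port=2379):
    """Build a Service manifest whose selector matches the etcd Pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Every Pod carrying all of these labels becomes an endpoint.
            "selector": dict(labels),
            "ports": [{"name": "client", "port": port, "targetPort": port}],
        },
    }


manifest = etcd_client_service(
    "etcd-client",  # hypothetical Service name
    {"app": "etcd", "etcd_cluster": "etcd-cluster"},  # assumed Pod labels
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl, and the Service name would then go into Etcd.Host.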

At the end Patroni will anyway use such Service only once, to get a topology of etcd-cluster and later it will connect to every node individually.

CyberDem0n avatar Oct 04 '17 17:10 CyberDem0n

no, it does not mimic StatefulSet behaviour. I already tried to use the svc of the etcd-operator created cluster with Etcd.Host=etcd-cluster-client; that did not work, the patroni etcd was still created.

rimusz avatar Oct 04 '17 17:10 rimusz

that's not a good approach to use with etcd-operator:

> At the end Patroni will anyway use such Service only once, to get a topology of etcd-cluster and later it will connect to every node individually.

as etcd-operator always recreates a new pod with the new name

rimusz avatar Oct 04 '17 17:10 rimusz

> as etcd-operator always recreates a new pod with the new name

Patroni is much smarter than you think. If the "Pod" it connected to has failed, it will switch to another "Pod" and rediscover topology of etcd cluster. If nothing is failing, it will refresh topology every 5 minutes. If all Pods failed at the same time, Patroni will go back to the original ETCD_HOST specified in the configuration. If it points to the Service - everything will be fine. Basically you can rotate all etcd Pods and Patroni will survive.
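The fallback order described above can be sketched roughly as follows. This is not Patroni's actual code, only a toy model of the behaviour; `is_alive`, `discover` and the Pod names `etcd-cluster-0003` and `etcd-cluster-0004` are hypothetical, and the periodic 5-minute refresh is left out.

```python
def pick_etcd_endpoint(known_members, configured_host, is_alive, discover):
    """Return (endpoint, refreshed_member_list).

    known_members   -- members learned from the last topology refresh
    configured_host -- the original ETCD_HOST value (e.g. a Service name)
    is_alive        -- callable(host) -> bool, hypothetical health probe
    discover        -- callable(host) -> list of members, hypothetical
    """
    # First try the members we already know about.
    for member in known_members:
        if is_alive(member):
            return member, discover(member)
    # All known members failed: go back to the configured host.
    if is_alive(configured_host):
        members = discover(configured_host)
        for member in members:
            if is_alive(member):
                return member, members
    raise RuntimeError("no etcd endpoint reachable")


# Toy scenario: every Pod was replaced, only the Service name is stable.
alive = {"etcd-cluster-client", "etcd-cluster-0003", "etcd-cluster-0004"}
new_topology = ["etcd-cluster-0003", "etcd-cluster-0004"]

endpoint, members = pick_etcd_endpoint(
    known_members=["etcd-cluster-0000", "etcd-cluster-0001"],
    configured_host="etcd-cluster-client",
    is_alive=lambda host: host in alive,
    discover=lambda host: list(new_topology),
)
print(endpoint, members)
```

In this toy run both remembered Pods are gone, so the lookup falls back to the Service name and picks up the new member list from there.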

CyberDem0n avatar Oct 04 '17 18:10 CyberDem0n

ok, cool then, but why did it not connect to the svc of the etcd-operator created cluster?

rimusz avatar Oct 04 '17 18:10 rimusz

Does etcd-operator create a Service?

CyberDem0n avatar Oct 04 '17 18:10 CyberDem0n

yup, as you can see below it is etcd-cluster-client.

$ k get service
NAME                   CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
etcd-cluster           None         <none>        2379/TCP,2380/TCP   2h
etcd-cluster-client    10.3.0.149   <none>        2379/TCP            2h
$ k describe svc etcd-cluster-client
Name:			etcd-cluster-client
Namespace:		spcqm-system
Labels:			app=etcd
			etcd_cluster=etcd-cluster
Annotations:		service.alpha.kubernetes.io/tolerate-unready-endpoints=true
Selector:		app=etcd,etcd_cluster=etcd-cluster
Type:			ClusterIP
IP:			10.3.0.149
Port:			client	2379/TCP
Endpoints:		10.2.2.11:2379,10.2.3.16:2379,10.2.4.13:2379
Session Affinity:	None
Events:			<none>
$ k get pods -l app=etcd -o wide
NAME                READY     STATUS    RESTARTS   AGE       IP          NODE
etcd-cluster-0000   1/1       Running   0          2h        10.2.4.13   xxx
etcd-cluster-0001   1/1       Running   0          2h        10.2.3.16   xxx
etcd-cluster-0002   1/1       Running   0          2h        10.2.2.11   xxx

rimusz avatar Oct 04 '17 18:10 rimusz

And what does curl http://etcd-cluster-client:2379/v2/machines show?

You need to execute it from one of the pods (for example one of patroni pods)

CyberDem0n avatar Oct 04 '17 18:10 CyberDem0n

root@patroni3-patroni-0:/home/postgres# curl http://etcd-cluster-client:2379/v2/machines

http://etcd-cluster-0000.etcd-cluster.spcqm-system.svc:2379, http://etcd-cluster-0001.etcd-cluster.spcqm-system.svc:2379, http://etcd-cluster-0002.etcd-cluster.spcqm-system.svc:2379

looks good there
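For reference, a small sketch of how a response like the one above could be split into individual member URLs. The helper name `parse_machines` is made up for illustration; the input is the output shown above.

```python
def parse_machines(body):
    """Split a comma-separated /v2/machines response into member URLs."""
    return [url.strip() for url in body.split(",") if url.strip()]


response = (
    "http://etcd-cluster-0000.etcd-cluster.spcqm-system.svc:2379, "
    "http://etcd-cluster-0001.etcd-cluster.spcqm-system.svc:2379, "
    "http://etcd-cluster-0002.etcd-cluster.spcqm-system.svc:2379"
)
members = parse_machines(response)
for url in members:
    print(url)
```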

rimusz avatar Oct 04 '17 18:10 rimusz

Looks good. Is http://etcd-cluster-0000.etcd-cluster.spcqm-system.svc:2379 accessible from the patroni pod? And what does echo $ETCD_HOST show?

CyberDem0n avatar Oct 04 '17 18:10 CyberDem0n

root@patroni3-patroni-0:/home/postgres# env | grep ETCD_HOST
ETCD_HOST=etcd-cluster-client

rimusz avatar Oct 04 '17 19:10 rimusz

root@patroni3-patroni-0:/home/postgres# curl http://etcd-cluster-0000.etcd-cluster.spcqm-system.svc:2379
404 page not found

rimusz avatar Oct 04 '17 19:10 rimusz

etcd-operator is installed to the same namespace as patroni

rimusz avatar Oct 04 '17 19:10 rimusz

DNS check of the POD is fine:

 kubectl exec busybox -- nslookup etcd-cluster-0000.etcd-cluster.spcqm-system.svc
Server:    10.3.0.10
Address 1: 10.3.0.10 kube-dns.kube-system.svc.cluster.local

Name:      etcd-cluster-0000.etcd-cluster.spcqm-system.svc
Address 1: 10.2.4.13 etcd-cluster-0000.etcd-cluster.spcqm-system.svc.cluster.local

rimusz avatar Oct 04 '17 19:10 rimusz

Everything looks good. Patroni is configured to use the etcd cluster deployed by etcd-operator.

Now I am completely lost and don't understand what your problem is.

CyberDem0n avatar Oct 04 '17 19:10 CyberDem0n

it is more patroni related issue, etcd-operator is functioning fine, I do not have RBAC enabled there

rimusz avatar Oct 04 '17 19:10 rimusz

> it is more patroni related issue

It is not really a Patroni issue, but an issue of the patroni helm chart. I am not really familiar with helm chart internals, but it seems the Patroni chart has etcd as a dependency: https://github.com/kubernetes/charts/blob/master/incubator/patroni/requirements.yaml

CyberDem0n avatar Oct 04 '17 19:10 CyberDem0n

chart’s readme says that etcd_host is not used

rimusz avatar Oct 04 '17 19:10 rimusz

I will play with removing that dependency tomorrow, but if that env var is not used by patroni, patroni should fail

rimusz avatar Oct 04 '17 19:10 rimusz

> chart’s readme says that etcd_host is not used

Looking at the chart internals (https://github.com/kubernetes/charts/blob/master/incubator/patroni/templates/statefulset-patroni.yaml#L49) I can tell that it is definitely used and propagated to the StatefulSet and underlying Pods. The readme is just wrong, sorry about that; I am not the maintainer of the Patroni helm chart. You can create a pull request updating the helm chart documentation.

P.S. I am working on Patroni kubernetes native deployment: https://github.com/zalando/patroni/pull/500 It makes it possible to deploy Patroni on kubernetes without etcd. If you have time please try it.

CyberDem0n avatar Oct 04 '17 19:10 CyberDem0n

sure will play with the chart tomorrow and also will check that stuff too

rimusz avatar Oct 04 '17 19:10 rimusz

@CyberDem0n is the Patroni kubernetes native deployment and https://github.com/zalando-incubator/postgres-operator the same thing?

rimusz avatar Oct 05 '17 08:10 rimusz

No, postgres-operator is a tool similar to the etcd-operator.

CyberDem0n avatar Oct 05 '17 08:10 CyberDem0n

interesting, you guys have two new projects to run postgres in kube

rimusz avatar Oct 05 '17 08:10 rimusz

now I'm not sure which one to stick to

rimusz avatar Oct 05 '17 08:10 rimusz

Actually not two, but three.

Patroni - does all the heavy lifting, like automatic failover and so on. Can work on bare metal and inside docker.
Spilo - this is a docker package of Patroni+PostgreSQL+wal-e+some other useful stuff.
postgres-operator - deploys Spilo on kubernetes using third party resources.

CyberDem0n avatar Oct 05 '17 09:10 CyberDem0n