
Controller not able to run after node failure

Open · pavanfhw opened this issue 3 years ago · 3 comments

Testing node failure on a 2-node cluster, the piraeus controller was not able to come back up after being rescheduled to the remaining healthy node. The error was about the etcd database:

20:05:29.276 [Main] INFO  LINSTOR/Controller - SYSTEM - Initializing the etcd database
20:06:29.725 [Main] ERROR LINSTOR/Controller - SYSTEM - Database initialization error [Report number 60C27085-00000-000000]

Doing the same test on a 3-node cluster, the recovery was OK. Does etcd need to run with a minimum of 3 replicas? I am thinking 1 replica is not enough because if I run etcd with 1 replica and host path volumes, it will never come up again if the node with the etcd pod fails. Is this correct?

In the 3-node scenario, is it guaranteed that etcd will always come back OK regardless of which node fails? I find running etcd on volumes from another storage provider a complicated thing to deal with. Can anyone provide an example of production usage of the linstor-operator with regard to the etcd StatefulSet?

pavanfhw avatar Jun 10 '21 17:06 pavanfhw

Hi! Your assumptions are correct: etcd will only allow (write) access to the database if a majority of nodes are available. In the 2-node scenario exactly half of the nodes are available, which is not enough for etcd. This is done to prevent a situation where a network failure would mean that both etcd instances could start writing because each thinks it is the majority.

In this scenario, a 3-node cluster will continue to work as long as no more than one node fails, regardless of which node it is.
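
To make the arithmetic concrete, here is a rough sketch (plain shell, numbers only for illustration) of the quorum rule etcd follows: a write needs a strict majority, i.e. floor(n/2) + 1 members.

# Quorum rule for an n-member etcd cluster: majority = n/2 + 1 (integer division).
# The cluster stays writable only while at most (n - majority) members are down.
for n in 1 2 3 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members: $n  quorum: $quorum  failures tolerated: $(( n - quorum ))"
done

Note that both 1 and 2 members tolerate zero failures (which matches what you saw), while 3 members tolerate one and 5 tolerate two.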

We are also dissatisfied with this state of affairs. I can't promise anything, but we are investigating whether using the Kubernetes API as a datastore would be feasible. In that case, you would no longer need etcd or extra storage volumes.

WanzenBug avatar Jun 21 '21 08:06 WanzenBug

Hello and thank you for the clarification.

Using the k8s API would be a very good solution. Hope it works!

pavanfhw avatar Jun 21 '21 12:06 pavanfhw

FYI, Calico seems to have taken a similar approach - https://docs.projectcalico.org/getting-started/kubernetes/hardway/the-calico-datastore

sribee avatar Jun 29 '21 06:06 sribee

Using the k8s backend has been possible since v1.7.0, released back in December 2021.
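
For anyone landing here later, a minimal sketch of a fresh install using the Kubernetes API backend. The release name, chart path and value keys (operator.controller.dbConnectionURL, etcd.enabled) here are from memory and may differ by chart version, so verify them against the chart's values.yaml and the upgrade guide.

# Sketch only: release name, chart path and value keys are assumptions -- check
# your piraeus-operator (>= v1.7.0) chart's values.yaml and upgrade guide.
helm install piraeus-op ./charts/piraeus \
  --set operator.controller.dbConnectionURL=k8s \
  --set etcd.enabled=false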

WanzenBug avatar Feb 28 '24 07:02 WanzenBug