
2.1: User should be able to provision etcd cluster separate from masters in HA configuration.

Open maikelvl opened this issue 7 years ago • 9 comments

Hi there,

This is a great project! You rock!

I've got a suggestion about the etcd part and am willing to help out here: I see the master node is also running a single etcd node. However, since etcd always has to be running for the cluster to be up, this master node can never be taken down for maintenance. In production environments this is not recommended: "It is highly recommended that etcd is run as a dedicated cluster separately from Kubernetes components." - https://coreos.com/kubernetes/docs/latest/getting-started.html

As a quick starter we could make the etcd endpoints a setting when creating a cluster (see below), defaulting to the current situation (etcd on the master node).

{
  "cloud_account_name": "",
  "digitalocean_config": {
    "region": "nyc1",
    "ssh_key_fingerprint": ""
  },
  "master_node_size": "1gb",
  "name": "",
  "etcd_endpoints": "http://1.2.3.4:2379,http://5.6.7.8:2379,http://9.10.11.12:2379",
  "node_sizes": [
    "1gb",
    "2gb",
    "4gb",
    "8gb",
    "16gb",
    "32gb",
    "48gb",
    "64gb"
  ]
}

Ideally, the etcd clusters would be manageable from the interface.

Before I start submitting PRs, do you have any thoughts about this?

maikelvl avatar Jan 14 '17 14:01 maikelvl

@maikelvl So by default you are correct: a master will be a one-box sort of thing. A few months ago, however, I added a multi-master builder. To activate it, you specify an int value for "kube_master_count" in the kube config. https://github.com/supergiant/supergiant/blob/master/pkg/model/kube.go#L65

Multi-AZ is also specified there... essentially the current configuration treats the master(s) as an array. By default there is only 1 member of the array. The int var tells the provider how many masters need to be in the array. As members are added, they also become nodes in ETCD. Let me know if this makes sense or if you want me to throw you any more code links. This seemed the simplest way to go about it at the time, but I am always up for making things better. (We need docs on it as well :-( )
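
For illustration, a create request with the multi-master builder turned on might look roughly like this (just a sketch; apart from "kube_master_count" the fields are copied from your example above, and the exact shape of the kube config may differ):

{
  "cloud_account_name": "",
  "digitalocean_config": {
    "region": "nyc1",
    "ssh_key_fingerprint": ""
  },
  "master_node_size": "1gb",
  "kube_master_count": 3,
  "name": "",
  "node_sizes": [
    "1gb",
    "2gb"
  ]
}

With "kube_master_count": 3, the provider builds three masters, and each of them also joins the ETCD cluster as described above.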

gopherstein avatar Jan 18 '17 18:01 gopherstein

@maikelvl Can you give us a more detailed picture of managing the ETCD cluster from the UI? This is interesting and I am curious about your vision of the functionality.

gopherstein avatar Jan 20 '17 16:01 gopherstein

@gopherstein Thank you for explaining the multi-master setup. It makes sense that you went with the simplest way. As mentioned, for robustness and maintainability of the cluster it's recommended to keep etcd on separate machines. So I made a clickable HTML prototype of how this might be managed from the UI: https://github.com/maikelvl/supergiant/tree/feature/etcd-cluster-management/ui/assets/html
There is a new item in the navigation: 'Etcd clusters' (linked to some static HTML pages).

I'm curious about your points of view and I'm open to discussion, of course 😄

[screenshot: supergiant-etcd-cluster UI prototype]

(I'm not so sure about the icons; I thought they would make things clearer.)

maikelvl avatar Jan 22 '17 23:01 maikelvl

@gopherstein So what's your take on this? 😄

maikelvl avatar Jan 26 '17 18:01 maikelvl

So any challenges I throw up would be in the interest of making the "easiest" flow for users. My question here would be... Do you think users would want to manage etcd separate from "multi-master" kubernetes? What I mean here is, would it be better to allow a user to simply add or remove masters? (In reality they would also be adding/removing etcd nodes.) My original thought was, to make things easier, we would just allow the user to adjust the number of masters and then the etcd actions would happen under the covers. However, I may be totally wrong. Maybe we should have both? And how "easy" do we expect to make things for the user?

@FestivalBobcats I would love to know your thoughts on this also. The most common complaint I have run into is that kubernetes is just too confusing to set up. My hope is that Supergiant can help make kubernetes a click-button process. I would like to reduce a 1-3 month onboarding process for Kubernetes users down to a few hours.

gopherstein avatar Jan 26 '17 18:01 gopherstein

I REALLY like the idea and demo you have. Currently each etcd instance we provision is also a kubernetes master. This setup works well. Are you also thinking we should separate the etcd instances from the master?

gopherstein avatar Jan 26 '17 18:01 gopherstein

Thanks for your quick response! I will explain later (have to go now) ;)

maikelvl avatar Jan 26 '17 18:01 maikelvl

First and foremost, I share and admire your point about creating the easiest flow for users. 👍

To activate it, you specify an int value for "kube_master_count" in the kube config.

Launching a cluster with the kube_master_count option caused an error, which is reported in https://github.com/supergiant/supergiant/issues/193

Are you also thinking we should separate the etcd instances from the master?

Yes. This makes the cluster more robust:

From https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/03-etcd.md#why:

In production environments etcd should be run on a dedicated set of machines for the following reasons:

  • The etcd lifecycle is not tied to Kubernetes. We should be able to upgrade etcd independently of Kubernetes.
  • Scaling out etcd is different than scaling out the Kubernetes Control Plane.
  • Prevent other applications from taking up resources (CPU, Memory, I/O) required by etcd.

Apart from the separation question, to ensure an HA cluster you need a minimum of 3 etcd members. That means there are two main types of minimal setups:

Multi-master:

  • 3x large kubernetes master instances all with an etcd member

Separate dedicated etcd cluster:

  • 3x small etcd instances
  • 1x large kubernetes master

I suspect the second option could be more cost-effective.

Do you think users would want to manage etcd separate from "multi-master" kubernetes?

With good defaults this should not introduce much complexity. It doesn't have to be very prominent in the interface, but having the option to scale etcd keeps things lean for the user. Ideally, when the cluster grows, etcd grows with it, but we can handle that automation later.

What I mean here is, would it be better to allow a user to simply add or remove masters?

I think being able to scale the masters would be a good thing, whether there is a separate etcd cluster or not.

Maybe we should have both?

Yes, that could work. One of the following could then be the default:

Multi-master (etcd on the masters):

...
    "kube_master_count": 3,
    "etcd_node_count": 0
...

Separate dedicated etcd cluster:

...
    "kube_master_count": 1,
    "etcd_node_count": 3,
    "etcd_node_size": "1gb"
...

Because of the complexity, I think we should protect users from creating a cluster with one etcd member and taking the risk that their cluster goes down when that one master node takes a break. We could allow it, but show a warning about that fact.
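
For example, with the proposed fields above, this combination would leave a single etcd member on a single master and is the case I'd warn about (again just a sketch of the proposed settings, not an existing option):

...
    "kube_master_count": 1,
    "etcd_node_count": 0
...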

maikelvl avatar Jan 29 '17 16:01 maikelvl

The user should be able to provision etcd in a separate cluster from the masters.

gopherstein avatar Nov 20 '18 18:11 gopherstein