
HA for k8sm-controller-manager


@jdef Is there a recommendation for running redundant k8sm controller managers in an HA setup, similar to the scheduler: https://github.com/kubernetes/kubernetes/blob/master/contrib/mesos/docs/ha.md

ravilr avatar Aug 26 '15 08:08 ravilr

not yet - that type of work is currently on hold.

going forward we're going to be refactoring the scheduler HA to communicate w/ the apiserver instead of w/ etcd directly. see https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/high-availability.md

jdef avatar Aug 26 '15 14:08 jdef

leader election integration with the controller-manager component has landed upstream: https://github.com/kubernetes/kubernetes/pull/19621 Would love to see the k8sm-scheduler also updated to use the same leaderelection client recipe.
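For reference, a minimal sketch of what "the same leaderelection client recipe" looks like, written against today's client-go API rather than the 2016 one; the lock name, namespace, and durations are illustrative, not taken from the k8sm code:

```go
// Sketch only: leader election via client-go's leaderelection package.
// Only the elected replica runs the work in OnStartedLeading.
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname() // each replica needs a unique identity

	// Lease-based lock; name/namespace are placeholders.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "k8sm-scheduler", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// start the scheduling loop here; runs only on the leader
			},
			OnStoppedLeading: func() {
				// lease lost: stop work (exiting and restarting is simplest)
				os.Exit(0)
			},
		},
	})
}
```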

ravilr avatar Feb 03 '16 23:02 ravilr

see use case: https://github.com/mesosphere/kubernetes-mesos/issues/493#issuecomment-179524159

jdef avatar Feb 05 '16 19:02 jdef

Is there any update on this issue?

salmanbukhari avatar Jun 14 '16 02:06 salmanbukhari

not from the mesosphere team. perhaps someone in the community has started hacking on this?

jdef avatar Jun 14 '16 03:06 jdef

Oh okay. I was looking at the document regarding the Mesos HA cold-standby mode. According to the document, the scheduler has a strong dependency on Nginx. What happens if Nginx goes down? Can you answer, or refer me to the right person who can?

salmanbukhari avatar Jun 17 '16 01:06 salmanbukhari

my read on this is that the nginx instructions are enough to demonstrate a PoC for cold-standby mode. You could probably replace nginx with your choice of HA load balancer (though, as the docs say, there are some additional protocol requirements for some kubectl commands).
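To make the cold-standby idea concrete, a hypothetical nginx snippet fronting redundant backends (the addresses, ports, and upstream name are made up, not from the k8sm docs):

```nginx
# Sketch: nginx proxying to a primary backend with a cold standby.
# The "backup" server is only used when the primary is unreachable.
upstream apiservers {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080 backup;
}

server {
    listen 8888;
    location / {
        proxy_pass http://apiservers;
    }
}
```

Note that a plain HTTP proxy like this may not satisfy every kubectl code path (e.g., commands that upgrade the connection), which is the protocol caveat the docs mention.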

/cc @huang195

jdef avatar Jun 17 '16 02:06 jdef

Actually, I was concerned because of these statements in the doc: "It is critically important to point --advertised-address to Nginx so all the schedulers would be assigned the same executor ID... they would generate different executor IDs (in the case of different IPs)". So the scheduler needs the address of a load balancer that stays up and keeps the same IP even across failures.

salmanbukhari avatar Jun 17 '16 02:06 salmanbukhari

Right, so basically you don't want any parameters/environment variables sent from the k8sm scheduler to the k8sm executor process to change. Network addresses are part of that. If using a resolvable DNS name solves the problem for you (because it will always resolve to some LB to reach the API server) then that should work just fine. But now you've added a DNS dependency. If you're fine with that, great. Does this make sense?
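As a sketch of that approach: the flag name comes from the doc quoted above, but the binary invocation and the DNS name are assumptions for illustration:

```sh
# Every scheduler replica advertises the same stable, resolvable name
# (placeholder shown), so the parameters handed to executors never
# change across a failover. The name must always resolve to a live LB.
km scheduler --advertised-address=k8sm-scheduler.example.com
```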

jdef avatar Jun 17 '16 02:06 jdef

Yes, thank you for the explanation. But it will make the system more complex for me: in my case I am automating Kubernetes on Mesos with high availability on AWS, and Nginx will be running on the same nodes as the masters. I will try with an Elastic IP, but if that doesn't work then I will have to do it with DNS. Is this the same for Kubernetes HA without Mesos? There is no such point regarding --advertised-address mentioned on their website.

salmanbukhari avatar Jun 17 '16 02:06 salmanbukhari

I'd have to review the k8s HA docs. in stock k8s, kubelet and kube-proxy (which run on all the slave/agent/node/whatever hosts) need to be able to find the API server somehow, whether that's via IP or DNS name. k8s bootstrapping has been an ongoing issue for a while now. i'm not sure how far they've gotten for HA setups. you could review the salt scripts to see how they do it for dev setups on GCE, but that sounds like a much different case than the one you're trying to solve.

k8sm certainly has some different configuration requirements (and edge cases) due to (a) the nature of running on a mesos cluster, and (b) how we approached configuration for the components. there have been some exciting dc/os developments lately that could probably help with some of the sharp edges related to service discovery and running k8sm components on dc/os. if you're interested in that kind of thing there's a group of people congregating here: https://dcos-community.slack.com/messages/kubernetes/

jdef avatar Jun 17 '16 03:06 jdef

@salmanbukhari you will need multiple load balancers (e.g., nginx, haproxy) in front of the apiservers so that you don't have a single point of failure. You can then use a single floating IP, managed by something like Keepalived, which moves the IP to a standby when the currently active load balancer fails.
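A minimal keepalived sketch of that setup; the interface, virtual router ID, and addresses are placeholders:

```
# keepalived.conf on the primary load balancer host (values illustrative).
vrrp_script chk_nginx {
    script "pidof nginx"   # node is only eligible while nginx is running
    interval 2
}

vrrp_instance VI_1 {
    state MASTER           # standby host uses: state BACKUP, lower priority
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        10.0.0.100         # floating IP that --advertised-address resolves to
    }
    track_script {
        chk_nginx
    }
}
```

One caveat for the AWS case above: plain VRRP generally doesn't work across EC2 instances, so moving the address typically means calling the EC2 API (e.g., reassociating an Elastic IP) from a keepalived notify script instead of relying on VRRP alone.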

huang195 avatar Jun 17 '16 11:06 huang195