Support LoadBalancer Services
Kubernaut currently does not support `LoadBalancer` services, but many people use `type: LoadBalancer` in their service manifests. The current workaround is to use `type: NodePort`, but this means changing manifests specifically for Kubernaut, which is undesirable.
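For reference, this is the kind of manifest people are writing (the app name is a hypothetical placeholder); the workaround amounts to changing the `type` field:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app          # hypothetical service name
spec:
  type: LoadBalancer    # the workaround is to change this to NodePort
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```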
Preliminary Research
I started examining this problem shortly after releasing Kubernaut. Unfortunately, I have discovered there is no simple, ideal solution.
The Primary Problem
Kubernaut does not support `type: LoadBalancer` services because they present a serious Quality-of-Service issue for multiple users. Here are the facts:
- Each `LoadBalancer` service on AWS creates an Amazon ELB.
- We cannot restrict the number of `LoadBalancer` services that can be created, and therefore we cannot limit the number of Amazon ELB instances that are created.
- The maximum number of Amazon ELB instances you can create is determined by your AWS account quota.
- The AWS account quota can be increased, but it is a manual (human) operation that has no API and needs to be justified to AWS.
- When the quota is exhausted, a user who interacts with Kubernetes via `kubectl` sees their `LoadBalancer` service stuck in a `pending` state without explanation (see the example after this list). To these users the system appears broken.
- Because humans do dumb or malicious things, it is likely that a single user or a small group of users will exhaust the LoadBalancer pool for the other users, causing massive QoS degradation.
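For illustration, this is roughly what that user experience looks like (the service name is hypothetical; exact columns vary by kubectl version):

```console
$ kubectl get svc my-app
NAME     TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
my-app   LoadBalancer   10.0.171.239   <pending>     80:31852/TCP   20m
```

The `EXTERNAL-IP` column stays at `<pending>` indefinitely, with no indication that an account quota is the cause.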
Problem 2
This is a small problem and is internal to our operation of the service.

Amazon ELB instances are NOT free. They are charged at $0.025/hr just to run, which at ~732 hours/month works out to ~$18.30/mo. Data transfer is charged at $0.008/GB in and out. At a minimum, assuming one user consistently uses the service for a whole month, we spend ~$18/user just for the ELB. Data is cheap, and a user would need to pump a lot of data through the system to cost us much money, but let's once again assume humans are dumb or malicious and someone decides to upload a 100MB archive in a loop through their service. We need monitoring in place that allows us to cut that user off. That is engineering effort for us, purely to prevent someone from causing financial harm.
Problem 3
I examined whether it was possible to disable the ELB provisioning functionality in the AWS cloud provider integration for Kubernetes, and the answer is that it is not possible. My idea was to disable it and replace it with a different service controller that would talk to, say, an HAProxy or Nginx cluster we run that could act as a multi-tenant TCP load balancer.
A brief technical explanation:
- When `kube-controller-manager` and `kubelet` come up, you specify a cloud provider via a CLI switch (see the example after this list). In our case we specify `aws`, which enables the AWS integration for Kubernetes and makes it possible to run on that cloud and use that cloud's API to bootstrap the cluster machinery (e.g. it inspects the EC2 metadata service). It also enables features like using EBS volumes for storage.
- There is no exposed way to override the `services-controller` that I could find. The `services-controller` is responsible for actually orchestrating the creation of the load balancer (create the load balancer, create firewall rules, add/remove nodes in the backend pool).
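For reference, the switch in question is `--cloud-provider`, passed to both daemons (the remaining flags are elided here):

```console
kube-controller-manager --cloud-provider=aws ...
kubelet --cloud-provider=aws ...
```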
If we want this kind of customizability it seems we need to get involved in the Kubernetes development process:

- We need to consult the Kubernetes team on the best path forward.
  - Should the `services-controller` used by a cloud provider be overridable?
  - If yes, is that done generically, outside of the AWS integration (likely), or only for the AWS integration (maybe)?
- We need to modify the Kubernetes code and get it shipped in a release, OR alternatively find someone to do it (e.g. ask for the enhancement in an issue).
- Deployment tools, specifically `kubeadm` in our case, need to be updated to expose the new configuration mechanism for specifying the new `services-controller`.
Doing additional implementation research: to avoid hard AWS limits we will need to configure the following (a sketch of where these are set follows the list):

- `DisableSecurityGroupIngress`: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L441
- `ElbSecurityGroup`: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L446
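Both options are read from the cloud provider config file (an INI-style file passed via `--cloud-config`); a minimal sketch, assuming a pre-created shared security group (the group ID is a placeholder):

```ini
[Global]
; Stop the integration from managing ingress rules on node security groups.
DisableSecurityGroupIngress = true
; Use one pre-created security group for ELBs instead of one group per ELB.
ElbSecurityGroup = sg-0123456789abcdef0
```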
We will also need to adjust our usage of `kubeadm` during bootstrap: https://kubernetes.io/docs/admin/kubeadm/ (see specifically "Cloudprovider integrations (experimental)").
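At the time of writing, that experimental integration is driven by kubeadm's config file; a rough sketch of the relevant part (field names per the then-current `v1alpha1` API, which varies by version):

```yaml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
cloudProvider: aws   # enables the AWS integration on the control plane
```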
Suggested approach I learned of at Velocity NYC from Kelsey Hightower
- Use RBAC to limit users to one `LoadBalancer` service if that's desired. This would make users not quite admins in the Kubernetes cluster, but that's probably OK as it is a very small corner. (A sketch of the cap follows this list.)
- Write a control loop that monitors the Kubernetes API and does the endpoint -> service mapping magic. Watch for `type: LoadBalancer` services. (A sketch of the loop also follows.)
- Use the PATCH mechanism to set the status of the `LoadBalancer` to the public IP that was created via an external process (e.g. a multi-tenant nginx TCP load balancer).
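One concrete way to express the per-user cap (a ResourceQuota rather than RBAC proper, since quotas govern object counts while RBAC governs verbs) is a sketch like this, assuming each user gets their own namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: lb-limit
  namespace: user-1               # hypothetical per-user namespace
spec:
  hard:
    services.loadbalancers: "1"   # at most one LoadBalancer service
```

And a minimal sketch of the control loop itself, using client-go. `provisionExternalLB` is a hypothetical stand-in for whatever configures the multi-tenant nginx/HAProxy load balancer, and a real controller would use a watch/informer rather than polling:

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// provisionExternalLB is hypothetical: it would configure the external
// TCP load balancer for this service and return the public IP it got.
func provisionExternalLB(svc *corev1.Service) (string, error) {
	return "203.0.113.10", nil // placeholder IP
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for {
		svcs, err := client.CoreV1().Services("").List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			panic(err)
		}
		for i := range svcs.Items {
			svc := &svcs.Items[i]
			// Only touch LoadBalancer services that are still "pending".
			if svc.Spec.Type != corev1.ServiceTypeLoadBalancer ||
				len(svc.Status.LoadBalancer.Ingress) > 0 {
				continue
			}
			ip, err := provisionExternalLB(svc)
			if err != nil {
				continue // retry on the next pass
			}
			// PATCH the status subresource with the public IP; this is what
			// moves the service out of "pending" from the user's perspective.
			patch := fmt.Sprintf(`{"status":{"loadBalancer":{"ingress":[{"ip":%q}]}}}`, ip)
			if _, err := client.CoreV1().Services(svc.Namespace).Patch(
				context.TODO(), svc.Name, types.MergePatchType,
				[]byte(patch), metav1.PatchOptions{}, "status"); err != nil {
				fmt.Println("status patch failed:", err)
			}
		}
		time.Sleep(10 * time.Second)
	}
}
```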
I discovered this while poking around for AWS and Kubernetes stability issues... it would be an additional implementation problem: https://github.com/kubernetes/kubernetes/issues/29298