Should k0s enable hairpin-mode by default for kube-router?
Is your feature request related to a problem? Please describe.
When using kube-router, a Pod that is backing a Service cannot reach that Service's ClusterIP. This kind of traffic is called hairpinning, and kube-router supports it via a config flag which defaults to off. The docs say:
> Communication from a Pod that is behind a Service to its own ClusterIP:Port is not supported by default.
I suspect this is quite surprising for users, as they will probably assume that hairpinning just works.
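For illustration, a minimal repro sketch (resource names are made up; this assumes a cluster where kube-router runs with hairpinning off):

```yaml
# A single-replica Deployment behind a Service: the Pod is the only endpoint,
# so traffic to the ClusterIP is always routed back to the Pod itself.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hairpin-test
spec:
  replicas: 1
  selector:
    matchLabels: {app: hairpin-test}
  template:
    metadata:
      labels: {app: hairpin-test}
    spec:
      containers:
        - name: web
          image: nginx
          ports: [{containerPort: 80}]
---
apiVersion: v1
kind: Service
metadata:
  name: hairpin-test
spec:
  selector: {app: hairpin-test}
  ports: [{port: 80}]
# From inside the Pod, connecting to its own Service hangs without hairpinning,
# e.g. (assuming a curl binary is available in the image):
#   kubectl exec deploy/hairpin-test -- curl -sS --max-time 5 http://hairpin-test
```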
Describe the solution you would like
There's #1937, which is about to add support for this config option to k0s. Would it be better UX if k0s enabled hairpinning by default?
It's not entirely clear to me what this entails. There's probably a reason why kube-router has chosen to disable it by default. The doc string on the `--hairpin-mode` CLI flag says:

> Add iptables rules for every Service Endpoint to support hairpin traffic.
So this sounds costly: managing iptables rules slows down significantly once the number of rules reaches a certain size. This might start to matter on larger setups with a lot of Services and NetworkPolicies.
Describe alternatives you've considered
Don't change the default. Folks can simply enable it if they need it, either on a service level via annotations (e.g. `kubectl annotate service my-service "kube-router.io/service.hairpin="`), or cluster-wide via the new knobs introduced in #1937.
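For reference, the per-service opt-in from the annotate command above looks like this in manifest form (the Service itself is illustrative; the annotation key is kube-router's):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # The value is left empty, matching the kubectl annotate example above.
    kube-router.io/service.hairpin: ""
spec:
  selector: {app: my-service}
  ports: [{port: 80}]
```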
In my experiments, as soon as I add a second Pod to the Service, I am able to connect to the ClusterIP, although connection times vary and feel sluggish (think seconds instead of milliseconds), so the actual impact of hairpinning being disabled is apparently even weirder.
In any case, this needs some investigation, and IMO writing some docs about this is in order, so that k0s users are aware.
Actually, there are two knobs involved (both sketched below):

- the CNI config (`hairpinMode: true`)
- a kube-router parameter (`--hairpin-mode=true`)
Enabling hairpinning via annotation (`kube-router.io/service.hairpin=`) per Service will only work if the CNI config allows hairpinning. I tried to keep it simple and didn't differentiate; maybe we should allow three states:
- disable hairpinning globally
- allow hairpinning, but don't enable it globally - let the user do this via annotation per service
- enable hairpinning globally
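To make the two knobs concrete, here is a sketch of where each one lives (the excerpts are illustrative, not the exact manifests k0s ships):

```yaml
# Knob 1: the CNI bridge plugin's conflist (a JSON file on disk) needs
#   "hairpinMode": true
# on the bridge plugin entry so the veth ports may hairpin at all.
#
# Knob 2: kube-router must be started with --hairpin-mode=true so it
# writes the per-endpoint iptables rules. Illustrative DaemonSet excerpt:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-router
  namespace: kube-system
spec:
  selector:
    matchLabels: {k8s-app: kube-router}
  template:
    metadata:
      labels: {k8s-app: kube-router}
    spec:
      containers:
        - name: kube-router
          image: cloudnativelabs/kube-router
          args:
            - --run-router=true
            - --run-firewall=true
            - --run-service-proxy=true
            - --hairpin-mode=true  # knob 2: global hairpin
```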
The issue is marked as stale since no activity has been recorded in 30 days
Just encountered this and was surprised. In my humble opinion, please enable hairpin by default in kube-router, with a disclaimer. For small setups you quite often have only one replica of something (the linstor-cs controller in my case) which tries to talk to itself via its Service ClusterIP, and that fails without hairpin mode. A solution is to spin up more replicas, but for a small homelab setup that's overkill, and the resource consumption by iptables is not such a big deal there. So maybe hairpin should be on by default for default setups, and if people require tuning, they can disable it or switch to another CNI plugin?
The issue is marked as stale since no activity has been recorded in 30 days
I would enable it by default myself. The question is, given k0s' choice of not being very opinionated, whether it makes sense to enable it by default when kube-router disables it.
Impact of enabling it by default if someone needs to disable it later
The impact on small clusters is negligible, and having it disabled by default isn't what users expect at all; in fact, I was surprised to find this out as well. The impact on large clusters is unknown, but the consequences are a big deal. If someone needs to disable it later, that's more serious than having it disabled by default and needing to enable it. Not having this working by default is annoying, but if your applications rely on it and the administrator disables it, all of a sudden random applications are broken, and the impact is just massive.
If a big cluster has it enabled, it will be very hard for administrators to know which applications are using hairpinning. No one will have a list of the Services that need it, and they will have to trace their applications to find out; this process is neither fast nor easy. The other option is to remove it and wait for random stuff to break, but that's not really suitable for the real production of a real business.
So at this point I think it's reasonable to say:

1. Removing it from someone who has it configured is a bigger deal than withholding it from someone who doesn't.
2. Only large clusters with a lot of endpoint changes will need it removed.
Performance impact
I doubt that this has a big performance impact. I couldn't find many references in their issues to the iptables lock being held, and I see that kube-router uses iptables in a pretty performant and smart way for NetworkPolicy, while hairpinning is done in a very naive way. This makes me think that they never had issues with this section of the code. I also see that the hairpinning code is very old, which makes me think performance is not very important here because it doesn't have a big impact to begin with.
What others are doing
- kube-router: disabled by default
- Kubespray: disabled by default

I couldn't find more distributions that ship kube-router.
Also, I asked in the kube-router Slack channel; if I don't get a reply today, I'll open an issue asking for the developers' opinions.
I was told in the kube-router Slack by one of the main contributors:

> We run large-ish clusters and we use global hairpin FWIW. It does increase the number of rules that get written into the NAT table, but this shouldn't be overburdensome.
>
> Also note that these global rules will only apply to ClusterIP and not ExternalIP or LoadBalancerIP.
I think it's fairly safe to add this by default. Now it's a philosophical question whether we want to do it or not, but I say enabling it is a safer default than not enabling it.
@twz123 @jnummelin any comments on this?
I'm fine with enabling this. Concerning the migration path from enabling to disabling it: I think it's not that big of a deal since, IIUC, the annotations can also be used to opt out of hairpinning, so cluster admins can scan for Service objects without that annotation and ask the owners of those to evaluate whether they need hairpinning or not, and add the annotation. So they can gradually enforce a deliberate decision on a per-service basis.
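A sketch of such a scan (assuming jq is available; it lists Services that don't carry the hairpin annotation yet):

```sh
kubectl get services --all-namespaces -o json \
  | jq -r '.items[]
      | select(.metadata.annotations["kube-router.io/service.hairpin"] == null)
      | "\(.metadata.namespace)/\(.metadata.name)"'
```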
In order for k0s to support this, k0s probably needs to properly distinguish the two knobs mentioned in https://github.com/k0sproject/k0s/issues/1953#issuecomment-1199190251.
I'm 99% certain that this is the case. If I understand correctly, the CNI option sets the bridge into hairpin mode. So we need to set this option to allow hairpin in any of its forms (either allowing it explicitly with the annotation, or enabling it by default). I guess a good option name would be AllowHairpin.
The flag creates the iptables rules that enable hairpin for every Service by default. I would call it GlobalHairpin. Of course, GlobalHairpin would require AllowHairpin.
In my opinion it would be clearer to make it one setting with three values (see the sketch after the list):
hairpinning:
- enabled, which globally enables and activates hairpinning (kube-router and CNI config)
- allowed, which just enables hairpinning in the CNI config and leaves it to the user to annotate Services which require hairpinning
- disabled
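A sketch of how that could surface in the k0s config (the field name and placement are illustrative, not a final API):

```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    provider: kuberouter
    kuberouter:
      # Hypothetical tri-state knob, as proposed above:
      #   Enabled  - hairpinMode in the CNI config plus --hairpin-mode on kube-router
      #   Allowed  - hairpinMode in the CNI config only; per-Service opt-in via annotation
      #   Disabled - neither
      hairpin: Allowed
```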
Yes, you are right, it makes more sense that way.
Having a tri-state setting sounds good. We still need to support the old settings as a fallback for backwards compatibility, though.