coreos-kubernetes

Pods are not accessible via ClusterIP from the nodes in vagrant multi-node

Open fxposter opened this issue 9 years ago • 8 comments

I have a problem with (I guess) either the iptables rules set up by kube-proxy or the routing to flannel via service ClusterIPs. For example: I have two nodes in the cluster, A and B. Node A runs pod a; node B runs pod b. a and b are identical pods, deployed with replicas=2 into the cluster. There is a service S of type ClusterIP, which is supposed to load balance between a and b.

a and b listen on port 80. S proxies its port 80 to port 80 of a and b.
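For reference, the setup above can be sketched with kubectl (the deployment name `web`, the image, and the service name `s` are placeholders, not taken from the actual cluster):

```shell
# Assumed deployment with replicas=2, scheduled across nodes A and B
kubectl run web --image=nginx --replicas=2 --port=80

# Expose it behind a ClusterIP service "S" on port 80
kubectl expose deployment web --name=s --port=80 --target-port=80
```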

I test "connectivity" by doing curl ClusterIP and curl PodIP.
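The connectivity checks can be sketched as follows (the IPs are placeholders; substitute the real addresses from `kubectl get pods -o wide` and `kubectl get svc`):

```shell
# Run from a node (A/B) or from inside a pod (a/b):
curl -s --max-time 5 http://10.3.0.100/   # ClusterIP of service S (placeholder)
curl -s --max-time 5 http://10.2.1.5/     # IP of pod a (placeholder)
curl -s --max-time 5 http://10.2.2.5/     # IP of pod b (placeholder)
```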

These cases work fine:

  • a -> S -> b
  • b -> S -> a
  • A -> a
  • A -> b
  • B -> a
  • B -> b

But there is a problem: these do not work at all:

  • A -> S -> b
  • B -> S -> a

What's even more interesting is that these work fine:

  • A -> S -> a
  • B -> S -> b

I.e., the case where S resolves to the IP of a pod on a different node does not work. Is this expected? Can it be fixed somehow?

I'm getting this on both kube 1.2.4 and 1.3.0.

fxposter avatar Jul 09 '16 17:07 fxposter

managed to get a tcpdump: https://gist.github.com/fxposter/f336c79bad578e8c6c471f45c70f45b5

the difference I see is the default MSS...

btw, I'm using the latest alpha release of CoreOS: 1097.0.0

fxposter avatar Jul 10 '16 08:07 fxposter

So pod -> service -> remote pod works (a -> S -> b), but host -> service -> remote pod does not (A -> S -> b).

That does seem odd considering you should be hitting the same iptables rules in both cases.

Before digging deeper - just to rule out some previously seen issues, can you try changing kube-proxy to use --proxy-mode=userspace and see if that helps your issue. Possibly related: https://github.com/kubernetes/kubernetes/issues/20391
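For reference, the flag goes on the kube-proxy command line; a minimal sketch, assuming kube-proxy is invoked directly (the master URL is a placeholder):

```shell
# Default proxier on these versions is iptables:
#   kube-proxy --master=https://<master-ip> --proxy-mode=iptables

# Switch to the userspace proxier to rule out iptables-mode issues:
kube-proxy --master=https://172.17.4.101:443 --proxy-mode=userspace
```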

aaronlevy avatar Jul 11 '16 22:07 aaronlevy

@aaronlevy that can be fixed by adding this additional iptables rule on every node (though I'm not sure why it is not there by default):

iptables -t nat -I POSTROUTING -o flannel.1 -s $NODE_DEFAULT_IP -j MASQUERADE
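An idempotent variant of the same rule, using `iptables -C` to avoid inserting duplicates on repeated runs (`NODE_DEFAULT_IP` is assumed to hold the node's primary address):

```shell
# flannel.1 is the flannel VXLAN interface; masquerade traffic sourced from the
# node's own IP so return packets route back through the flannel overlay.
iptables -t nat -C POSTROUTING -o flannel.1 -s "$NODE_DEFAULT_IP" -j MASQUERADE 2>/dev/null \
  || iptables -t nat -I POSTROUTING -o flannel.1 -s "$NODE_DEFAULT_IP" -j MASQUERADE
```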

fxposter avatar Jul 16 '16 07:07 fxposter

I am also currently experiencing something like this (not quite, but still close enough).

  • 2-node cluster running the latest build
  • I have a pod that runs MySQL
  • A service (type ClusterIP) that exposes the MySQL port (3306)
    • Trying to connect to the DB from the pod locally: works (duh)
    • Trying to access the database from the MySQL pod through the service IP: timeout
    • Trying to access the database from the pod through the cluster IP: works
    • Running a port-forward to port 3306 and accessing from the local machine: works
  • I start a new pod (bash, whatever)
    • Trying to access the database from this pod through the MySQL pod IP: works
    • Trying to access the DB from this pod through the service IP: nope

Funny thing I noticed: for services of type LoadBalancer, access from the pods works.

Unfortunately, the iptables rule from the post above did not do anything.

It is a bit weird that nobody else has come across this; seems like a pretty serious bug to me 😄

wirtsi avatar Aug 25 '16 15:08 wirtsi

Can you test out creating the service as type "NodePort"?
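For reference, an existing ClusterIP service can be switched over with a patch like this (the service name `mysql` is an assumption):

```shell
kubectl patch svc mysql -p '{"spec": {"type": "NodePort"}}'

# Show the node port that was assigned:
kubectl get svc mysql -o jsonpath='{.spec.ports[0].nodePort}'
```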

robszumski avatar Aug 25 '16 16:08 robszumski

Hi ... just tried with NodePort, unfortunately same thing. Only with type: LoadBalancer can I access the service internally

wirtsi avatar Aug 26 '16 07:08 wirtsi

That's odd, because a LoadBalancer service still assigns a NodePort. The only extra thing it does is use a cloud provider to launch a load balancer (if available). For example, if you're on AWS it would launch an ELB.

@wirtsi how was this cluster launched / are you running on a particular cloud provider?

aaronlevy avatar Aug 26 '16 16:08 aaronlevy

Sorry, my bad. The service was not configured correctly; everything works fine now 😵

wirtsi avatar Sep 01 '16 16:09 wirtsi