[BUG] TCP DNS Traffic Blocked Despite Security Group Rule Allowing Egress to DNS Service
Kube-OVN Version
v1.12.8
Kubernetes Version
Server Version: v1.26.9
Operating-system/Kernel Version
"Ubuntu 22.04.2 LTS"
Description
We have encountered an issue in our Kubernetes cluster managed by Kube-OVN where a security group (SG) rule is configured to allow egress traffic from a specific pod to the DNS service at the Cluster IP 10.96.0.10. According to our configuration, this rule should permit all traffic to the DNS service. However, we are observing unexpected behavior with different protocols:
- When accessing the DNS service using UDP (e.g., with a standard DNS query), the traffic passes without any issues, which is the expected behavior.
- Conversely, when we attempt to access the DNS service using TCP (e.g., using dig +tcp), the traffic is blocked, which contradicts our SG rule configuration.
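For illustration, the kind of check we run from inside the affected pod looks roughly like this (the exact query name is only an example):
dig @10.96.0.10 kubernetes.default.svc.cluster.local        # UDP query: answers normally
dig +tcp @10.96.0.10 kubernetes.default.svc.cluster.local   # TCP query: blocked in our setup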
Our SG looks like this:
apiVersion: kubeovn.io/v1
kind: SecurityGroup
metadata:
  creationTimestamp: "2024-05-07T09:01:09Z"
  generation: 30
  name: user-8281-sg
  resourceVersion: "645915870"
  uid: cfeb4eea-18fb-4d65-9b89-8befd946dd3e
spec:
  allowSameGroupTraffic: true
  egressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.96.0.10
    remoteType: address
  - ipVersion: ipv4
    policy: deny
    priority: 31
    protocol: all
    remoteAddress: 10.0.0.0/8
    remoteType: address
  - ipVersion: ipv4
    policy: allow
    priority: 200
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address
And if we add one more rule for the pod IP (10.16.41.31) behind the DNS service (10.96.0.10):
- ipVersion: ipv4
  policy: allow
  priority: 30
  protocol: all
  remoteAddress: 10.16.41.31
  remoteType: address
Then DNS works again.
I think the real problem is not specific to DNS: if there are pod IPs behind a service IP and you only set allow rules for the service IP, it simply does not work. You have to set allow rules for the pod IPs too.
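As a rough sketch of the workaround (assuming the DNS service is kube-system/kube-dns; adjust names to your cluster), you can list the backend pod IPs behind the service and add an allow rule for each of them in addition to the service IP:
kubectl -n kube-system get endpoints kube-dns -o wide
# for each backend IP returned (e.g. 10.16.41.31), add an egress allow rule like the one above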
Steps To Reproduce
Create an SG like the following:
apiVersion: kubeovn.io/v1
kind: SecurityGroup
metadata:
  creationTimestamp: "2024-05-07T09:01:09Z"
  generation: 30
  name: user-8281-sg
  resourceVersion: "645915870"
  uid: cfeb4eea-18fb-4d65-9b89-8befd946dd3e
spec:
  allowSameGroupTraffic: true
  egressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.96.0.10  # DNS service cluster IP
    remoteType: address
  - ipVersion: ipv4
    policy: deny
    priority: 31
    protocol: all
    remoteAddress: 10.0.0.0/8
    remoteType: address
  - ipVersion: ipv4
    policy: allow
    priority: 200
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address
Bind it to some pod (a minimal binding sketch follows below).
It's very interesting that you can still ping and even dig the SRV record for rds-3r4ybkarqxwg-pxc.user-1993.svc.cluster.local successfully.
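For reference, a minimal sketch of the pod binding (the pod spec is just an example; the annotation names are my recollection of the Kube-OVN security-group docs, so double-check them for your version):
apiVersion: v1
kind: Pod
metadata:
  name: sg-test-pod                           # hypothetical name
  annotations:
    ovn.kubernetes.io/port_security: "true"   # port security must be on for the SG to apply
    ovn.kubernetes.io/security_groups: user-8281-sg
spec:
  containers:
  - name: toolbox
    image: nicolaka/netshoot                  # any image with dig/curl works
    command: ["sleep", "infinity"]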
Current Behavior
Cannot access DNS over TCP without also adding the backend pod IP to the SG.
Expected Behavior
DNS should be accessible with only the service IP in the SG.
from-lport 2270 (inport == @ovn.sg.user.8281.sg && ip4 && ip4.dst == 10.100.27.20) allow-related log(severity=info)
from-lport 2270 (inport == @ovn.sg.user.8281.sg && ip4 && ip4.dst == 10.16.61.90) allow-related log(severity=info)
And the command I ran is:
curl 10.100.27.20
As you can see from the ACL's perspective, only the TCP SYN packet's destination IP is 10.100.27.20, which is the cluster IP. All subsequent TCP packets' destination IPs are somehow converted to 10.16.61.90, which is the pod IP.
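For anyone who wants to repeat this inspection, a rough sketch (assuming the kubectl-ko plugin is installed and that acl-list accepts a port-group argument in this OVN version; the port-group name is taken from the inport match above):
kubectl ko nbctl acl-list pg ovn.sg.user.8281.sg   # dump the ACLs generated for the security group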
Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.
Any update on this issue?
Any update on this issue?
Sorry, too busy to fix this.
Are you using the default VPC ovn-cluster?
How about setting ENABLE_LB to false?
Yep, we are using the default VPC.
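For context, a rough way to check whether the switch LB is currently enabled (where the setting lives is my assumption for this version; verify against your deployment):
kubectl -n kube-system get deploy kube-ovn-controller -o yaml | grep -i enable-lb
# setting it to false hands service VIP traffic back to kube-proxy/IPVS on the node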
We have found another way to work around it. Haha. I'm just commenting to remind you that there is an issue; maybe you can check it out when you have time. It seems GitHub will automatically close this issue if I don't.
In my opinion:
When the LB is enabled, the service VIP can be NATed to its backend pod IP by the logical switch load balancer, so the traffic is blocked by SG rules that only match the VIP.
If you disable the LB, traffic to the VIP will go through the node and be NATed by IPVS instead.
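That VIP-to-backend translation can be inspected in the OVN load balancer table, e.g. (again assuming the kubectl-ko plugin is installed):
kubectl ko nbctl lb-list   # each Kubernetes service VIP is listed with its backend pod endpoints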
Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.