
[BUG] TCP DNS Traffic Blocked Despite Security Group Rule Allowing Egress to DNS Service

Open wfnuser opened this issue 9 months ago • 7 comments

Kube-OVN Version

v1.12.8

Kubernetes Version

Server Version: v1.26.9

Operation-system/Kernel Version

"Ubuntu 22.04.2 LTS"

Description

We have encountered an issue in our Kubernetes cluster managed by Kube-OVN where a security group (SG) rule is configured to allow egress traffic from a specific pod to the DNS service at the Cluster IP 10.96.0.10. According to our configuration, this rule should permit all traffic to the DNS service. However, we are observing unexpected behavior with different protocols:

  • When accessing the DNS service using UDP (e.g., with a standard DNS query), the traffic passes without any issues, which is the expected behavior.
  • Conversely, when we attempt to access the DNS service using TCP (e.g., using dig +tcp), the traffic is blocked, which contradicts our SG rule configuration (see the example commands after this list).
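
For reference, a minimal pair of commands that shows the difference (the query name is only an example; any record served by the cluster DNS will do):

# UDP query against the DNS service – succeeds
dig @10.96.0.10 kubernetes.default.svc.cluster.local

# Same query over TCP – blocked by the SG
dig +tcp @10.96.0.10 kubernetes.default.svc.cluster.local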

Our SG looks like this:

apiVersion: kubeovn.io/v1
kind: SecurityGroup
metadata:
  creationTimestamp: "2024-05-07T09:01:09Z"
  generation: 30
  name: user-8281-sg
  resourceVersion: "645915870"
  uid: cfeb4eea-18fb-4d65-9b89-8befd946dd3e
spec:
  allowSameGroupTraffic: true
  egressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.96.0.10
    remoteType: address
  - ipVersion: ipv4
    policy: deny
    priority: 31
    protocol: all
    remoteAddress: 10.0.0.0/8
    remoteType: address
  - ipVersion: ipv4
    policy: allow
    priority: 200
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address

And if we add one more rule for the pod IP (10.16.41.31) behind the DNS service (10.96.0.10):

  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.16.41.31
    remoteType: address

DNS works again.

I think the real problem is not DNS-specific. If there are pod IPs behind a service IP and you set allow rules only for the service IP, the rules simply do not work; you have to add allow rules for the pod IPs as well. One way to list those backend IPs is shown below.
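
For example, assuming the cluster DNS is the standard kube-dns service in kube-system, the backend pod IPs can be listed like this (each of them currently needs its own allow rule in the SG):

kubectl -n kube-system get endpoints kube-dns -o jsonpath='{.subsets[*].addresses[*].ip}'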

Steps To Reproduce

Create sg like following:

apiVersion: kubeovn.io/v1
kind: SecurityGroup
metadata:
  creationTimestamp: "2024-05-07T09:01:09Z"
  generation: 30
  name: user-8281-sg
  resourceVersion: "645915870"
  uid: cfeb4eea-18fb-4d65-9b89-8befd946dd3e
spec:
  allowSameGroupTraffic: true
  egressRules:
  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.96.0.10  # DNS service cluster IP
    remoteType: address
  - ipVersion: ipv4
    policy: deny
    priority: 31
    protocol: all
    remoteAddress: 10.0.0.0/8
    remoteType: address
  - ipVersion: ipv4
    policy: allow
    priority: 200
    protocol: all
    remoteAddress: 0.0.0.0/0
    remoteType: address

Bind it to some pod (a binding sketch follows).
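
A minimal binding sketch, assuming the standard Kube-OVN port-security and security-group annotations (the pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: sg-test
  annotations:
    ovn.kubernetes.io/port_security: "true"
    ovn.kubernetes.io/security_groups: user-8281-sg
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]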


It is very interesting that you can still ping, and even dig the SRV record for rds-3r4ybkarqxwg-pxc.user-1993.svc.cluster.local, successfully.

Current Behavior

DNS cannot be accessed without adding the pod IP to the SG.

Expected Behavior

DNS should be accessible with only the service IP in the SG.

wfnuser avatar May 09 '24 14:05 wfnuser

These are the logs for the ACL rules:

from-lport  2270 (inport == @ovn.sg.user.8281.sg && ip4 && ip4.dst == 10.100.27.20) allow-related log(severity=info)
from-lport  2270 (inport == @ovn.sg.user.8281.sg && ip4 && ip4.dst == 10.16.61.90) allow-related log(severity=info)

And the command I run is:

curl 10.100.27.20

As you can see, from the ACL's perspective only the TCP SYN packet has a destination IP of 10.100.27.20, the cluster IP. The destination IP of all subsequent TCP packets has somehow been rewritten to 10.16.61.90, which is the pod IP.
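
One way to confirm the rewrite, assuming the kubectl-ko plugin shipped with Kube-OVN is installed, is to look at the OVN switch load balancer entries; the VIP-to-backend mapping listed there is what changes the destination IP after the first packet:

kubectl ko nbctl lb-list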

wfnuser avatar May 09 '24 15:05 wfnuser

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

github-actions[bot] avatar Jul 10 '24 00:07 github-actions[bot]

Any update on this issue?

wfnuser avatar Jul 10 '24 07:07 wfnuser

Any update on this issue?

Sorry, too busy to fix this.

bobz965 avatar Jul 10 '24 08:07 bobz965

Are you using the default VPC ovn-cluster?

How about setting ENABLE_LB to false?
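
For reference, ENABLE_LB corresponds to the kube-ovn-controller --enable-lb argument used by the install script; one way to flip it on a running cluster, sketched here, is:

kubectl -n kube-system edit deployment kube-ovn-controller
# then change the container argument to: --enable-lb=false

Note that this changes how all service VIP traffic is handled, not just DNS.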

bobz965 avatar Jul 10 '24 08:07 bobz965

Are you using the default VPC ovn-cluster? How about setting ENABLE_LB to false?

Yep, the default VPC.

We have found another way to work around it. Haha. I am just commenting to remind you that there is an issue; maybe you can check it out when you have time. It seems GitHub will automatically close this issue if I don't.

wfnuser avatar Jul 11 '24 10:07 wfnuser

In my opinion:

When LB is enabled, the VIP can be NAT'ed to its backend pod IP by the switch load balancer, so the traffic is blocked (the SG allow rule only matches the VIP).

If you disable LB, the traffic to the VIP will go through the node and be NAT'ed by ipvs instead.
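
If you want to keep the switch LB enabled, a workaround sketch is to also allow the range that packets carry after the DNAT. Assuming the DNS backends live in the default Kube-OVN pod subnet (10.16.0.0/16 here, judging by the pod IPs above; adjust to your cluster), an extra egress rule like this keeps matching after the rewrite:

  - ipVersion: ipv4
    policy: allow
    priority: 30
    protocol: all
    remoteAddress: 10.16.0.0/16
    remoteType: address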

bobz965 avatar Jul 11 '24 11:07 bobz965

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

github-actions[bot] avatar Sep 10 '24 00:09 github-actions[bot]