
Add missing rules when NodePort support is disabled

Open antoninbas opened this issue 3 years ago • 15 comments

What type of PR is this?

bug

Which issue does this PR fix:

#2025

What does this PR do / Why do we need it:

  • the rules that need to be installed for NodePort support and for SNAT support are very similar: both rely on the same traffic mark. As a result, rules that are currently installed only when NodePort support is enabled should also be installed when external SNAT is disabled, which is the case by default (see the sketch after this list).
  • remove "-m state --state NEW" from a rule in the nat table; the match is redundant because the nat table is only traversed by packets that create a new connection.
  • fix a typo in one rule's name (extra whitespace).
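
For illustration, here is a minimal sketch of the intended behavior (not the code from this PR), using github.com/coreos/go-iptables: the shared connmark rules are installed whenever NodePort support is enabled or external SNAT is disabled, instead of only in the NodePort case. The nodePortSupport and externalSNAT booleans are hypothetical stand-ins for the corresponding CNI settings.

package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	// Hypothetical stand-ins for the CNI's NodePort and external SNAT settings.
	nodePortSupport := false
	externalSNAT := false

	ipt, err := iptables.New()
	if err != nil {
		log.Fatalf("failed to initialize iptables: %v", err)
	}

	// The connmark rules are needed for NodePort support *and* for SNAT,
	// so install them unless both features are off.
	if nodePortSupport || !externalSNAT {
		// Ensure the connmark chain exists (ignore the error if it already does).
		_ = ipt.NewChain("nat", "AWS-CONNMARK-CHAIN-0")

		// nat PREROUTING: send traffic arriving on pod ENIs to the connmark chain.
		// Note: no "-m state --state NEW", which is redundant in the nat table.
		if err := ipt.AppendUnique("nat", "PREROUTING",
			"-i", "eni+",
			"-m", "comment", "--comment", "AWS, outbound connections",
			"-j", "AWS-CONNMARK-CHAIN-0"); err != nil {
			log.Fatalf("failed to add nat PREROUTING rule: %v", err)
		}

		// mangle PREROUTING: restore the 0x80 connection mark on traffic
		// arriving on pod ENIs, so reply traffic is routed correctly.
		if err := ipt.AppendUnique("mangle", "PREROUTING",
			"-i", "eni+",
			"-m", "comment", "--comment", "AWS, primary ENI",
			"-j", "CONNMARK", "--restore-mark", "--nfmask", "0x80", "--ctmask", "0x80"); err != nil {
			log.Fatalf("failed to add mangle PREROUTING rule: %v", err)
		}
	}
}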

Testing done on this change:

  • Unit testing
  • Tested manually by disabling NodePort support and adding the missing rule.
  • Tested with custom container image for amazon-k8s-cni and amazon-k8s-cni-init

Automation added to e2e:

Will this PR introduce any new dependencies?: No

Will this break upgrades or downgrades? Has updating a running cluster been tested?: It should not break upgrades or downgrades (there may be some superfluous iptables rules). Not tested.

Does this change require updates to the CNI daemonset config files to work?: No

Does this PR introduce any user-facing change?: No, bug fix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Fixes #2025

Co-authored-by: Quan Tian [email protected]

Signed-off-by: Antonin Bas [email protected]

antoninbas avatar Jul 08 '22 20:07 antoninbas

I am happy to do more testing once the maintainers validate the approach.

antoninbas avatar Jul 08 '22 20:07 antoninbas

@antoninbas, thanks for the fix. Could you test the following?

  • upgrade from previous v1.10 to the one with your fix, verify nat PREROUTING rules are configured as expected

kishorj avatar Jul 13 '22 00:07 kishorj

@kishorj I did 2 upgrade tests, one with NodePort support enabled, and one with NodePort support disabled. Full iptables rules for nat and mangle tables are below.

There is only one issue. Because I removed the unnecessary -m state --state NEW matcher from a PREROUTING rule, the rule is now duplicated, and will be until the next Node restart:

# old rule
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
# new rule
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0

I see three solutions; let me know which one you want:

  1. add some logic to remove the old rule (a sketch of this option follows the list)
  2. roll back this change in my patch and keep the unnecessary matcher; it won't impact the correctness of the patch, and the improved rule was just a "nice-to-have"
  3. keep the patch as it is, since the duplicated rule doesn't impact correctness. On new K8s clusters and after a Node restart, there will be no duplicates.
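
A rough sketch of what option 1 could look like (hypothetical, not this PR's implementation), again with github.com/coreos/go-iptables: delete the stale variant of the rule, which still carries the "-m state --state NEW" match, before appending the new one.

package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatalf("failed to initialize iptables: %v", err)
	}

	// Old rule spec, as installed by previous releases (with the redundant
	// "-m state --state NEW" match).
	oldRule := []string{
		"-i", "eni+",
		"-m", "comment", "--comment", "AWS, outbound connections",
		"-m", "state", "--state", "NEW",
		"-j", "AWS-CONNMARK-CHAIN-0",
	}
	// New rule spec, without the state match.
	newRule := []string{
		"-i", "eni+",
		"-m", "comment", "--comment", "AWS, outbound connections",
		"-j", "AWS-CONNMARK-CHAIN-0",
	}

	// Remove the stale rule if it is present, so an upgrade does not leave
	// a duplicate behind until the next Node restart.
	stale, err := ipt.Exists("nat", "PREROUTING", oldRule...)
	if err != nil {
		log.Fatalf("failed to check for stale rule: %v", err)
	}
	if stale {
		if err := ipt.Delete("nat", "PREROUTING", oldRule...); err != nil {
			log.Fatalf("failed to delete stale rule: %v", err)
		}
	}

	// Install the new rule; AppendUnique is a no-op if it is already present.
	if err := ipt.AppendUnique("nat", "PREROUTING", newRule...); err != nil {
		log.Fatalf("failed to add new rule: %v", err)
	}
}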
Full iptables rules

NodePort support enabled (default)

Default build (v1.10.1-eksbuild.1)

# nat
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.14.191 --random-fully

# mangle 
-A PREROUTING -i eth0 -m comment --comment "AWS, primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-xmark 0x80/0x80
-A PREROUTING -i eni+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A PREROUTING -i vlan+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80

After upgrade to patched build

# nat
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.14.191 --random-fully

# mangle
-A PREROUTING -i eth0 -m comment --comment "AWS, primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-xmark 0x80/0x80
-A PREROUTING -i eni+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A PREROUTING -i vlan+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80

NodePort support disabled

Default build (v1.10.1-eksbuild.1)

# nat
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.14.191 --random-fully

# mangle
# empty

After upgrade to patched build

# nat
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 192.168.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 192.168.14.191 --random-fully

# mangle
-A PREROUTING -i eni+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80

antoninbas avatar Jul 13 '22 22:07 antoninbas

@kishorj friendly ping for this

antoninbas avatar Jul 19 '22 21:07 antoninbas

3. keep the patch as it is, since the duplicated rule doesn't impact correctness. On new K8s clusters and after a Node restart, there will be no duplicates.

This option is acceptable to me.

Option 1 is cleaner, as it removes the old rule.

kishorj avatar Aug 16 '22 05:08 kishorj

networking.go:703 has the following log line: log.Debugf("host network setup: found potentially stale SNAT rule for chain %s: %v", chain, ruleSpec). I would greatly appreciate it if you could modify it as follows: log.Debugf("host network setup: found potentially stale rule for chain %s: %v", chain, ruleSpec)

kishorj avatar Aug 16 '22 21:08 kishorj

@kishorj I will update the PR with all your feedback some time this week

antoninbas avatar Aug 16 '22 22:08 antoninbas

@kishorj I updated this PR with the changes you requested (it's all in the most recent commit)

I also ran the requested tests on an EKS cluster.

  1. start with old DS

Here are the PREROUTING rules in the nat table:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
  2. update to the new DS

Here are the PREROUTING rules in the nat table:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80

The rule has been updated correctly.

  3. restart the DS Pod and check the logs
$ kubectl -n kube-system logs pod/aws-node-b88wv -c aws-node
{"level":"info","ts":"2022-08-19T22:36:08.333Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-08-19T22:36:08.334Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-08-19T22:36:08.350Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-08-19T22:36:08.352Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-08-19T22:36:10.363Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-08-19T22:36:12.370Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-08-19T22:36:12.391Z","caller":"entrypoint.sh","msg":"Copying config file ... "}
{"level":"info","ts":"2022-08-19T22:36:12.395Z","caller":"entrypoint.sh","msg":"Successfully copied CNI plugin binary and config file."}
{"level":"info","ts":"2022-08-19T22:36:12.397Z","caller":"entrypoint.sh","msg":"Foregrounding IPAM daemon ..."}
$ kubectl -n kube-system logs pod/aws-node-b88wv -c aws-vpc-cni-init
Copying CNI plugin binaries ...
Configure rp_filter loose...
net.ipv4.conf.eth0.rp_filter = 2
2
net.ipv4.tcp_early_demux = 1
CNI init container done

antoninbas avatar Aug 19 '22 22:08 antoninbas

/lgtm

kishorj avatar Aug 19 '22 22:08 kishorj

@kishorj I see that the CI tests are not running. I think you may have to approve that on your side, since I am a first-time contributor to this repo.

antoninbas avatar Aug 22 '22 17:08 antoninbas

@antoninbas - We have disabled the CI tests for now. I will discuss with @kishorj and see if we can run it manually before merge.

jayanthvn avatar Aug 22 '22 17:08 jayanthvn

@antoninbas - We have disabled the CI tests for now. I will discuss with @kishorj and see if we can run it manually before merge.

Ok. For what it's worth, I have run the unit tests locally already (with make docker-unit-tests).

antoninbas avatar Aug 22 '22 17:08 antoninbas

@antoninbas - We have disabled the CI tests for now. I will discuss with @kishorj and see if we can run it manually before merge.

Ok. For what it's worth, I have run the unit tests locally already (with make docker-unit-tests).

Thank you :) But we will have to also run the integration tests. I will see if we can manually trigger it for now :)

jayanthvn avatar Aug 22 '22 17:08 jayanthvn

Sorry for the delay; we had disabled the GitHub runners. They are up and running again, and I have submitted the integration tests: https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3139125467

/cc @kishorj

jayanthvn avatar Sep 27 '22 21:09 jayanthvn

Two attempts at the upstream conformance tests have failed: https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3139125467. I can look into it next week.

jayanthvn avatar Sep 28 '22 18:09 jayanthvn

I ran the conformance tests locally (after updating the branch) and can confirm that all the tests pass. I will re-run the tests in GitHub.

git log
commit 85e2ef5ac82dfd3f79f2a5b05ec52f485de5a0c0 (HEAD -> add-missing-rules-when-node-port-support-is-disabled)

Ran 19 of 5771 Specs in 1065.502 seconds
SUCCESS! -- 19 Passed | 0 Failed | 0 Flaked | 0 Pending | 5752 Skipped
PASS

sushrk avatar Oct 31 '22 21:10 sushrk

Integration tests in progress https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3364855441/jobs/5579689661

sushrk avatar Oct 31 '22 22:10 sushrk

The latest run of the integration tests has passed: https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/3364855441/jobs/5582884522

sushrk avatar Nov 01 '22 19:11 sushrk