amazon-vpc-cni-k8s

iptables --random and --random-fully ignore /proc/sys/net/ipv4/ip_local_port_range

Open taylorb-syd opened this issue 5 years ago • 13 comments

When selecting a port for an outgoing connection, the kernel netfilter module by default selects the next available port in the /proc/sys/net/ipv4/ip_local_port_range range. However, because of the port conflicts discussed in #246, the default behaviour has been changed to set the NF_NAT_RANGE_PROTO_RANDOM flag by way of the iptables --random option.

It turns out that when either the NF_NAT_RANGE_PROTO_RANDOM or the NF_NAT_RANGE_PROTO_RANDOM_FULLY flag is set, the kernel is hard-coded[1] to select a random port in the non-privileged range, i.e. 1024 through 65535.
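To make the hard-coded behaviour concrete, here is a simplified shell model, written for this discussion and not taken from the kernel, of how the NAT core turns an offset into a source port: roughly `min + off % range_size`, where `min` and `max` fall back to 1024 and 65535 when the rule supplies no port range. It assumes the original source port is at least 1024 and leaves out the uniqueness checks and retries; `nat_port_for_offset` is a hypothetical name.

```shell
#!/bin/sh
# Simplified model (NOT kernel code) of nf_nat source-port selection for an
# original source port >= 1024. Uniqueness checks and retries are omitted.
#   $1     = offset (random when --random or --random-fully is set)
#   $2, $3 = optional min/max from an explicit --to-source IP:min-max range
nat_port_for_offset() {
  off=$1
  min=${2:-1024}    # without an explicit range the lower bound is 1024 ...
  max=${3:-65535}   # ... and the upper bound is 65535
  # ip_local_port_range is never consulted here: the port is derived purely
  # from the offset and the (explicit or default) range.
  echo $(( min + off % (max - min + 1) ))
}
```

For example, `nat_port_for_offset 70000` wraps around to 6512, well below the Linux default ip_local_port_range of 32768-60999, while `nat_port_for_offset 5 49152 65535` stays inside the explicit range (49157).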

It is, however, possible to create rules limited to a port range, provided that the protocol is specified within the rule[2]. For example, the following rule limits the range for TCP traffic to 49152 through 65535, the traditional ephemeral range discussed in RFC 6056:

iptables -t nat -I AWS-SNAT-CHAIN-${LCN} -p tcp -m addrtype ! --dst-type LOCAL -j SNAT --to-source ${FSTIP}:49152-65535

Where:

  • ${LCN} represents the number of IPv4 VPC CIDRs in the VPC.
  • ${FSTIP} represents the first IP address of the first ENI of the instance.

Similar rules could be created for other protocols, such as UDP.
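A small dry-run sketch that prints the TCP and UDP variants of the rule above so they can be reviewed before being applied. `print_snat_range_rules` is a hypothetical helper; it assumes the nat-table chain naming described in this post (which may differ between CNI versions) and executes nothing itself.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) protocol-specific SNAT rules that
# pin the source-port range. Arguments: LCN, FSTIP, optional port range.
print_snat_range_rules() {
  lcn=$1
  fstip=$2
  ports=${3:-49152-65535}
  # The port range is only honoured when a protocol (-p) is given,
  # so emit one rule per protocol.
  for proto in tcp udp; do
    printf 'iptables -t nat -I AWS-SNAT-CHAIN-%s -p %s -m addrtype ! --dst-type LOCAL -j SNAT --to-source %s:%s\n' \
      "$lcn" "$proto" "$fstip" "$ports"
  done
}
```

For example, `print_snat_range_rules 1 10.0.1.23` (an illustrative address) prints two iptables commands; piping that output to `sh` as root would apply them.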

In response to this, we have updated the documentation in #503 to make this clearer, but the purpose of this issue is to determine whether this is sufficient for the consumers of this plugin.

The question I wish to ask of the community is: are you happy with this behaviour being documented, or should we put development effort towards being able to control this range?

To me this seems like a very niche use case; affected consumers should consider manually injecting the required rules into their chains as per the example above. However, if enough consumers are affected, we can put effort into this.

References:
[1] Line 478 of nf_nat_core.c
[2] Line 78 of libip6t_SNAT.c

taylorb-syd, Jun 19 '19

From my observation, the iptables version shipped with AL2 (iptables v1.4.21) does not support --random-fully (used when the AWS_VPC_K8S_CNI_RANDOMIZESNAT=prng environment variable is set); that option is only supported by iptables >= 1.6.2.
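Before opting into `AWS_VPC_K8S_CNI_RANDOMIZESNAT=prng`, a node bootstrap script could check the installed iptables version first. The sketch below compares the output of `iptables --version` against 1.6.2, the minimum reported in this comment; `supports_random_fully` is a hypothetical helper, and the comparison relies on GNU `sort -V`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given `iptables --version` output is at
# least 1.6.2, the minimum version reported in this thread for --random-fully.
supports_random_fully() {
  # Extract e.g. "1.8.4" from "iptables v1.8.4 (legacy)"; empty if no match.
  ver=$(printf '%s\n' "$1" | sed -nE 's/^iptables v([0-9]+(\.[0-9]+)*).*/\1/p')
  [ -n "$ver" ] || return 1   # unrecognised output: assume unsupported
  # sort -V orders version strings component by component, so 1.6.2 sorts
  # first exactly when the installed version is >= 1.6.2.
  [ "$(printf '%s\n' 1.6.2 "$ver" | sort -V | head -n1)" = 1.6.2 ]
}
```

Typical use would be `supports_random_fully "$(iptables --version)"` on the node; with the AL2 version mentioned above (v1.4.21) it returns a non-zero status.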

nithu0115, Jul 03 '19

The root cause of this is the old kernel and packages in the EKS AMI. We have created an issue in the EKS AMI repo to track it.

https://github.com/awslabs/amazon-eks-ami/issues/380

Closing this as a duplicate (not related to the AWS VPC plugin).

jaypipes, Dec 11 '19

This is not a duplicate of #380.

I'm sorry, nithu0115@, but I think you linked to the wrong issue when discussing this. Did you mean pull request #246, where this change was introduced?

taylorb-syd, Dec 11 '19

Yup, we goofed, sorry @taylorb-syd! It is #516 and #662 that are related to awslabs/amazon-eks-ami#380.

jaypipes, Dec 11 '19

What would be the canonical way to add these iptables rules? Where can we hook in to have these rules created automagically when the driver creates its ruleset?

madeddie, Mar 05 '20

That's the point of this question: the agent currently provides no way to do this. If this is functionality you would like to see (or similar functionality, i.e. the ability to provide custom rules), let us know and we can try to put some effort towards it.

Right now you need to write a script that checks for the existence of the AWS-SNAT-CHAIN-${LCN} chain and its base external SNAT rule and, once it finds them, injects the rule specified in the first post.
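One way such a script could look, as a sketch only: it waits for the CNI's SNAT rule to appear in the nat-table chain, then inserts TCP and UDP rules with the restricted range unless identical rules are already present. `inject_snat_port_range` and the `IPT` override are hypothetical; the chain name follows the naming described in the first post and may differ between CNI versions.

```shell
#!/bin/sh
# Hypothetical helper: idempotently insert TCP and UDP SNAT rules with a
# restricted source-port range at the top of the CNI's SNAT chain.
# IPT can be overridden (e.g. for testing); it defaults to iptables.
IPT="${IPT:-iptables}"

inject_snat_port_range() {
  chain=$1     # e.g. AWS-SNAT-CHAIN-1 (naming as described in this issue)
  snat_ip=$2   # first IP address of the first ENI
  ports=$3     # e.g. 10240-65535
  # Only proceed once the chain exists and contains the CNI's SNAT rule.
  $IPT -t nat -S "$chain" 2>/dev/null | grep -q -- '-j SNAT' || return 1
  for proto in tcp udp; do
    set -- -p "$proto" -m addrtype ! --dst-type LOCAL \
      -j SNAT --to-source "$snat_ip:$ports"
    # -C exits 0 when an identical rule already exists, so re-runs are no-ops;
    # -I puts our rule first, ahead of the CNI's catch-all SNAT rule.
    $IPT -t nat -C "$chain" "$@" 2>/dev/null ||
      $IPT -t nat -I "$chain" "$@" || return 1
  done
  return 0
}
```

Because the function is safe to re-run, it could be called from node bootstrap or a periodic loop (e.g. `inject_snat_port_range AWS-SNAT-CHAIN-1 10.0.1.23 10240-65535` every minute, with an illustrative address). This is still a polling approach, with the brittleness that entails.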

Could you describe your use case, i.e. why you wish to restrict the port range?

taylorb-syd, Mar 08 '20

We have a cloud-wide ACL to protect ports between 1024 and 10240 from outside connections (possibly badly protected services), so we need the NAT setup to choose ports between 10240 and 65535 instead of going all the way down to 1024. I wouldn't mind injecting the rules manually, but in this case that would turn into something akin to a cron job, in case the CNI application changes the rules for some reason. That would become very brittle. I'd rather have the CNI process signal somehow when it writes its rules, so that ours are always injected when needed.

madeddie, Mar 08 '20

By cloud-wide I mean the subnet encompassing our k8s cluster, which also contains some non-dockerized workloads on standard VMs.

madeddie, Mar 08 '20

That seems like a definite use case. I am currently designing a better iptables rules management engine for the CNI plug-in, and this work will lead to the ability to inject custom rules and to other functionality such as SNAT port ranges.

Unfortunately, it might be a while, so it is unlikely to be in the 1.6.0 RCs and more likely to be introduced in the 1.7.0 RCs. I'll update you on my progress.

taylorb-syd, Mar 09 '20

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot], Apr 13 '22

Issue closed due to inactivity.

github-actions[bot], Apr 27 '22

/reopen

jayanthvn, Apr 27 '22

I ran into this issue when trying to change an ACL to pass AWS CIS compliance.

AWS CIS 1.4.0 - Control 5.1:

5.1 Ensure no Network ACLs allow ingress from 0.0.0.0/0 to remote server administration ports (Automated)

The Network Access Control List (NACL) function provide stateless filtering of ingress and egress network traffic to AWS resources. It is recommended that no NACL allows unrestricted ingress access to remote server administration ports, such as SSH to port 22 and RDP to port 3389.

I need to be able to block port 3389 (RDP) to pass this CIS rule. But because the ephemeral ports range from 1024 to 65535, I can't do this without risking blocking a valid connection (when the ephemeral port ends up being exactly 3389).
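If the SNAT port range can be pinned (for example with the protocol-specific rules from the opening post), a quick check that the chosen range avoids every port the NACL must block could look like the sketch below. `range_avoids_ports` is a hypothetical helper, and the candidate ranges are only examples.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if none of the given ports fall inside the
# candidate SNAT source-port range MIN-MAX.
# Usage: range_avoids_ports MIN MAX PORT...
range_avoids_ports() {
  min=$1
  max=$2
  shift 2
  for port in "$@"; do
    if [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]; then
      echo "port $port falls inside $min-$max" >&2
      return 1
    fi
  done
  return 0
}
```

With the default 1024-65535 range, `range_avoids_ports 1024 65535 22 3389` fails because of 3389, whereas a range starting at 10240 or 49152 passes.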

I figure that I could set AWS_VPC_K8S_CNI_RANDOMIZESNAT to none for this to work? But I don't know what the implications of this would be, or whether there is any negative impact. Also, changing this env var for the CNI seems to be a problem when using Terraform: the AWS CNI is managed as an add-on, and since the add-on content is not managed or owned by Terraform, changing the env var alone is hard.

What else would I be able to do here?

Thanks.

michelzanini, Jun 17 '22

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot], Sep 21 '22

Issue closed due to inactivity.

github-actions[bot], Oct 06 '22