image-builder
updating dynamic port range for windows to address some collisions
Signed-off-by: Mark Rossetti [email protected]
What this PR does / why we need it: This fixes some issues where ports inside the current dynamic range (33000-65535) are being used by Windows components. This can cause random issues if container workloads are allocated ports that are already in use. Lots of details in https://github.com/kubernetes/kubernetes/issues/107114
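To make the collision concrete, here is a small sketch (not part of the PR; the helper name is illustrative and the port values are taken from the discussion below) that checks which reserved ranges overlap the old and new dynamic ranges:

```python
# Sketch: check whether reserved/excluded port ranges overlap the dynamic
# (ephemeral) range Windows hands out to workloads. Values come from this
# PR's discussion; the `overlaps` helper is illustrative, not part of the PR.

def overlaps(range_a, range_b):
    """True if two inclusive (start, end) port ranges share any port."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]

# Excluded ranges reported by `netsh int ipv4 show excludedportrange tcp`
excluded = [(5985, 5985), (5986, 5986), (33300, 33363), (33364, 33427),
            (33487, 33550), (33551, 33614), (33615, 33678), (33679, 33742),
            (47001, 47001)]

old_dynamic = (33000, 65535)   # range before this PR
new_dynamic = (34000, 65535)   # range after this PR

print([r for r in excluded if overlaps(r, old_dynamic)])  # six 33xxx ranges plus 47001
print([r for r in excluded if overlaps(r, new_dynamic)])  # only 47001 remains
```

With the old range, every 33xxx exclusion collides with ports the OS may hand to workloads; starting at 34000 leaves only 47001 inside the dynamic range.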
/sig windows
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: marosset
Once this PR has been reviewed and has the lgtm label, please assign jsturtevant for approval by writing /assign @jsturtevant
in a comment. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
@marosset: The following test failed, say /retest
to rerun all failed tests or /retest-required
to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
pull-azure-vhds | 5b3b6901f623a59dfbeb8f9686d7387c5d0eb9c7 | link | true | /test pull-azure-vhds |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@marosset This goss check needs to be modified as well
@knabben @jsturtevant any thoughts on the new port ranges?
@marosset can you please address the goss failures so we can get this merged.
this should resolve some flakes we are seeing in our upstream tests. FYI @daschott
@jsturtevant thanks for the ping here. I was looking back at your concern here: https://github.com/kubernetes/kubernetes/issues/107114#issuecomment-1122654452
This looks great if you are not experiencing port exhaustion. But if you are running out of ports due to heavy traffic load it might also make sense to define exclusion zone 30000 - 34000 and expand the range as needed (e.g. all the way down to 16385). Another alternative is to disable the SNAT, this would also reduce number of ports consumed. But if you are not experiencing any port exhaustion in the e2e test clusters, then please ignore :)
From what I've seen (https://github.com/kubernetes/kubernetes/issues/107114#issuecomment-1208116303), defining exclusion zones does not prevent those ports from being used, but rather raises errors if there is an attempt to use them. This is the reason we're seeing `An attempt was made to access a socket in a way forbidden by its access permissions` errors in the Kubernetes CI signal: requests inside the containers are trying to use ports from the excluded ranges:
```
netsh int ipv4 show excludedportrange tcp

Protocol tcp Port Exclusion Ranges

Start Port    End Port
----------    --------
      5985        5985
      5986        5986
     33300       33363
     33364       33427
     33487       33550
     33551       33614
     33615       33678
     33679       33742
     47001       47001

* - Administered port exclusions.
```
IMO, that should have happened in the first place. Updating the dynamic port range to start at 34000 basically excludes the ports in the `excludedportrange` output above from being used. The only one that would remain would be 47001.
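If the new range is applied with `netsh int ipv4 set dynamicport tcp start=<start> num=<count>` (the standard Windows mechanism for this; the exact invocation used by the PR may differ), the `num` argument is just the count of ports from the new start through 65535:

```python
# Count of dynamic ports when the range starts at 34000 (values from this
# discussion; the PR's actual netsh invocation may differ).
NEW_START, PORT_MAX = 34000, 65535
num = PORT_MAX - NEW_START + 1
print(num)  # 31536, i.e. netsh int ipv4 set dynamicport tcp start=34000 num=31536
```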
/retest
@marosset Need to update the goss tests:
https://github.com/kubernetes-sigs/image-builder/blob/1d4b0445b364712c3ab5d3c6f721b808fa0bfa2e/images/capi/packer/goss/goss-command.yaml#L158-L164
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: marosset
Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval by writing /assign @fabriziopandini
in a comment. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
Should we add the same port range for UDP (except shortening it for Azure due to the UDP restriction: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#what-protocols-can-i-use-within-vnets)? Maybe we can follow up in a separate PR, since we don't have UDP and would need to do this port exclusion only for Azure....
@daschott Any ideas on why port exclusion doesn't seem to work in our case? It looks like your suggestion of going all the way down to 16385 is what AKS is doing https://github.com/Azure/AgentBaker/blob/baaf54f3b2342d5f41ae76ecd09b082487898063/staging/cse/windows/configfunc.ps1#L116-L133 and I would expect they would see the same thing @claudiubelu is seeing in https://github.com/kubernetes/kubernetes/issues/107114#issuecomment-1208116303
@jsturtevant Is the app picking a random port from the ephemeral port range, or is there some mechanism to define a port range in your application?
There is a script aka.ms/networkhealth which can be used to view how many port ranges are available, do your machines have enough ports available? How often does this issue occur?
> do your machines have enough ports available? How often does this issue occur?
I wouldn't think there are that many connections running in the test environment (@claudiubelu any thoughts on that?). It occurs every 30-40 runs.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jsturtevant, marosset
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~images/capi/OWNERS~~ [jsturtevant]
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment