one cluster should have only one transport to reduce the number of TCP connections
What type of PR is this?
What this PR does / why we need it: It solves the problem of too many TCP connections between the aggregated apiserver and the member cluster's apiserver.
Which issue(s) this PR fixes: Fixes #5574
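To make the idea concrete, here is a minimal sketch (not the actual PR code) of caching one shared transport per member cluster, so repeated proxy requests reuse the same connection pool instead of opening new TCP connections; the names `transportCache` and `GetClusterTransport` are hypothetical.

```go
// Sketch only: cache one shared transport per member cluster so that all
// aggregated-apiserver requests to that cluster reuse the same connection pool.
// transportCache and GetClusterTransport are hypothetical names, not the PR's API.
package proxy

import (
	"net/http"
	"sync"

	"k8s.io/client-go/rest"
)

var (
	mu             sync.Mutex
	transportCache = map[string]http.RoundTripper{} // cluster name -> shared transport
)

// GetClusterTransport returns the cluster's shared transport, building and
// caching it on first use from the cluster's rest.Config.
func GetClusterTransport(clusterName string, restConfig *rest.Config) (http.RoundTripper, error) {
	mu.Lock()
	defer mu.Unlock()

	if rt, ok := transportCache[clusterName]; ok {
		return rt, nil
	}
	rt, err := rest.TransportFor(restConfig) // applies the cluster's TLS/proxy settings
	if err != nil {
		return nil, err
	}
	transportCache[clusterName] = rt
	return rt, nil
}
```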
Special notes for your reviewer:
test.sh

```bash
#!/bin/bash
KUBECONFIG="/root/.kube/karmada.config"
SLEEP_INTERVAL=0.1
MAX_JOBS=100

function run_karmadactl() {
  for ((i = 1; i <= 50; i++)); do
    karmadactl --kubeconfig "$KUBECONFIG" get node --operation-scope=members
    sleep "$SLEEP_INTERVAL"
  done
}

for ((i = 1; i <= MAX_JOBS; i++)); do
  run_karmadactl &
done
wait
```
result.sh

```bash
#!/bin/bash
while true; do
  tcp_count=$(netstat -anp | grep 6443 | wc -l)
  sleep 1
  echo "$(date '+%Y-%m-%d %H:%M:%S') - Current total TCP connections: ${tcp_count}"
done
```
Before the fix: (screenshot)
After the fix: (screenshot)
Welcome @lcw2! It looks like this is your first PR to karmada-io/karmada 🎉
:warning: Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.
Codecov Report
Attention: Patch coverage is 68.96552% with 9 lines in your changes missing coverage. Please review.
Project coverage is 38.28%. Comparing base (58612d3) to head (884052c). Report is 76 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/util/proxy/proxy.go | 68.96% | 5 Missing and 4 partials :warning: |
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@ Coverage Diff @@
## master #5615 +/- ##
==========================================
+ Coverage 35.20% 38.28% +3.07%
==========================================
Files 645 649 +4
Lines 44869 45160 +291
==========================================
+ Hits 15795 17288 +1493
+ Misses 27844 26559 -1285
- Partials 1230 1313 +83
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 38.28% <68.96%> (+3.07%) :arrow_up: | |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Two questions:

1. When the ProxyURL and ProxyHeader of a cluster change, how should the transport be handled? Store ProxyURL and ProxyHeader in clusterEndpointInfo and, when a request comes in, compare the current values with the stored ones; if those fields have changed, create a new transport? Or is there another idea?
2. When a cluster is deleted, should the transport of the cluster also be deleted? If deletion is required, can this be achieved by periodically listing clusters, e.g. every 30s? Any suggestions? @whitewindmills @chaunceyjiang
I just feel this approach seems too complicated and error-prone. Can we consider using all key fields (such as APIEndpoint, ProxyURL, ...) to generate a hash value as the clusterEndpointInfoStore key? Then we might not need to remove it when the cluster is deleted.
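A rough sketch of that suggestion, under assumed field names: hash every field that affects how the transport is built into the cache key, so a changed ProxyURL or endpoint simply maps to a new entry and no explicit invalidation is needed.

```go
// Sketch only: derive the transport cache key from all connection-relevant
// fields. The field set and names below are assumptions for illustration.
package proxy

import (
	"crypto/sha256"
	"fmt"
)

type clusterEndpointInfo struct {
	APIEndpoint string
	ProxyURL    string
	ProxyHeader string
	CABundle    []byte
}

// cacheKey hashes all key fields; if any of them changes, a different key is
// produced and a new transport entry is created, leaving the old one unused.
func (c clusterEndpointInfo) cacheKey() string {
	h := sha256.New()
	fmt.Fprintf(h, "%s|%s|%s|", c.APIEndpoint, c.ProxyURL, c.ProxyHeader)
	h.Write(c.CABundle)
	return fmt.Sprintf("%x", h.Sum(nil))
}
```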
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chaunceyjiang for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/cc @whitewindmills @chaunceyjiang @yanfeng1992 Hi guys, could you help review this again?
Have you tested it yet? I thought of a scenario: when the cluster goes offline, the connection will be automatically disconnected after a period of time. When the cluster comes back online, does it still work fine?
I stopped the apiserver for two minutes and then restarted it; karmadactl get nodes worked fine.
I'm not sure if you built the cache before stopping the apiserver. Maybe we haven't considered it comprehensively enough; can we talk about it in the next community meeting?
> I'm not sure if you built the cache before stopping the apiserver. Maybe we haven't considered it comprehensively enough; can we talk about it in the next community meeting?

OK
As discussed at the community meeting on 2024-10-22, we are going to disable keep-alive, as this is a simpler way to avoid the connection exhaustion issue.
We are still open to discussion on the approach of sharing the transport, as long as it can handle the certificate rotation concerns.
/close
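For reference, disabling keep-alive on an `http.Transport` is a one-line setting; the sketch below only illustrates that mechanism and is not the change merged for this issue.

```go
// Illustration of the keep-alive option mentioned above: with DisableKeepAlives
// set, each proxied request closes its TCP connection instead of keeping it
// pooled, so idle connections cannot pile up.
package proxy

import "net/http"

func newNonKeepAliveTransport() *http.Transport {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.DisableKeepAlives = true
	return t
}
```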
@RainbowMango: Closed this PR.
In response to this:
> As discussed at the community meeting on 2024-10-22, we are going to disable keep-alive, as this is a simpler way to avoid the connection exhaustion issue.
> We are still open to discussion on the approach of sharing the transport, as long as it can handle the certificate rotation concerns.
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.