
one cluster should have only one transport to reduce the number of TCP connections

Open lcw2 opened this issue 1 year ago • 9 comments

What type of PR is this?

What this PR does / why we need it: It solves the problem of too many TCP connections between the aggregated apiserver and the apiservers of member clusters, by sharing one transport per cluster.

Which issue(s) this PR fixes: Fixes #5574

Special notes for your reviewer:

(screenshot: 2024-09-27 10:46:40)

test.sh

#!/bin/bash

KUBECONFIG="/root/.kube/karmada.config"
SLEEP_INTERVAL=0.1
MAX_JOBS=100

# each job issues 50 aggregated get-node requests against member clusters
function run_karmadactl() {
  for ((i = 1; i <= 50; i++)); do
    karmadactl --kubeconfig "$KUBECONFIG" get node --operation-scope=members
    sleep "$SLEEP_INTERVAL"
  done
}

# run MAX_JOBS loops concurrently to pressure the connection pool
for ((i = 1; i <= MAX_JOBS; i++)); do
  run_karmadactl &
done
wait

result.sh

#!/bin/bash

while true
do
  # count TCP connections to the apiserver port (6443)
  tcp_count=$(netstat -anp | grep 6443 | wc -l)
  sleep 1
  echo "$(date '+%Y-%m-%d %H:%M:%S') - Current total TCP connections: $tcp_count"
done

fix before: (screenshot)

fix after: (screenshot)

lcw2 avatar Sep 27 '24 09:09 lcw2

Welcome @lcw2! It looks like this is your first PR to karmada-io/karmada 🎉

karmada-bot avatar Sep 27 '24 09:09 karmada-bot

:warning: Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 68.96552% with 9 lines in your changes missing coverage. Please review.

Project coverage is 38.28%. Comparing base (58612d3) to head (884052c). Report is 76 commits behind head on master.

Files with missing lines Patch % Lines
pkg/util/proxy/proxy.go 68.96% 5 Missing and 4 partials :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5615      +/-   ##
==========================================
+ Coverage   35.20%   38.28%   +3.07%     
==========================================
  Files         645      649       +4     
  Lines       44869    45160     +291     
==========================================
+ Hits        15795    17288    +1493     
+ Misses      27844    26559    -1285     
- Partials     1230     1313      +83     
Flag Coverage Δ
unittests 38.28% <68.96%> (+3.07%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Sep 27 '24 13:09 codecov-commenter

two questions:

  1. When the ProxyURL and ProxyHeader of a cluster change, how should the transport be handled?
    One idea: store ProxyURL and ProxyHeader in clusterEndpointInfo and, when a request comes in, compare the current values with the stored ones; if those fields have changed, create a new transport. Or is there a better idea?

  2. When a cluster is deleted, should its transport also be deleted? If deletion is required, can this be achieved by periodically listing clusters, say every 30s? Any suggestions? @whitewindmills @chaunceyjiang

lcw2 avatar Oct 10 '24 06:10 lcw2

I just feel like this approach seems too complicated and error-prone. Can we consider using all key fields (such as APIEndpoint, ProxyURL, ...) to generate a hash value as the clusterEndpointInfoStore key? Then we might not need to remove the entry when a cluster is deleted.
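The suggestion above can be sketched roughly as follows: derive the cache key from every field that affects how the connection is made, so a changed ProxyURL naturally maps to a new cache entry and the stale transport is simply never looked up again. The `clusterConnConfig` type and field set here are illustrative assumptions, not the actual Karmada types.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// clusterConnConfig gathers the fields that determine how we connect
// to a member cluster's apiserver (hypothetical, for illustration).
type clusterConnConfig struct {
	APIEndpoint string
	ProxyURL    string
	ProxyHeader string
}

// cacheKey hashes all connection-relevant fields; any change to them
// yields a different key, so no explicit invalidation is needed.
func (c clusterConnConfig) cacheKey() string {
	// NUL separators prevent ambiguity between concatenated fields
	sum := sha256.Sum256([]byte(c.APIEndpoint + "\x00" + c.ProxyURL + "\x00" + c.ProxyHeader))
	return fmt.Sprintf("%x", sum)
}

func main() {
	a := clusterConnConfig{APIEndpoint: "https://10.0.0.1:6443"}
	b := a
	b.ProxyURL = "http://proxy:8080"
	// a changed ProxyURL produces a different cache key
	fmt.Println(a.cacheKey() == b.cacheKey())
}
```

One trade-off of this scheme, as noted, is that stale entries for deleted clusters linger in the store unless some eviction (e.g. size- or time-based) is added.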

whitewindmills avatar Oct 11 '24 06:10 whitewindmills

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign chaunceyjiang for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot avatar Oct 13 '24 18:10 karmada-bot

/cc @whitewindmills @chaunceyjiang @yanfeng1992 Hi, guys, can you help take a review again?

XiShanYongYe-Chang avatar Oct 15 '24 02:10 XiShanYongYe-Chang

Have you tested it yet? I thought of a scenario: when the cluster goes offline, the connection will be automatically disconnected after a period of time. When the cluster comes back online, can it still work fine?

I stopped the apiserver for two minutes, then restarted it; karmadactl get nodes works fine.

lcw2 avatar Oct 17 '24 08:10 lcw2

I'm not sure if you built the cache before stopping the apiserver. Maybe we haven't considered it comprehensively enough. Can we talk about it in the next community meeting?

whitewindmills avatar Oct 17 '24 09:10 whitewindmills

I'm not sure if you built the cache before stopping the apiserver. Maybe we haven't considered it comprehensively enough. Can we talk about it in the next community meeting?

OK

lcw2 avatar Oct 17 '24 10:10 lcw2

As discussed at the community meeting on 2024-10-22, we are going to disable keep-alive, as this is a simpler way to avoid the connection exhaustion issue.

We are still open to discussion on the shared-transport approach, as long as it can handle the certificate rotation concerns.
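The adopted alternative is straightforward to express: with keep-alives disabled on the transport, each proxied request closes its TCP connection when it completes, so connections cannot accumulate even without a shared transport, and certificate rotation is a non-issue because nothing long-lived caches the old credentials. A minimal sketch (the function name is hypothetical):

```go
package main

import (
	"fmt"
	"net/http"
)

// newShortLivedTransport returns a transport that dials a fresh TCP
// connection per request and closes it afterwards, trading some
// per-request latency for a bounded connection count.
func newShortLivedTransport() *http.Transport {
	return &http.Transport{
		DisableKeepAlives: true, // every request dials, then closes
	}
}

func main() {
	t := newShortLivedTransport()
	fmt.Println(t.DisableKeepAlives)
}
```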

/close

RainbowMango avatar Nov 05 '24 08:11 RainbowMango

@RainbowMango: Closed this PR.

In response to this:

As discussed at the community meeting on 2024-10-22, we are going to disable keep-alive, as this is a simpler way to avoid the connection exhaustion issue.

We are still open to discussion on the shared-transport approach, as long as it can handle the certificate rotation concerns.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

karmada-bot avatar Nov 05 '24 08:11 karmada-bot