Flanneld should stop Subnet Manager after SimpleNetwork backend initialized

Open xh4n3 opened this issue 1 year ago • 4 comments

Expected Behavior

AFAIK, when flanneld uses a SimpleNetwork backend, it doesn't handle events from the kube subnetManager after the backend is initialized, so the expensive node listwatcher becomes useless.

Current Behavior

The node listwatcher keeps running and writes node leases to a channel that nobody reads from.

Possible Solution

We should probably stop it once the backend is initialized: add a check in main.go and, if the backend is a SimpleNetwork, call the cancel function. The context variable can be used to cancel it, see https://github.com/flannel-io/flannel/blob/v0.19.0/subnet/kube/kube.go#L119.
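A minimal sketch of that idea, using only the standard library. Everything here is hypothetical and stands in for the real flannel code: `watchNodes` plays the role of the node listwatcher, `consumesLeaseEvents` is an assumed helper that encodes "alloc and awsvpc return a SimpleNetwork directly", and `startAndMaybeStopWatcher` is the check that would live in main.go.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// leaseEvent is a hypothetical stand-in for a subnet lease event.
type leaseEvent struct{ node string }

// watchNodes simulates the node listwatcher: it keeps offering events
// on out until ctx is cancelled.
func watchNodes(ctx context.Context, out chan<- leaseEvent, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; ; i++ {
		select {
		case <-ctx.Done():
			return
		case out <- leaseEvent{node: fmt.Sprintf("node-%d", i)}:
		}
	}
}

// consumesLeaseEvents is an assumed helper: it reports whether a backend
// type reads lease events after initialization.
func consumesLeaseEvents(backendType string) bool {
	switch backendType {
	case "alloc", "awsvpc":
		return false
	default:
		return true
	}
}

// startAndMaybeStopWatcher starts the watcher and cancels it right away
// when the backend will never read its events. It reports whether the
// watcher was stopped early.
func startAndMaybeStopWatcher(backendType string) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	events := make(chan leaseEvent)
	var wg sync.WaitGroup
	wg.Add(1)
	go watchNodes(ctx, events, &wg)

	if consumesLeaseEvents(backendType) {
		<-events // a consuming backend would keep reading here
		return false
	}
	cancel()  // nobody will read the events, so stop the watcher
	wg.Wait() // wait until the watcher goroutine has exited
	return true
}

func main() {
	fmt.Println("alloc watcher stopped:", startAndMaybeStopWatcher("alloc"))
	fmt.Println("vxlan watcher stopped:", startAndMaybeStopWatcher("vxlan"))
}
```

In the real code the cancel function would have to be scoped to the subnet manager only, not to the whole process context.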

Steps to Reproduce (for bugs)

Run flanneld with the alloc backend.

Context

The list-watch for all nodes can be expensive in large clusters, so I'm trying to eliminate those useless API calls.

Your Environment

  • Flannel version: 0.15.1
  • Backend used: alloc
  • Kubernetes version: 1.22
  • Operating System and version: CentOS 7

I'm happy to submit a PR for this; please leave comments if you have any. Thanks!

xh4n3 avatar Jul 20 '22 08:07 xh4n3

Are you referring to all the backends that use SimpleNetwork? It doesn't seem to me that there is any SimpleNetwork backend; there is an object, backend.SimpleNetwork, that is referenced by almost every backend. Maybe I am not getting it right.

rbrtbnfgl avatar Jul 20 '22 13:07 rbrtbnfgl

@rbrtbnfgl I'm referring to the backends that return an instance of backend.SimpleNetwork directly, e.g. alloc and awsvpc. https://github.com/flannel-io/flannel/blob/master/backend/alloc/alloc.go#L52

Other backends return a wrapper of SimpleNetwork or RouteNetwork, and should keep the subnet manager running.

xh4n3 avatar Jul 20 '22 13:07 xh4n3

OK, I got it. Those backends are the public-cloud ones, which we are considering removing because they are not kept up to date and I am not sure they still work. So this would only be relevant for the alloc backend. I am trying to understand whether cancelling the watcher affects what happens when a new node is added.

rbrtbnfgl avatar Jul 20 '22 15:07 rbrtbnfgl

@rbrtbnfgl Exactly. We're using the alloc backend; it simply does the interface setup, and the cloud controller manager sets up the route table on the cloud for us. I don't think alloc does anything beyond that; please let me know if I'm wrong.

The SimpleNetwork won't consume the event of new nodes, as you can see here https://github.com/flannel-io/flannel/blob/master/backend/simple_network.go#L35 , compared to the RouteNetwork https://github.com/flannel-io/flannel/blob/master/backend/route_network.go#L52
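To make the difference concrete, here is a simplified sketch of the two shapes. It is not the flannel source: the `network` interface, the `leaseEvent` type, the returned event count, and the `runOnce` helper are all made up for illustration. The only point is that one `Run` blocks on the context and never touches the event channel, while the other drains it.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// leaseEvent is a hypothetical stand-in for a subnet lease event.
type leaseEvent struct{ subnet string }

// network is a hypothetical interface; Run returns how many events it handled.
type network interface {
	Run(ctx context.Context, events <-chan leaseEvent) int
}

// simpleNetwork mimics the SimpleNetwork shape: it only waits for shutdown.
type simpleNetwork struct{}

func (simpleNetwork) Run(ctx context.Context, _ <-chan leaseEvent) int {
	<-ctx.Done()
	return 0
}

// routeNetwork mimics the RouteNetwork shape: it consumes every event.
type routeNetwork struct{}

func (routeNetwork) Run(ctx context.Context, events <-chan leaseEvent) int {
	handled := 0
	for {
		select {
		case <-ctx.Done():
			return handled
		case _, ok := <-events:
			if !ok {
				return handled // channel closed: nothing left to handle
			}
			handled++ // a real backend would add or remove a route here
		}
	}
}

// runOnce feeds the given subnets to n and returns the number handled.
// The timeout only exists so that simpleNetwork returns in this demo.
func runOnce(n network, subnets ...string) int {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	events := make(chan leaseEvent, len(subnets))
	for _, s := range subnets {
		events <- leaseEvent{subnet: s}
	}
	close(events)
	return n.Run(ctx, events)
}

func main() {
	fmt.Println("simple handled:", runOnce(simpleNetwork{}, "10.244.1.0/24", "10.244.2.0/24"))
	fmt.Println("route handled:", runOnce(routeNetwork{}, "10.244.1.0/24", "10.244.2.0/24"))
}
```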

xh4n3 avatar Jul 21 '22 01:07 xh4n3

Here is what I've changed:

  1. if the alloc backend is used, set a disableNodeInformer flag
  2. if disableNodeInformer is true, flannel will not initialize the node informer
  3. when a backend calls kubeSubnetManager and disableNodeInformer is true, use a GET request to fetch fresh data from the kube apiserver instead of reading from the local cache
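The three steps above can be sketched roughly like this. It is an illustration under assumptions, not the code in the PR: `nodeGetter`, `mapGetter`, `newSubnetManager`, and `getNode` are hypothetical names, and an in-memory map stands in for both the informer cache and the apiserver. Only the `disableNodeInformer` flag name comes from the list.

```go
package main

import (
	"errors"
	"fmt"
)

// node holds the few fields this sketch needs.
type node struct {
	name    string
	podCIDR string
}

// nodeGetter is a hypothetical abstraction over "fetch one node by name".
type nodeGetter interface {
	Get(name string) (node, error)
}

// mapGetter is an in-memory stand-in for either the informer cache or a
// direct GET against the apiserver; calls counts lookups.
type mapGetter struct {
	nodes map[string]node
	calls int
}

func (g *mapGetter) Get(name string) (node, error) {
	g.calls++
	n, ok := g.nodes[name]
	if !ok {
		return node{}, errors.New("node not found: " + name)
	}
	return n, nil
}

// subnetManager is a simplified, hypothetical version of the kube subnet manager.
type subnetManager struct {
	disableNodeInformer bool
	cache               nodeGetter // nil when the informer is disabled
	api                 nodeGetter
}

// newSubnetManager sets the flag for alloc (step 1) and only starts the
// informer when the flag is false (step 2).
func newSubnetManager(backendType string, api nodeGetter, startInformer func() nodeGetter) *subnetManager {
	m := &subnetManager{api: api, disableNodeInformer: backendType == "alloc"}
	if !m.disableNodeInformer {
		m.cache = startInformer()
	}
	return m
}

// getNode reads from the apiserver directly when there is no informer
// (step 3), and from the local cache otherwise.
func (m *subnetManager) getNode(name string) (node, error) {
	if m.disableNodeInformer {
		return m.api.Get(name)
	}
	return m.cache.Get(name)
}

func main() {
	nodes := map[string]node{"node-a": {name: "node-a", podCIDR: "10.244.1.0/24"}}
	api := &mapGetter{nodes: nodes}
	informerStarted := false
	m := newSubnetManager("alloc", api, func() nodeGetter {
		informerStarted = true
		return &mapGetter{nodes: nodes}
	})
	n, err := m.getNode("node-a")
	fmt.Println(n.podCIDR, err, informerStarted, api.calls)
}
```

The trade-off is one extra request per lookup in exchange for not holding a watch and a full node cache in memory.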

This change cuts the persistent connections to the kube apiservers and saves up to 80% of flanneld memory. If anyone thinks this is okay to merge, I will send a PR.

xh4n3 avatar Sep 26 '22 03:09 xh4n3

Thanks. It's OK from my side; you can create the PR and then we can give you more feedback.

rbrtbnfgl avatar Sep 27 '22 14:09 rbrtbnfgl

Closing this issue since #1656 has been merged.

xh4n3 avatar Oct 14 '22 02:10 xh4n3