flannel
Flanneld should stop Subnet Manager after SimpleNetwork backend initialized
Expected Behavior
AFAIK, when flanneld uses a SimpleNetwork backend, it doesn't handle events from the kube subnetManager after the backend is initialized, so the expensive node listwatcher becomes useless.
Current Behavior
The node listwatcher keeps running and writes node leases to a channel that nobody reads.
Possible Solution
We should probably stop it once the backend is initialized.
Add a check in main.go: if the backend is a SimpleNetwork, call the cancel function. The context variable can be used to cancel it, see https://github.com/flannel-io/flannel/blob/v0.19.0/subnet/kube/kube.go#L119.
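A minimal sketch of that check, assuming simplified stand-in types: `Network`, `SimpleNetwork`, `RouteNetwork` and `watcherStoppedFor` below are hypothetical and are not flannel's actual definitions. It only illustrates the pattern of cancelling the watcher's context when the backend network is a plain SimpleNetwork.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Network is a simplified, hypothetical stand-in for a backend network.
type Network interface{ Name() string }

// SimpleNetwork and RouteNetwork are stand-ins used only for this sketch.
type SimpleNetwork struct{}

func (*SimpleNetwork) Name() string { return "simple" }

type RouteNetwork struct{}

func (*RouteNetwork) Name() string { return "route" }

// watcherStoppedFor starts a fake watcher goroutine, applies the proposed
// check, and reports whether the watcher was stopped shortly afterwards.
func watcherStoppedFor(bn Network) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // always release the context when the sketch returns

	done := make(chan struct{})
	go func() {
		<-ctx.Done() // the fake watcher runs until its context is cancelled
		close(done)
	}()

	// Proposed check: only a backend that returns SimpleNetwork directly
	// can do without the watcher, so cancel it right after initialization.
	if _, ok := bn.(*SimpleNetwork); ok {
		cancel()
	}

	select {
	case <-done:
		return true
	case <-time.After(50 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("SimpleNetwork, watcher stopped:", watcherStoppedFor(&SimpleNetwork{}))
	fmt.Println("RouteNetwork, watcher stopped:", watcherStoppedFor(&RouteNetwork{}))
}
```

The type assertion matches only a direct `*SimpleNetwork`, so a backend that wraps it in another type would keep its watcher running.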
Steps to Reproduce (for bugs)
Run flanneld with the alloc backend.
Context
The list watch for all nodes can be expensive for large clusters, so I'm trying to eliminate those useless API calls.
Your Environment
- Flannel version: 0.15.1
- Backend used: alloc
- Kubernetes version: 1.22
- Operating System and version: CentOS 7
I'm happy to submit a PR about this, please leave comments if any. Thanks!
Are you referring to all the backends that use SimpleNetwork? It doesn't seem to me that there is any SimpleNetwork backend; there is an object, backend.SimpleNetwork, that is referenced by almost every backend.
Maybe I am not getting it right.
@rbrtbnfgl I'm referring to the backends that return an instance of backend.SimpleNetwork directly, e.g. alloc, awsvpc. https://github.com/flannel-io/flannel/blob/master/backend/alloc/alloc.go#L52
Other backends return a wrapper of SimpleNetwork or RouteNetwork, and should keep the subnet manager running.
Ok. I got it.
Those backends are the public-cloud ones, which we are considering removing because they are not maintained and I am not sure they still work. So this would only be relevant to the alloc backend.
I am trying to understand whether cancelling the watcher would affect the addition of a new node.
@rbrtbnfgl Exactly, we're using the alloc backend. It simply does the interface setup, and the cloud controller manager sets up the route table on the cloud for us. I don't think alloc does anything beyond that; please let me know if I'm wrong.
The SimpleNetwork doesn't consume new-node events, as you can see here https://github.com/flannel-io/flannel/blob/master/backend/simple_network.go#L35 , compared to the RouteNetwork https://github.com/flannel-io/flannel/blob/master/backend/route_network.go#L52
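To make the difference concrete, here is a rough sketch of the two styles of run loop. `Event`, `simpleRun`, `routeRun` and `demo` are hypothetical names for illustration and are not copied from flannel's source: one loop only waits for cancellation and never reads events, the other drains an event channel.

```go
package main

import (
	"context"
	"fmt"
)

// Event is a hypothetical stand-in for a node lease event.
type Event struct{ Node string }

// simpleRun blocks until the context is cancelled and never reads events.
func simpleRun(ctx context.Context, events <-chan Event) int {
	<-ctx.Done()
	return 0 // number of events handled
}

// routeRun handles events until the channel closes or the context ends.
func routeRun(ctx context.Context, events <-chan Event) int {
	handled := 0
	for {
		select {
		case _, ok := <-events:
			if !ok {
				return handled
			}
			handled++ // a real backend would update routes here
		case <-ctx.Done():
			return handled
		}
	}
}

// demo feeds the same two events to both loops and returns how many
// events each one handled.
func demo() (simpleHandled, routeHandled int) {
	newEvents := func() <-chan Event {
		ch := make(chan Event, 2)
		ch <- Event{Node: "node-a"}
		ch <- Event{Node: "node-b"}
		close(ch)
		return ch
	}

	// Cancel up front so simpleRun returns immediately in this sketch.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	simpleHandled = simpleRun(ctx, newEvents())
	routeHandled = routeRun(context.Background(), newEvents())
	return simpleHandled, routeHandled
}

func main() {
	s, r := demo()
	fmt.Println("simple handled:", s, "route handled:", r)
}
```

With a loop like `simpleRun`, whatever the watcher produces is never read, which is why keeping the watcher alive buys nothing for such a backend.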
Here is what I've changed:
- if the alloc backend is used, set a disableNodeInformer flag
- if disableNodeInformer is true, flannel will not initialize the node informer
- when the backend calls kubeSubnetManager, if kubeSubnetManager is true, use a GET method to fetch fresh data from the kube apiserver instead of from the local cache
This change will cut the connections to the kube apiservers for good, and save up to 80% of flanneld memory. If anyone thinks this is okay to merge, I will send a PR.
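The changes listed above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `subnetManager` struct, its fields, `newSubnetManager`, `nodeSubnet` and the `apiGet` callback are hypothetical and do not reflect the code in the actual PR. In particular, it assumes the direct GET is gated on the same disableNodeInformer flag, and it uses a plain map in place of an informer cache.

```go
package main

import (
	"errors"
	"fmt"
)

// subnetManager is a hypothetical, heavily simplified stand-in.
type subnetManager struct {
	disableNodeInformer bool
	cache               map[string]string                // stand-in for the informer's local cache
	apiGet              func(name string) (string, error) // stand-in for a direct GET to the apiserver
}

// newSubnetManager sets the flag for the alloc backend and skips building
// the cache (the "node informer") in that case.
func newSubnetManager(backendType string, apiGet func(string) (string, error)) *subnetManager {
	m := &subnetManager{
		disableNodeInformer: backendType == "alloc",
		apiGet:              apiGet,
	}
	if !m.disableNodeInformer {
		m.cache = map[string]string{}
	}
	return m
}

// nodeSubnet returns the subnet for a node: a direct GET when the informer
// is disabled, the local cache otherwise.
func (m *subnetManager) nodeSubnet(name string) (string, error) {
	if m.disableNodeInformer {
		return m.apiGet(name)
	}
	if subnet, ok := m.cache[name]; ok {
		return subnet, nil
	}
	return "", errors.New("node not found in local cache")
}

func main() {
	// Fake apiserver lookup used only for this sketch.
	get := func(name string) (string, error) { return "10.244.1.0/24", nil }

	m := newSubnetManager("alloc", get)
	subnet, err := m.nodeSubnet("node-a")
	fmt.Println(m.disableNodeInformer, subnet, err)
}
```

The trade-off in this shape is one GET per lookup instead of a long-lived list-watch, which fits a backend that reads node data only during setup.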
Thanks. It's OK with me; you can create the PR and then we can give you more feedback.
Closing this issue since #1656 has been merged.