amazon-vpc-cni-k8s Improve VPC CNI memory by reducing number of things it is caching

What would you like to be added: Narrow down what VPC CNI is caching to reduce memory utilization in large clusters. Currently we see pretty high memory utilization and seems to scale with nodes.

Why is this needed: The current behavior is pretty problematic when cluster size gets large (5000+ nodes) causing us to increase memory request for the CNI even though it isn't necessary for the CNI to use all that memory.

I believe simply adding a new ByObject Filter on Node object is enough to reduce cache utilization. Since https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/k8sapi/k8sutils.go#L183 only gets the node object the CNI is running on.

Apr 18 '24 20:04 GnatorX

Thanks for the report and the Pull Request. Have you done any measurements with and without this change? Could you share the differences?

Apr 19 '24 03:04 orsenthil

Not yet. Will update once I have tested this

Apr 19 '24 03:04 GnatorX

@orsenthil I am wondering if it make sense to even cache nodes. K8s caches which usesList + watches on startup are extremely expensive calls. The CNI only cares about the node it is running on and calls with node name is index from k8s side which is relatively fast. Rather than filtering, why not just use non-cached calls get that information?

The availability difference isn't that high, watches vs a call.

Apr 19 '24 17:04 GnatorX

I took a pprof of the issue. Screenshot 2024-04-29 at 10 30 58 AM

It seems like the issue is with the stream watcher is consuming memory during cluster size increase. It seems to require quite a bit of memory in order to process all nodes and store it in the memory. Even though the memory consumption isn't very high, its still unnecessary to store all node information in cache.

I need to re-test this with my change however I do believe the real solution is to avoid performing list watch against all nodes and only watch for node events specific to the CNI.

Apr 29 '24 18:04 GnatorX

K8s caches which usesList + watches on startup are extremely expensive calls

Even though the memory consumption isn't very high, its still unnecessary to store all node information in cache.

I do believe the real solution is to avoid performing list watch against all nodes and only watch for node events specific to the CNI.

It is pretty standard for k8s client calls to use the cached client. It will be good to measure difference in the memory usage and the performance of the various operations in the large clusters before we decide to not use the cache.

With your changes, if you see any different in both memory and performance, please share an update here.

May 01 '24 20:05 orsenthil

It is pretty standard for k8s client calls to use the cached client. It will be good to measure difference in the memory usage and the performance of the various operations in the large clusters before we decide to not use the cache.

Agreed.

When I tested my changes, it didn't yield significant difference in memory utilization. I believe, as shown in the pprof, the memory usage is because of the stream watcher attempting unmarshal incoming data. I think rather than using a informer cache and raw watch against the node itself may be more efficient(?).

I can close to issue for now since I likely don't have time to look into writing a direct watcher instead and I think the memory spike isn't large enough to be a concern.

May 01 '24 20:05 GnatorX

I can close to issue for now since I likely don't have time to look into writing a direct watcher instead and I think the memory spike isn't large enough to be a concern.

This sounds reasonable, if we have better proof with improvements, we can bring this change in.

Jun 26 '24 20:06 orsenthil

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one.

Jun 26 '24 20:06 github-actions[bot]