
High memory usage when an informer error occurs.

Open zhulinwei opened this issue 5 months ago • 0 comments

I have a custom controller that uses an informer. The controller lists and watches more than 2000 nodes and 50000 pods; api, apimachinery, and client-go are all at v0.24.0.

Code sample:

package main

import (
	"github.com/golang/glog"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", homedir.HomeDir()+"/.kube/config")
	if err != nil {
		glog.Fatalf("1 error building kubernetes config:%v", err)
	}
	kubeClient, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		glog.Fatalf("2 error building kubernetes client:%v", err)
	}

	// Shared informer factory with resync disabled (period 0).
	factory := informers.NewSharedInformerFactory(kubeClient, 0)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			// do something
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			// do something
		},
		DeleteFunc: func(obj interface{}) {
			// do something
		},
	})

	// Use one stop channel for both Start and WaitForCacheSync.
	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Block so the informer keeps watching.
	select {}
}

Normally only 800MB of memory is needed:

[screenshot Snipaste_2024-09-20_10-26-01: memory usage under normal operation]

But when an error occurs, memory doubles almost instantly, then decreases slightly, but it still stays higher than it was before the error.

[screenshot Snipaste_2024-09-20_10-31-09: memory spike when the error occurs]

W0920 04:58:56.453165 1 reflector.go:442] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=4092, cap=40960") has prevented the request from succeeding

W0920 04:58:43.401539 1 reflector.go:442] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: got short buffer with n=0, base=882, cap=20480") has prevented the request from succeeding

Memory used after the error occurs: [screenshot Snipaste_2024-09-20_10-34-30]

As I understand it, when a network anomaly occurs, the informer re-lists the full set of the above resources from kube-apiserver. At that point, because the old and new resource objects exist in memory at the same time, memory surges; after a while GC reclaims the old objects and memory falls back. But I don't understand why the memory stays higher than it was before the error occurred.
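
For reference, here is a small diagnostic sketch (my own addition, not part of the controller above; the function name and the 30-second interval are arbitrary) that could be started with go logHeapStats() before factory.Start to log Go heap statistics alongside the informer:

import (
	"log"
	"runtime"
	"time"
)

// logHeapStats prints heap statistics every 30 seconds.
// HeapInuse approximates memory held by live objects, while HeapIdle minus
// HeapReleased is memory the Go runtime keeps for reuse but has not yet
// returned to the OS, so container RSS can stay elevated even after GC has
// collected the old objects from the relist.
func logHeapStats() {
	for range time.Tick(30 * time.Second) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("HeapInuse=%d MiB HeapIdle=%d MiB HeapReleased=%d MiB",
			m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20)
	}
}

Comparing HeapInuse against the container's RSS after the error would show whether the extra memory is really live cache objects or just idle heap that the runtime has not yet returned to the OS.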

Is this a bug? How can I fix it? What should I do?

zhulinwei · Sep 20 '24 02:09