David (Mengqi) Yu

Results: 81 comments by David (Mengqi) Yu

> why does the LB controller need to contact the etcd server?

Every k8s controller that uses leader election relies on the apiserver to elect a leader and renew its lease. APIServer uses etcd...

@tamalsaha Is this issue fixed in your fork?

The inflate-helm-chart examples have been updated. They can be found in https://github.com/GoogleContainerTools/kpt-functions-catalog/tree/master/examples/contrib/inflate-helm-chart

In https://github.com/kubernetes/kubernetes/issues/123072, we saw the following:

- etcd memory kept increasing during the incident
- there were clients making a large number of watch requests without `resourceVersion`, e.g. `/api/v1/watch/pods?fieldSelector=status.phase!=Failed,status.phase!=Unknown,status.phase!=Succeeded,spec.nodeName=ip-10-32-88-156.ec2.internal&pretty=false`
- `apiserver_watch_cache_events_received_total{resource="pods"}`...
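
To make the "watch request without resourceVersion" observation concrete, here is a minimal sketch that builds a pod watch URL with and without the `resourceVersion` query parameter and checks whether a given URL carries it. The helper names, the `https://apiserver.example` host, and the `node-a` node name are hypothetical and used only for illustration. The path is copied from the sample request above. Whether setting `resourceVersion` would have changed the behavior seen in the linked issue is not something the comment states.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Path copied from the sample request quoted in the comment.
WATCH_PODS_PATH = "/api/v1/watch/pods"


def build_pod_watch_url(base_url, node_name, resource_version=None):
    """Build a pod watch URL filtered to one node (hypothetical helper).

    When resource_version is None the parameter is omitted, which is the
    shape of request described in the comment.
    """
    params = {"fieldSelector": f"spec.nodeName={node_name}"}
    if resource_version is not None:
        params["resourceVersion"] = str(resource_version)
    return f"{base_url.rstrip('/')}{WATCH_PODS_PATH}?{urlencode(params)}"


def has_resource_version(url):
    """Return True if the URL's query string sets a non-empty resourceVersion."""
    query = parse_qs(urlsplit(url).query)
    return "resourceVersion" in query


if __name__ == "__main__":
    # Placeholder host and node name, not taken from the incident.
    for rv in (None, "12345"):
        url = build_pod_watch_url("https://apiserver.example", "node-a", rv)
        print(has_resource_version(url), url)
```

A check like `has_resource_version` could be applied to request URIs from apiserver audit or access logs to count how many watch requests omit the parameter, assuming those logs record the full query string.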

This issue is highly correlated with the `PrevKv=nil` error. A sample message: `E0129 20:22:12.231725 11 watcher.go:253] watch chan error: etcd event received with PrevKv=nil (key="/registry/pods/foo/bar", modRevision=8143031328, type=PUT)` FWIW we found...

etcd version is 3.5.10.

> Do you see any asymmetry between apiservers reporting apiserver_watch_cache_events_dispatched_total metric?

We no longer have such information, because we had to restart the APIServers in the customer's cluster to mitigate the...