Garvin Pang
Sorry, I updated the issue with the root cause we found: the node controller starting before the cache sync was the bug.
You can test this by forcing the page limit to 1 and attempting to start the VPC RC against a large cluster. If you didn't change the etcd compaction interval (default is 5...
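To illustrate the fix, here's a minimal client-go sketch of gating the controller on cache sync. `startNodeController` is a hypothetical stand-in for the real reconcile loop, not the controller's actual entry point:

```go
package main

import (
	"context"
	"log"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

// startNodeController is a hypothetical stand-in for the real reconcile loop.
func startNodeController(ctx context.Context, inf cache.SharedIndexInformer) {
	<-ctx.Done()
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)
	nodeInformer := factory.Core().V1().Nodes().Informer()

	ctx := context.Background()
	factory.Start(ctx.Done())

	// Block until the initial (possibly paginated) List has completed and
	// the cache is warm. Starting the controller before this point means it
	// reconciles against a partial view of the cluster, which is the bug
	// described above.
	if !cache.WaitForCacheSync(ctx.Done(), nodeInformer.HasSynced) {
		log.Fatal("timed out waiting for node cache to sync")
	}

	startNodeController(ctx, nodeInformer)
}
```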
CNI not removing network built on a node after IP is lost externally and IPAMD reconciles this state
I think I hit this issue too. Let me circle back with some more info.
Are you using pod security groups for these pods? It's interesting that the `trunk-attached` label isn't on the node, and it feels similar to https://github.com/aws/karpenter-provider-aws/issues/1252
It's a bit more complicated than just adding the label. From what I can tell from similar issues we hit, on node creation or pod creation the label must already...
Not yet. Will update once I have tested this.
@orsenthil I am wondering if it makes sense to even cache nodes. K8s cached clients use List + Watch on startup, which are extremely expensive calls. The CNI only cares about the...
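As a rough alternative, here's a sketch of scoping the informer to a single node with a field selector, so the initial List returns one object instead of every node in the cluster. `MY_NODE_NAME` is an assumed downward-API env var, not necessarily what the CNI uses:

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Assumed to be injected via the downward API (spec.nodeName).
	nodeName := os.Getenv("MY_NODE_NAME")

	// Scope the List+Watch to the one node this daemon runs on, so the
	// watch cache holds a single object rather than the whole node list.
	factory := informers.NewSharedInformerFactoryWithOptions(
		clientset, 10*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = "metadata.name=" + nodeName
		}),
	)
	nodeInformer := factory.Core().V1().Nodes().Informer()

	ctx := context.Background()
	factory.Start(ctx.Done())
	if !cache.WaitForCacheSync(ctx.Done(), nodeInformer.HasSynced) {
		log.Fatal("timed out waiting for single-node cache to sync")
	}
	<-ctx.Done()
}
```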
I took a pprof of the issue. It seems like the stream watcher is consuming memory as the cluster size...
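For anyone reproducing this, a minimal sketch of how such a heap profile can be captured, assuming the process exposes Go's standard pprof endpoints (port 6060 is an arbitrary choice):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	// Expose profiling endpoints on a side port. While the watcher is
	// leaking, pull a heap snapshot with:
	//
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	//
	// Taking two snapshots a few minutes apart and diffing them with the
	// -base flag shows whether the watcher's buffers grow with cluster size.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```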
> It is pretty standard for k8s client calls to use the cached client. It will be good to measure the difference in the memory usage and the performance of the...
https://github.com/aws/amazon-vpc-resource-controller-k8s/issues/188 seems related