Garvin Pang

78 comments by Garvin Pang

Sorry, I updated the issue with the root cause we found. Node controller starting before cache sync was incorrect.

You can test this by forcing the page limit to 1 and attempting to start VPC RC against a large cluster. If you didn't change the etcd compaction interval (default is 5...

I think I hit this issue too. Let me circle back with some more info.

Are you using pod security groups for these pods? It's interesting that the node doesn't have the `trunk-attached` label, and it feels similar to https://github.com/aws/karpenter-provider-aws/issues/1252

It's a bit more complicated than just adding the label. From what I can tell when we had similar issues, on node creation or pod creation the label must already...

Not yet. Will update once I have tested this.

@orsenthil I am wondering if it makes sense to even cache nodes. K8s caches, which use List + watches on startup, are extremely expensive calls. The CNI only cares about the...

I took a pprof of the issue. ![Screenshot 2024-04-29 at 10 30 58 AM](https://github.com/aws/amazon-vpc-cni-k8s/assets/6425977/d5d4a776-825b-4508-a308-79446c928f4b) It seems like the issue is that the stream watcher is consuming memory during cluster size...

> It is pretty standard for k8s client calls to use the cached client. It will be good to measure difference in the memory usage and the performance of the...

https://github.com/aws/amazon-vpc-resource-controller-k8s/issues/188 seems related