Nick Baker comments

Results: 30 comments of Nick Baker

[EKS] [bug]: Memory leak in eks-node-monitoring-agent container

@cilindrox are you seeing this only on a specific set of instance types?

[EKS] [bug]: Memory leak in eks-node-monitoring-agent container

We're following a similar trail, but haven't encountered conditions where it seemingly spikes right away as reported here. Is there anything with notable disk I/O running on failing nodes? You...

Update to docker 25

/ci +workflow:os_distros al2

How/where to Update default kubelete config - "containerLogMaxSize"

@khushboo121 are you still having any issues? You should be able to use kubelet flag `--container-log-max-size` on AL2 via [`--kubelet-extra-args`](https://github.com/awslabs/amazon-eks-ami/blob/38c24acb099d6823433cfce30ebf3736a041befe/templates/al2/runtime/bootstrap.sh#L34), or provide `containerLogMaxSize` as a value in the [`nodeadm` kubelet...
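
For the AL2 route mentioned above, a minimal sketch of how the flag might be passed through `--kubelet-extra-args`. The cluster name `my-cluster` and the `50Mi` value are placeholders chosen for illustration, not values from this thread, and the snippet only prints the command rather than running the bootstrap script:

```bash
#!/usr/bin/env bash
# Hypothetical values for illustration only; substitute your own.
CLUSTER_NAME="my-cluster"
LOG_MAX_SIZE="50Mi"

# kubelet flag named in the comment above, set to the placeholder size.
KUBELET_EXTRA_ARGS="--container-log-max-size=${LOG_MAX_SIZE}"

# Build the AL2 bootstrap invocation as it would appear in node user data.
# It is echoed here instead of executed, so the sketch is safe to run anywhere.
BOOTSTRAP_CMD="/etc/eks/bootstrap.sh ${CLUSTER_NAME} --kubelet-extra-args '${KUBELET_EXTRA_ARGS}'"
echo "${BOOTSTRAP_CMD}"
```

On an actual AL2 node the printed line would go into the instance user data; check the accepted size format against the kubelet documentation for your Kubernetes version.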

feat(nodeadm): pass nvidia gpu startup labels to kubelet

> Could we do the same for Neuron and possibly EFA
>
> (on mobile so I can't direct link but see lines 83-84 https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/patterns/aws-neuron-efa/eks.tf)

thanks for the...

feat(nodeadm): pass nvidia gpu startup labels to kubelet

/ci +workflow:k8s_versions 1.33 +build enable_accelerator=nvidia enable_efa=true +test --instance-types=g4dn.xlarge

feat(nodeadm): pass nvidia gpu startup labels to kubelet

I pulled up the node logs from the CI and confirmed the labels:

```bash
# nodeadm logs
Jul 07 19:16:23 localhost nodeadm[1961]: {"level":"info","ts":1751915783.9248972,"caller":"kubelet/config.go:212","msg":"Adding node label","label":"nvidia.com/gpu.present=true"}
# kubelet logs
Jul 07...
```


Circular dependency issue

feat(al2023/nvidia): update to 575.x drivers