Memory utilization is higher in some EKS workloads

Open ElementTech opened this issue 2 years ago • 10 comments

Image I'm using:

AMI ID: ami-0f3b9574af04c5bf4
Bottlerocket Release: bottlerocket-aws-k8s-1.28-x86_64-v1.16.0-d2d9cf87

What I expected to happen:

We switched from the standard Amazon Linux 2 AMI for EKS and rolled out the nodes with Bottlerocket. The workloads should have performed the same as before.

What actually happened:

We noticed that some of the same workloads use much more memory than before and hit OOM. After we raised the memory limit, usage settled at a level about 1-1.5 GiB higher than before.

How to reproduce the problem:

Deploy various kinds of workloads on any standard Amazon Linux 2 AMI on EKS first, then update the nodes to Bottlerocket using a rolling update. Measure the workloads' memory utilization.

ElementTech avatar Nov 06 '23 19:11 ElementTech

@ElementTech thanks for opening the issue! We are looking at it now and will get back to you soon.

gthao313 avatar Nov 07 '23 00:11 gthao313

We are trying to reproduce the issue. Can you share how you measure the memory utilization so we can reproduce it the same way? Meanwhile, if you have any other details or examples you can share, they would help us narrow down the issue. Thank you!

gthao313 avatar Nov 07 '23 02:11 gthao313

@gthao313 I'd suggest taking a look at https://github.com/kubernetes/kubernetes/issues/118916.

stevehipwell avatar Nov 09 '23 17:11 stevehipwell

@stevehipwell Yes, that was it. I found it earlier and meant to post it here. Bottlerocket uses cgroup v2 by default, while the vanilla Amazon Linux 2 AMI for EKS uses cgroup v1. It might be worth adding a note about this to the Bottlerocket docs. As a workaround for now, I added this to the Bottlerocket TOML config:

[settings.boot]
reboot-to-reconcile = true

[settings.boot.init-parameters]
"systemd.unified_cgroup_hierarchy" = ["0"]

ElementTech avatar Nov 09 '23 17:11 ElementTech
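
To confirm which cgroup hierarchy a node or container is actually running, before and after applying a setting like the one above, a small check can help. This is a minimal sketch, not something posted in the thread: it assumes the conventional mount point `/sys/fs/cgroup`, and the function name `cgroup_version` is hypothetical. It relies on the kernel exposing `cgroup.controllers` at the root of a unified (v2) hierarchy, while a v1 layout has per-controller directories such as `memory/`.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1", or "unknown" for the hierarchy mounted at root.

    Hypothetical helper for illustration; root defaults to the
    conventional mount point, which is an assumption.
    """
    base = Path(root)
    # The unified (v2) hierarchy exposes cgroup.controllers at its root.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # A v1 (or hybrid) layout has one directory per controller, e.g. memory/.
    if (base / "memory").is_dir():
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(cgroup_version())
```

Running it once on an Amazon Linux 2 node and once on a Bottlerocket node should show whether the two really differ in the way described.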

Closing this as the underlying cause and work-around have been identified.

webern avatar Nov 13 '23 21:11 webern

@webern do you have a link to the fix?

stevehipwell avatar Nov 13 '23 22:11 stevehipwell

I misunderstood and didn't realize a fix was needed in Bottlerocket for this. Re-opening.

webern avatar Nov 13 '23 22:11 webern

@ElementTech Is that a Java application? If yes, maybe you should look into the following link: https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/#migrate-to-cgroup-v2

hitsub2 avatar Nov 14 '23 05:11 hitsub2
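
One reason the runtime matters: the memory limit and usage a process sees from inside a container live in different files under the two hierarchies, so a runtime that only reads the v1 paths will not find its limit on a v2 node. The sketch below illustrates that difference. It is an assumption-laden example rather than anything from the thread: `read_memory_stats` is a hypothetical name, the default root `/sys/fs/cgroup` is assumed, and it is meant to be run inside a container, where that path refers to the container's own cgroup.

```python
from pathlib import Path
from typing import Dict, Optional


def _read_int(path: Path) -> Optional[int]:
    """Read an integer from a cgroup file; None if missing or "max"."""
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    # cgroup v2 writes the literal string "max" when no limit is set.
    return None if text == "max" else int(text)


def read_memory_stats(root: str = "/sys/fs/cgroup") -> Dict[str, Optional[int]]:
    """Return usage and limit in bytes from whichever hierarchy is present.

    Hypothetical helper; the default root is an assumption.
    """
    base = Path(root)
    if (base / "cgroup.controllers").is_file():
        # cgroup v2 file names
        usage, limit = base / "memory.current", base / "memory.max"
    else:
        # cgroup v1 file names, under the memory controller directory
        usage = base / "memory" / "memory.usage_in_bytes"
        limit = base / "memory" / "memory.limit_in_bytes"
    return {"usage": _read_int(usage), "limit": _read_int(limit)}


if __name__ == "__main__":
    print(read_memory_stats())
```

A `limit` of `None` here means either that no limit is set (v2 `max`) or that the file could not be read. Note that v1 reports a very large number rather than `max` when unlimited.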

@webern I think this should be fixed when the K8s patch for runc is backported and Bottlerocket bumps to that version. So it might only be fixed in some K8s versions.

stevehipwell avatar Nov 14 '23 17:11 stevehipwell

Has anyone else experienced similar issues with Java applications running on AWS ECS?

kngjaime avatar Aug 16 '25 01:08 kngjaime