
Use kernel 5.10

noony opened this pull request • 14 comments

Issue #, if available: #857

Description of changes: Upgrade the kernel version to 5.10 for Kubernetes versions above ~~1.19~~ 1.22. It's useful for WireGuard transparent encryption, and it also includes performance improvements for Intel Ice Lake and AWS Graviton2 processors.

More details here

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

noony · Feb 11 '22

Hi, could I get some input on this PR? Is this something that's possible? Should we build an additional AMI with kernel-5.10 in the name? Thanks in advance for your response.

noony · Feb 16 '22

I'm on board with this, 5.10 receives the same level of support from AL2 as 5.4. We just need to be confident that the change doesn't impact our users' workloads.

If you've deployed this change to a production environment, we'd love to hear about it.

cartermckinnon · Feb 18 '22

Hi @cartermckinnon, thanks for your response. I'm pretty confident that it will not break things; we already use this kernel on kops clusters without any issues (I agree with you, it's not the EKS AMI).

noony · Feb 21 '22

I got feedback from the wider team today: we'll have to make this change in tandem with a Kubernetes release. While 5.10 has largely stabilized, it isn't as battle-tested as 5.4; and we shouldn't ask users to undertake this sort of upgrade in the middle of their k8s version's support cycle. I'll update the title to reflect this.

@rtripat @prasad0896 Do you think this is realistic for 1.22?

cartermckinnon · Feb 24 '22

Following your last comment, I modified the pull request to use kernel 5.10 for Kubernetes versions >= 1.22. 👍

noony · Mar 04 '22

With Dirty Pipe I assume this upgrade may be more pressing?

stefansedich · Mar 08 '22

> With Dirty Pipe I assume this upgrade may be more pressing?

My .02: probably not exactly.

The sha of the fix for dirty pipe: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/lib/iov_iter.c?id=9d2231c5d74e13b2a0546fee6737ee4446017903

cinlin lists backported patches in: 5.16.11, 5.15.25, 5.10.102, 5.4.181, 4.19.231, 4.14.268, and 4.9.303

At the time of this writing, the latest AMI (1.21.5-20220303) is still on 5.4.176-91.338.amzn2. It's not yet clear to me whether this kernel is modified outside the upstream versioning; it'd be great to get confirmation from this team on whether these amzn2 kernel versions are expected to match upstream precisely.

That is to say, the shortest path to patching Dirty Pipe seems to be 5.4.181, which is a somewhat distinct issue from upgrading to 5.10, since that upgrade is being discussed as part of a 1.22 release.

[edit] I didn't see a different issue for this so I just filed https://github.com/awslabs/amazon-eks-ami/issues/882 to hopefully clarify the situation

mars64 · Mar 09 '22
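The reasoning above boils down to comparing a kernel release string against the per-branch backport list. Below is a minimal sketch, assuming (as the comment itself questions) that the leading MAJOR.MINOR.PATCH of an amzn2 release string lines up with upstream stable numbering. Vendor kernels can carry backported fixes without a version bump, so a `False` here does not prove a node is vulnerable. The table and function names are illustrative, not part of any AWS tooling.

```python
import re
from typing import Optional, Tuple

# Hypothetical lookup table: first stable release per branch that carries the
# Dirty Pipe fix, taken from the backport list quoted in the comment above.
DIRTY_PIPE_FIXED = {
    (5, 16): 11,
    (5, 15): 25,
    (5, 10): 102,
    (5, 4): 181,
    (4, 19): 231,
    (4, 14): 268,
    (4, 9): 303,
}


def upstream_version(release: str) -> Tuple[int, int, int]:
    """Extract the leading MAJOR.MINOR.PATCH from a `uname -r`-style string.

    Assumption: the amzn2 suffix (e.g. "-91.338.amzn2") is ignored and the
    prefix is treated as if it were the upstream stable version.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = (int(p) for p in m.groups())
    return major, minor, patch


def has_dirty_pipe_backport(release: str) -> Optional[bool]:
    """Return True/False for branches in the table, None if the branch is unlisted."""
    major, minor, patch = upstream_version(release)
    fixed = DIRTY_PIPE_FIXED.get((major, minor))
    if fixed is None:
        return None  # branch not covered by the quoted backport list
    return patch >= fixed
```

For example, `has_dirty_pipe_backport("5.4.176-91.338.amzn2")` returns `False` because 176 < 181, which matches the observation that moving to 5.4.181 would be the shortest path.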

@cartermckinnon since EKS 1.22 has been released and this didn't land, is there any idea when this kernel upgrade will land?

We have specific workloads that require io_uring features that aren't supported in 5.4, and the workarounds we currently use to support them are not ideal.

matdehaast · Apr 12 '22

I'm available if you need me to make any modifications on my side.

noony · Apr 13 '22

We're in a bind here, because:

  1. At present, 5.10 isn't/wasn't stable enough to use as the default for 1.22, and we doubt that will change in time for 1.23.
  2. We can't add a bootstrap flag for customers to opt-in to 5.10, because choosing a kernel at runtime isn't acceptable (it requires a reboot).
  3. We can't ship an AMI variant for things like this, because it blows out our matrix and would change our support contract by requiring a deprecation path.

My best guess is that 5.10 will be the default kernel for the 1.24 release. In the meantime, if your workload necessitates 5.10, you should use a custom AMI.

cartermckinnon · Apr 13 '22
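If you do move some node groups to a custom 5.10 AMI in the meantime, it can help to confirm which nodes actually picked it up. Below is a minimal sketch, assuming you feed it the parsed JSON from `kubectl get nodes -o json`, where each node reports its kernel under `status.nodeInfo.kernelVersion`. The helper names and the `MIN_KERNEL` threshold are illustrative and not part of any EKS tooling.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical threshold: the kernel line discussed in this thread.
MIN_KERNEL: Tuple[int, int] = (5, 10)


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Parse the leading MAJOR.MINOR from a kernel version string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {release!r}")
    return int(m.group(1)), int(m.group(2))


def nodes_below(node_list: Dict, minimum: Tuple[int, int] = MIN_KERNEL) -> List[str]:
    """Return names of nodes whose reported kernel is older than `minimum`.

    `node_list` is assumed to be the decoded output of `kubectl get nodes -o json`.
    """
    too_old = []
    for item in node_list.get("items", []):
        name = item["metadata"]["name"]
        kernel = item["status"]["nodeInfo"]["kernelVersion"]
        if kernel_major_minor(kernel) < minimum:  # tuple comparison: (5, 4) < (5, 10)
            too_old.append(name)
    return too_old
```

In practice you could obtain `node_list` with `json.loads(subprocess.run(["kubectl", "get", "nodes", "-o", "json"], capture_output=True, check=True).stdout)`. The function is kept pure so that it can be checked without a cluster.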

For others who land here, my workaround for this issue was to create an EKS node group based on Bottlerocket OS. The officially supported image for EKS versions 1.20 and above uses kernel 5.10. Its host-system architecture is completely different, so it might not be suitable for use cases that require deep modifications of the host environment. But if you are just looking to use features of the 5.10 kernel in your workloads, like WireGuard, then Bottlerocket OS 1.5.3 (aws-k8s-1.20) can help.

domderen · Apr 20 '22

Just a concern: the upgrade to 1.22 will be painful due to several API deprecations. It's something our team is planning as a mid-term objective.

However, since the kernel in the current AMI has a high-impact CVE, we need to solve this in the short term. It would therefore be very beneficial to us (and, I assume, to many users) if this were backported to the still-supported versions before 1.22, given that upgrading past 1.22 is not an effortless procedure.

Shipping it only in 1.23 would mean we could resolve this high-impact security issue only after migrating all of our clusters' workloads to account for the 1.22 deprecations and then performing the actual upgrade.

LCaparelli · May 12 '22

@LCaparelli we're not aware of any active kernel CVEs in our current AMIs; it's possible the one you're referring to has already been addressed. If not, please follow the instructions here.

cartermckinnon · Jun 06 '22

@cartermckinnon could we get a status update on this for EKS v1.24 (November 2022 I think)?

stevehipwell · Sep 13 '22

@cartermckinnon @bwagner5 could we get a status update on this? The first EKS v1.24 AMI has been released and I don't see any mention of the v5.10 kernel. This is going to be a major issue if we can't use tools that require a modern kernel (e.g. eBPF).

stevehipwell · Nov 10 '22

You're correct, apologies for not updating the status here. The timelines for 1.24 GA and the 5.10 migration haven't aligned as planned, but we're working on it, and I expect the 1.24 AMIs to use 5.10 before the end of the year.

cartermckinnon · Nov 11 '22

@cartermckinnon does AL 2022 come into this discussion at all? Based on the docs it looks to be using kernel v5.15 by default. Is there a plan to move this to AL 2022 or to add an additional image based on it?

stevehipwell · Nov 11 '22

We do plan to use an AL2022 base when possible; my current forecast is 1.25 will ship atop AL2022, assuming the GA dates are reasonably aligned.

cartermckinnon · Nov 11 '22

1.24 will move to 5.10 in #1118. I'm going to close this PR in favor of that one.

cartermckinnon · Dec 07 '22