Ubuntu 6.8.0-56 kernel breaks calico's ability to restore ip6tables rules.

Open zoc opened this issue 8 months ago • 11 comments

Summary

After upgrading the Ubuntu kernel from 6.8.0-55 to 6.8.0-56, all exposed services on a dual-stack cluster are still reachable over IPv4 but no longer over IPv6.

What Should Happen Instead?

Services should be reachable using IPv4 and IPv6.

Reproduction Steps

  1. Update the Ubuntu 24.04 kernel to the latest version (6.8.0-56).
  2. Reboot.

Introspection Report

N/A

Can you suggest a fix?

Are you interested in contributing with a fix?

zoc avatar Mar 24 '25 20:03 zoc

Additional comment: Rebooting into the previous kernel fixes the issue. Unfortunately I did not keep the logs (and I don't want to go back to the faulty kernel because this is my production cluster), but the issue is that the mark ID passed to ip6tables-legacy's --set-mark option is rejected as invalid.

By the way, I'm using MetalLB as the load balancer (not sure it makes a difference).
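
The --set-mark failure described above can probably be checked without going through Calico at all. The following is only a sketch: the function name probe_mark_target and the scratch chain name are made up for illustration, and it assumes that appending a MARK rule to a throwaway chain in the mangle table hits the same error path. It needs root and has not been verified against the faulty kernel.

```shell
# Hypothetical helper: try to add a MARK rule to a scratch chain and report
# whether the kernel accepts it. The first argument is the iptables binary
# to test (default: ip6tables); pass ip6tables-legacy or iptables to compare.
# Exit status: 0 = rule accepted, 1 = rule rejected, 3 = chain setup failed.
probe_mark_target() (
    ipt=${1:-ip6tables}
    chain=MARK_PROBE_$$          # scratch chain, removed again below

    "$ipt" -t mangle -N "$chain" || exit 3

    if "$ipt" -t mangle -A "$chain" -j MARK --set-mark 0x1; then
        status=0
    else
        status=1
    fi

    # Clean up regardless of the outcome.
    "$ipt" -t mangle -F "$chain"
    "$ipt" -t mangle -X "$chain"
    exit "$status"
)

# Example (run as root):
#   probe_mark_target ip6tables && echo "IPv6 MARK ok" || echo "IPv6 MARK broken"
#   probe_mark_target iptables  && echo "IPv4 MARK ok" || echo "IPv4 MARK broken"
```

If the IPv4 probe succeeds while the IPv6 probe fails, that would match the symptom reported here.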

zoc avatar Mar 24 '25 20:03 zoc

Are you sure it's a Calico issue? I got the same issue with kube-proxy crashlooping with:

"Failed to execute iptables-restore" err=< exit status 2: Warning: Extension MARK revision 0 not supported, missing kernel module?
ip6tables-restore v1.8.9 (nf_tables): unknown option "--xor-mark"

Downgrading to 6.8.0-55 fixed it for me too.

I'm running RKE2 with Canal.

Larswa avatar Mar 25 '25 19:03 Larswa

Experienced here too. Routine kernel upgrade horked microk8s, with calico-node pod erroring with:

2025-03-26 01:41:32.859 [WARNING][39164] felix/table.go 1440: Failed to execute ip(6)tables-restore command error=exit status 2 errorOutput="Ignoring deprecated --wait-interval option.\nWarning: Extension MARK revision 0 not supported, missing kernel module?\nip6tables-restore v1.8.8 (legacy): MARK: bad value for option "--set-mark", or out of range (0-4294967295).\n\nError occurred at line: 41\nTry `ip6tables-restore -h' or 'ip6tables-restore --help' for more information.\n" input="[...truncated...]

Kernel did have the xt_mark module loaded.
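
For anyone wanting to repeat that check, one way is to look the module up in /proc/modules (the same data lsmod prints). This is a small sketch; the function name module_loaded is just illustrative, and the optional second argument exists only so the check can be run against a saved listing.

```shell
# Hypothetical helper: succeed if the named module appears in a
# /proc/modules-style listing (module name is the first field of each line).
# Note: modules compiled into the kernel do not show up in /proc/modules.
module_loaded() (
    name=$1
    listing=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$listing"
)

# Example:
#   module_loaded xt_mark && echo "xt_mark is loaded" || echo "xt_mark is not loaded"
```

As the comment above shows, the module being loaded does not rule out this bug, so a positive result here only excludes the "missing kernel module" explanation suggested by the warning.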

Related discussion at https://github.com/tailscale/tailscale/issues/13863

Reverting to 6.8.0-55 fixed it.

jtackaberry avatar Mar 26 '25 01:03 jtackaberry

"Are you sure it's a Calico issue?"

Well, at least in the context of microk8s, yes. But it probably also applies to a lot of other software that relies on ip6tables marks, like Tailscale.

zoc avatar Mar 26 '25 11:03 zoc

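Commands to downgrade on Ubuntu 24.04 ("yes" and "no" are answers to the interactive prompts, not commands):

sudo apt remove linux-image-6.8.0-56-generic
# answer "yes" at the first prompt (presumably apt's confirmation to continue)
# answer "no" at the second prompt (presumably the dialog asking whether to abort removal of the running kernel)
sudo update-grub
sudo reboot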
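
Before removing the 6.8.0-56 image it may be worth confirming that an older kernel image is still installed to boot into. A minimal sketch, assuming the previous image is packaged as linux-image-6.8.0-55-generic; the helper names are made up for illustration.

```shell
# Hypothetical helpers. Both read "package status" lines on stdin, in the
# format produced by:
#   dpkg-query -W -f='${Package} ${Status}\n' 'linux-image-*'

# Print the names of fully installed kernel image packages.
installed_kernel_images() {
    awk '$2 == "install" && $3 == "ok" && $4 == "installed" { print $1 }'
}

# Succeed only if the given package name is among the installed images.
has_kernel_image() (
    wanted=$1
    installed_kernel_images | grep -Fqx -- "$wanted"
)

# Example:
#   dpkg-query -W -f='${Package} ${Status}\n' 'linux-image-*' \
#     | has_kernel_image linux-image-6.8.0-55-generic \
#     && echo "fallback kernel present" \
#     || echo "no 6.8.0-55 image installed, do not remove 6.8.0-56 yet"
```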

Depicus avatar Mar 26 '25 13:03 Depicus

I filed a bug report for the component at Canonical. It's their package, linux-image-6.8.0-56-generic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2104282

Larswa avatar Mar 26 '25 16:03 Larswa

I also experienced other problems like MASQUERADE rules not working. There is something seriously wrong with that kernel…

steffann avatar Mar 27 '25 15:03 steffann

I tried linux-image-6.8.0-57-generic; the problem is not resolved in this version.

Mmx233 avatar Apr 01 '25 07:04 Mmx233

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2104282/comments/3

According to this the fix is in 6.8.0-60.
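
Taking the reports in this thread at face value (6.8.0-55 works, 6.8.0-56 and 6.8.0-57 are broken, the fix is reported for 6.8.0-60), a rough check of the running kernel could look like the sketch below. The function name classify_kernel is made up. Treating everything up to 55 as "ok" and everything from 60 onward as "fixed" is an assumption, and 6.8.0-58 and 6.8.0-59 are not covered by the thread, so they are reported as "unknown".

```shell
# Hypothetical helper: classify an Ubuntu kernel release string such as
# "6.8.0-56-generic" based only on what this thread reports.
# Prints one of: ok, affected, fixed, unknown.
classify_kernel() (
    release=$1
    case $release in
        6.8.0-*)
            abi=${release#6.8.0-}   # e.g. "56-generic"
            abi=${abi%%-*}          # e.g. "56"
            ;;
        *)
            abi=                    # not the 6.8.0 series discussed here
            ;;
    esac

    case $abi in
        ''|*[!0-9]*)
            echo unknown
            ;;
        *)
            if [ "$abi" -le 55 ]; then
                echo ok             # assumption: regression started in -56
            elif [ "$abi" -le 57 ]; then
                echo affected       # -56 and -57 reported broken
            elif [ "$abi" -ge 60 ]; then
                echo fixed          # assumption: fix stays in later builds
            else
                echo unknown        # -58 and -59: no reports in this thread
            fi
            ;;
    esac
)

# Example:
#   classify_kernel "$(uname -r)"
```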

oznu avatar Apr 01 '25 07:04 oznu

Could this issue also be causing problems with the exec/logs functionality on port 10250? I am running into:

Error from server: Get "https://X.X.X.X:10250/containerLogs/gitea/gitea-runner-patch-statefulset-replicas-2mq9g/patch-statefulset": dial tcp X.X.X.X:10250: i/o timeout

It seems to work fine when the node is first brought up (I have reloaded a few times) and then stops working after a reboot.

I have checked all of the usual things, such as connectivity.

aaronfs07 avatar Apr 05 '25 18:04 aaronfs07

Probably not. The breaking change in the kernel only affects IPv6, not IPv4.

zoc avatar Apr 05 '25 19:04 zoc