
Flatcar instance at Linode loses network connectivity after 3227.2.0 upgrade

Open salfter opened this issue 2 years ago • 6 comments

Description

I have two Flatcar instances running, both at 3227.2.0. One is a bare-metal instance on a home server (an Asrock Rack X470D4U2-2T with a Ryzen 5 2600) that continues to run properly, but the other is a Linode VM (messages at boot time indicate they're using kvm). I tried accessing a service that should be running on it, but got nowhere. I tried to ssh in...no dice. I brought up the web console interface and got in that way, but saw the following:

Flatcar Container Linux by Kinvolk stable 3227.2.0 for QEMU
Failed Units: 2
  systemd-networkd.service
  systemd-networkd.socket

Rebooting made no difference. I tried restarting the indicated service, but that just produced an error.

The Linode console won't let me copy text out of it, so this is a screen dump of systemctl status systemd-networkd.service:

[Screenshot: magnetico-error-1]

and of journalctl -xeu systemd-networkd.service:

[Screenshot: magnetico-error-2]

Impact

I have a VM that is unreachable from across the network. How do I fix this?

Environment and steps to reproduce

No particular steps were taken on my part; this VM has been running without issue for the past few months, until the recent update to 3227.2.0.

Expected behavior

At a minimum, I'd at least be able to ssh in. Ideally, other services would be responsive. (I'd sometimes have to restart them manually after Flatcar updated itself, but not always.)

salfter, Jul 23 '22 22:07

I once had something similar (not on Linode) where DHCP failed to hand out a new lease.

Can you dig up more from the journal to see if you find something there?

Basically, search the output of sudo journalctl around the time the units failed. Maybe check the boot log, since it seems to happen then?

till, Jul 24 '22 20:07

Could anyone here attach the full journalctl -b0 output?

jepio, Jul 25 '22 08:07

Would you be able to save some debug logs to disk, and then extract them after performing a manual rollback? https://www.flatcar.org/docs/latest/setup/debug/manual-rollbacks/#performing-a-manual-rollback

The first things that would help:

dmesg
journalctl -b0
networkctl status
ip link
ip addr
ls -la /usr/lib/systemd/systemd-networkd
sestatus
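
One way to save all of these to disk in a single pass is sketched below. This is only a sketch: the function name collect_debug and the default directory /var/tmp/flatcar-debug are made up for illustration, so pick any location that will still be readable after the rollback.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flatcar): write each diagnostic command's
# output to its own file so it can be copied off the machine later.
collect_debug() {
    # Default directory is an assumption; pass another path as $1 if needed.
    out_dir="${1:-/var/tmp/flatcar-debug}"
    mkdir -p "$out_dir" || return 1

    # One command per line, taken from the list above.
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        # Build a file name by replacing every non-alphanumeric character.
        name=$(printf '%s' "$cmd" | tr -c 'A-Za-z0-9' '_')
        # Capture stdout and stderr; keep going if a command fails or is missing.
        # stdin is /dev/null so a command cannot swallow the rest of the list.
        sh -c "$cmd" > "$out_dir/$name.txt" 2>&1 < /dev/null \
            || echo "failed: $cmd" >&2
    done <<'EOF'
dmesg
journalctl -b0
networkctl status
ip link
ip addr
ls -la /usr/lib/systemd/systemd-networkd
sestatus
EOF
}
```

After defining the function, run collect_debug with no argument, or collect_debug /some/other/dir. Run it as root (for example from a root shell) if dmesg or journalctl refuse to show everything to an unprivileged user. A command that fails or is missing still leaves a file behind, containing its error message.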

jepio, Jul 25 '22 08:07

And could you verify whether the issue also happens when a fresh VM is provisioned with 3227.2.0? Or does it require updating?

jepio, Jul 25 '22 08:07

> And could you verify whether the issue also happens when a fresh VM is provisioned with 3227.2.0? Or does it require updating.

Not 100% sure I have the same issue, but I had problems with systemd-resolved not respecting DHCP settings, and that was definitely on a new VM (no upgrades involved at all).

See https://kubernetes.slack.com/archives/C03GQ8B5XNJ/p1658762999205499

whites11, Jul 26 '22 07:07

Hi @salfter, any chance you could provide some more information? We would really love to track this down and fix it before the next bugfix release.

jepio, Aug 01 '22 07:08

This issue got away from me for a bit, but whatever was broken was fixed when I grabbed the most recent image this evening and ran through some of the steps in my Flatcar-on-Linode guide to get it up and running again:

https://alfter.us/2021/09/20/installing-flatcar-container-linux-on-linode/

Hopefully it was a one-time glitch, as this node had been through several updates previously without any problems. Hopefully it will be back to that behavior now.

salfter, Sep 01 '22 01:09