Flatcar
GROUP in update.conf is overwritten after upgrade and reboot
Description
A custom GROUP configured in /etc/flatcar/update.conf is reset to stable (the channel of the group) after an upgrade.
Impact
After VMs are updated, the group in /etc/flatcar/update.conf has to be reconfigured.
Environment and steps to reproduce
- Set-up: VMs deployed on OpenStack using cloud-init; update.conf manually configured afterwards; Nebraska instance deployed via Helm chart
- Task: Updating node
- Action(s): Create group on Nebraska:
"name":"My-dev.stable",
"track":"My-dev.stable",
"description":"My dev stable Cluster vms",
"policy_updates_enabled":true,
"policy_safe_mode":true,
"policy_max_updates_per_period":999999,
"policy_period_interval":"1 hours",
"policy_update_timeout":"60 minutes",
"channel_id":"e06064ad-4414-4904-9a6e-fd465593d1b2",
"policy_timezone":"Europe/Berlin",
"application_id":"e96281a6-d1af-4bde-9a0a-97b76e56dc57"
Log in to the node and update /etc/flatcar/update.conf:
GROUP=My-dev.stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-1.local
Run:
- systemctl restart update-engine
- /usr/bin/update_engine_client -reset_status
- /usr/bin/update_engine_client -check_for_update
When /usr/bin/update_engine_client -status says we are ready for reboot, reboot the node.
- Error: After logging back in to the node, /etc/flatcar/update.conf has GROUP changed to stable and REBOOT_STRATEGY added, as shown below:
GROUP=stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-1.local
REBOOT_STRATEGY=off
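The "wait until the status says we are ready for reboot" step can be scripted. The following is a minimal sketch, not an official tool: `needs_reboot` and `wait_for_need_reboot` are hypothetical helper names, the 10-second poll interval is arbitrary, and it assumes `update_engine_client -status` prints `CURRENT_OP=...` lines on standard output in the same format as the update output in this report.

```shell
#!/bin/sh
# needs_reboot (hypothetical helper): reads update_engine_client status text
# on stdin and succeeds only if it reports that a reboot is pending.
needs_reboot() {
    grep -q '^CURRENT_OP=UPDATE_STATUS_UPDATED_NEED_REBOOT$'
}

# wait_for_need_reboot [SECONDS] (hypothetical helper): poll the status until
# a reboot is pending. Assumes CURRENT_OP=... is printed on stdout.
wait_for_need_reboot() {
    until update_engine_client -status 2>/dev/null | needs_reboot; do
        sleep "${1:-10}"
    done
}
```

On the node, `wait_for_need_reboot && sudo reboot` would then replace the manual status check.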
Expected behavior
After logging back in to the node, /etc/flatcar/update.conf should not have been modified:
GROUP=My-dev.stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-1.local
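To make the expected-versus-actual comparison repeatable across nodes, the GROUP value can be checked with a small script after each reboot. This is only a sketch: `conf_get` and `check_group` are hypothetical helper names, and it assumes update.conf contains plain KEY=value lines as shown in this report.

```shell
#!/bin/sh
# conf_get KEY FILE (hypothetical helper): print the value of the last
# KEY=value line in FILE.
conf_get() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

# check_group EXPECTED [FILE] (hypothetical helper): succeed only if GROUP in
# FILE (default: /etc/flatcar/update.conf) still equals EXPECTED.
check_group() {
    expected=$1
    file=${2:-/etc/flatcar/update.conf}
    actual=$(conf_get GROUP "$file")
    if [ "$actual" = "$expected" ]; then
        echo "GROUP unchanged: $actual"
    else
        echo "GROUP changed: expected $expected, found $actual" >&2
        return 1
    fi
}
```

For the node in this report, `check_group My-dev.stable` would be run once before the update and once after the reboot.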
Additional information
I followed https://kinvolk.io/docs/nebraska/latest/managing-updates/#existing-machines for setting up the update.conf file.
Actually, I could always reproduce it by resetting the release, as shown below:
flatcar-test-2 ~ # cat /etc/flatcar/update.conf
GROUP=My-dev.stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-2.local
flatcar-test-2 ~ # systemctl restart update-engine
flatcar-test-2 ~ # update_engine_client -reset_status
I0610 11:16:50.386335 17409 update_engine_client.cc:223] Setting Update Engine status to idle ...
I0610 11:16:50.388559 17409 update_engine_client.cc:229] ResetStatus succeeded; to undo partition table changes run:
(D=$(rootdev -d) P=$(rootdev -s); cgpt p -i$(($(echo ${P#$D} | sed 's/^[^0-9]*//')-1)) $D;)
flatcar-test-2 ~ # update_engine_client -update
I0610 11:16:53.010391 17454 update_engine_client.cc:247] Initiating update check and install.
I0610 11:16:53.015081 17454 update_engine_client.cc:252] Waiting for update to complete.
LAST_CHECKED_TIME=1623323813
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_UPDATE_AVAILABLE
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.030048
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.090120
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.160172
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.220256
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.320365
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.440405
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.510476
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.640641
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.810836
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.910956
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_FINALIZING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_UPDATED_NEED_REBOOT
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
I0610 11:17:58.329700 17454 update_engine_client.cc:194] Update succeeded -- reboot needed.
flatcar-test-2 ~ # reboot
Connection to flatcar-test-2.local closed by remote host.
Connection to flatcar-test-2.local closed.
ssh to flatcar-test-2.local again
[...]
Password authentication is disabled to avoid man-in-the-middle attacks.
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Last login: Thu Jun 10 11:10:06 UTC 2021 from 10.123.44.53 on pts/0
Flatcar Container Linux by Kinvolk stable (2765.2.5)
Update Strategy: No Reboots
core@flatcar-test-2 ~ $ cat /etc/flatcar/update.conf
GROUP=stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-2.local
REBOOT_STRATEGY=off
Hi, can you share your cloud-config userdata?
I understood that REBOOT_STRATEGY is still part of the cloud-config userdata. In that case it is expected to be written to the file again, because the cloud-config data gets processed on every boot.
Hi,
I have nothing in user_data related to this (my goal is actually to provision the update config afterwards), and I doubt it comes from there, as this only happens after an update. If I just reboot the VM, there is no issue; the update.conf file is unchanged:
core@flatcar-test-2 ~ $ cat /etc/flatcar/update.conf
GROUP=My-dev.stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-2.local
core@flatcar-test-2 ~ $ cat /usr/share/flatcar/release
FLATCAR_RELEASE_VERSION=2765.2.3
FLATCAR_RELEASE_BOARD=amd64-usr
FLATCAR_RELEASE_APPID={e96281a6-d1af-4bde-9a0a-97b76e56dc57}
core@flatcar-test-2 ~ $ sudo reboot
Connection to flatcar-test-2.local closed by remote host.
Connection to flatcar-test-2.local closed.
✘ ~ ssh flatcar-test-2.local
Last login: Thu Jun 17 06:43:03 UTC 2021 from 10.123.44.53 on pts/0
Flatcar Container Linux by Kinvolk flatcar.lttwdev (2765.2.3)
core@flatcar-test-2 ~ $ cat /etc/flatcar/update.conf
GROUP=My-dev.stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-2.local
core@flatcar-test-2 ~ $ cat /usr/share/flatcar/release
FLATCAR_RELEASE_VERSION=2765.2.3
FLATCAR_RELEASE_BOARD=amd64-usr
FLATCAR_RELEASE_APPID={e96281a6-d1af-4bde-9a0a-97b76e56dc57}
core@flatcar-test-2 ~ $ logout
user-data:
#cloud-config
write_files:
  - path: /etc/systemd/network/80-app.network
    owner: "root:root"
    permissions: "0644"
    content: |
      [Network]
      DHCP=yes
      [DHCP]
      UseMTU=true
      UseDomains=false
      [Match]
      Name=eth0
  - path: /etc/systemd/network/90-storage.network
    owner: "root:root"
    permissions: "0644"
    content: |
      [Network]
      DHCP=yes
      [DHCP]
      UseMTU=true
      UseDomains=false
      UseRoutes=false
      [Match]
      Name=eth*
  - path: /etc/systemd/network/zz-default.network
    owner: "root:root"
    permissions: "0644"
    content: |
      [Network]
      DHCP=yes
      [DHCP]
      UseMTU=true
      UseDomains=false
      [Match]
      Name=*
Thanks for taking a look
Hi, is this still happening with a recent release?
I suggest modifying oem-cloudinit.service,
changing ExecStart=/usr/bin/coreos-cloudinit
to ExecStart=/usr/bin/strace -f /usr/bin/coreos-cloudinit,
and sharing the unit log.
I still think there is some logical bug around
https://github.com/flatcar-linux/coreos-cloudinit/blob/cfcc44197d11f44441e5aa2c9db34bcd0bf16015/system/update.go#L58
but if coreos-cloudinit is not writing the file we can search elsewhere.
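One way to apply the strace wrapper without editing the unit file in place is a systemd drop-in. The following is a rough sketch under stated assumptions: `write_strace_dropin` and the file name `10-strace.conf` are hypothetical, and it assumes the original command is exactly /usr/bin/coreos-cloudinit as quoted above. If the real unit passes arguments, they would need to be appended (check with `systemctl cat oem-cloudinit.service`).

```shell
#!/bin/sh
# write_strace_dropin [DIR] (hypothetical helper): write a systemd drop-in
# that runs coreos-cloudinit under strace. DIR defaults to the standard
# drop-in directory for oem-cloudinit.service.
write_strace_dropin() {
    dir=${1:-/etc/systemd/system/oem-cloudinit.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/10-strace.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the command inherited from the original unit.
ExecStart=
# Assumption: the original command takes no arguments; append any that
# `systemctl cat oem-cloudinit.service` shows.
ExecStart=/usr/bin/strace -f /usr/bin/coreos-cloudinit
EOF
}
```

After running `write_strace_dropin` as root, `systemctl daemon-reload` picks up the drop-in, and once the update and reboot have been repeated, `journalctl -u oem-cloudinit.service` should contain the trace to share.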