Flatcar icon indicating copy to clipboard operation
Flatcar copied to clipboard

unable to update flatcar

Open till opened this issue 1 year ago • 26 comments

Description

Trying to update to the latest release fails.

Impact

Cannot update to the latest stable.

Environment and steps to reproduce

Current version:

NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=2905.2.3
VERSION_ID=2905.2.3
BUILD_ID=2021-08-31-1644
PRETTY_NAME="Flatcar Container Linux by Kinvolk 2905.2.3 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar-linux.org/"
BUG_REPORT_URL="https://issues.flatcar-linux.org"
FLATCAR_BOARD="amd64-usr"

Ran update_engine_client -update (after -check_for_update failed):

Journal:

-- Journal begins at Wed 2024-03-27 11:46:49 UTC, ends at Thu 2024-04-11 16:50:31 UTC. --
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275560  1015 prefs.cc:51] certificate-report-to-send-update not present in /var/lib/update_engine/prefs
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275609  1015 prefs.cc:51] certificate-report-to-send-download not present in /var/lib/update_engine/prefs
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275614  1015 update_attempter.cc:131] Not updating b/c we already updated and we're waiting for reboot, we'll ping Oma>
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275619  1015 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275636  1015 omaha_request_action.cc:268] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/updat>
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275640  1015 omaha_request_action.cc:269] Request: <?xml version="1.0" encoding="UTF-8"?>
Mar 27 11:50:18 node-006.cluster update_engine[1015]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="scheduler" ismachine="1">
Mar 27 11:50:18 node-006.cluster update_engine[1015]:     <os version="Chateau" platform="CoreOS" sp="2765.2.2_x86_64"></os>
Mar 27 11:50:18 node-006.cluster update_engine[1015]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="2765.2.2" track="stable" bootid="{e41a785f-b9e3-4607-92ed-94afaf485d>
Mar 27 11:50:18 node-006.cluster update_engine[1015]:         <ping active="1"></ping>
Mar 27 11:50:18 node-006.cluster update_engine[1015]:     </app>
Mar 27 11:50:18 node-006.cluster update_engine[1015]: </request>
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275645  1015 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275781  1015 libcurl_http_fetcher.cc:167] Setting up curl options for HTTPS
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275931  1015 libcurl_http_fetcher.cc:428] io_counter = 89950
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275943  1015 libcurl_http_fetcher.cc:435] Setting up timeout source: 1 seconds.
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.275954  1015 update_check_scheduler.cc:74] Next update check in 41m12s
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.417197  1015 libcurl_http_fetcher.cc:248] HTTP response code: 200
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.421885  1015 libcurl_http_fetcher.cc:305] Transfer completed (200), 229 bytes downloaded
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.421998  1015 omaha_request_action.cc:619] Omaha request response: <?xml version="1.0" encoding="UTF-8"?>
Mar 27 11:50:18 node-006.cluster update_engine[1015]: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57>
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.422195  1015 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Mar 27 11:50:18 node-006.cluster update_engine[1015]: I0327 11:50:18.422212  1015 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.280997  1015 prefs.cc:51] certificate-report-to-send-update not present in /var/lib/update_engine/prefs
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281064  1015 prefs.cc:51] certificate-report-to-send-download not present in /var/lib/update_engine/prefs
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281070  1015 update_attempter.cc:131] Not updating b/c we already updated and we're waiting for reboot, we'll ping Oma>
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281077  1015 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281103  1015 omaha_request_action.cc:268] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/updat>
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281108  1015 omaha_request_action.cc:269] Request: <?xml version="1.0" encoding="UTF-8"?>
Mar 27 12:31:30 node-006.cluster update_engine[1015]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="scheduler" ismachine="1">
Mar 27 12:31:30 node-006.cluster update_engine[1015]:     <os version="Chateau" platform="CoreOS" sp="2765.2.2_x86_64"></os>
Mar 27 12:31:30 node-006.cluster update_engine[1015]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="2765.2.2" track="stable" bootid="{e41a785f-b9e3-4607-92ed-94afaf485d>
Mar 27 12:31:30 node-006.cluster update_engine[1015]:         <ping active="1"></ping>
Mar 27 12:31:30 node-006.cluster update_engine[1015]:     </app>
Mar 27 12:31:30 node-006.cluster update_engine[1015]: </request>
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281116  1015 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281270  1015 libcurl_http_fetcher.cc:167] Setting up curl options for HTTPS
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281430  1015 libcurl_http_fetcher.cc:435] Setting up timeout source: 1 seconds.
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.281443  1015 update_check_scheduler.cc:74] Next update check in 48m57s
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.423014  1015 libcurl_http_fetcher.cc:248] HTTP response code: 200
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.425431  1015 libcurl_http_fetcher.cc:305] Transfer completed (200), 229 bytes downloaded
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.425487  1015 omaha_request_action.cc:619] Omaha request response: <?xml version="1.0" encoding="UTF-8"?>
Mar 27 12:31:30 node-006.cluster update_engine[1015]: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57>
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.425529  1015 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Mar 27 12:31:30 node-006.cluster update_engine[1015]: I0327 12:31:30.425531  1015 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Mar 27 13:20:27 node-006.cluster update_engine[1015]: I0327 13:20:27.262565  1015 prefs.cc:51] certificate-report-to-send-update not present in /var/lib/update_engine/prefs

Update engine log:

core@node-006 ~ $ update_engine_client -update
I0411 16:53:37.114137  3438 update_engine_client.cc:247] Initiating update check and install.
I0411 16:53:37.116988  3438 update_engine_client.cc:252] Waiting for update to complete.
LAST_CHECKED_TIME=1712854417
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_UPDATE_AVAILABLE
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.160216
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.210273
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.340420
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.480578
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.530634
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.570679
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.710837
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.760894
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.881029
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.961120
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_FINALIZING
NEW_VERSION=3815.2.1
NEW_SIZE=458234665
LAST_CHECKED_TIME=1712854417
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_IDLE
NEW_VERSION=0.0.0
NEW_SIZE=0
E0411 16:54:41.938392  3438 update_engine_client.cc:190] Update failed.

Expected behavior

I can reboot into the new version.

Additional information

Please add any information here that does not fit the above format.

till avatar Apr 11 '24 16:04 till

Moarrr logs:

Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.716156   925 omaha_request_action.cc:268] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.716158   925 omaha_request_action.cc:269] Request: <?xml version="1.0" encoding="UTF-8"?>
Apr 11 16:56:48 node-006.cluster update_engine[925]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="scheduler" ismachine="1">
Apr 11 16:56:48 node-006.cluster update_engine[925]:     <os version="Chateau" platform="CoreOS" sp="2905.2.3_x86_64"></os>
Apr 11 16:56:48 node-006.cluster update_engine[925]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="2905.2.3" track="stable" bootid="{77feca82-1f05-46ac-abd3-3bdb2d8ce497}" oem="openstack" oemversion="0.1.2" alephversion="2765.2.2" machineid="68cf162f8fc8467ca7d8337464c863d1" machinealias="" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
Apr 11 16:56:48 node-006.cluster update_engine[925]:         <event eventtype="14" eventresult="1"></event>
Apr 11 16:56:48 node-006.cluster update_engine[925]:     </app>
Apr 11 16:56:48 node-006.cluster update_engine[925]: </request>
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.716162   925 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.716220   925 libcurl_http_fetcher.cc:167] Setting up curl options for HTTPS
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.716382   925 libcurl_http_fetcher.cc:435] Setting up timeout source: 1 seconds.
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.845341   925 libcurl_http_fetcher.cc:248] HTTP response code: 200
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.846369   925 libcurl_http_fetcher.cc:305] Transfer completed (200), 231 bytes downloaded
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.846383   925 omaha_request_action.cc:619] Omaha request response: <?xml version="1.0" encoding="UTF-8"?>
Apr 11 16:56:48 node-006.cluster update_engine[925]: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="ok"><event status="ok"></event></app></response>
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.846388   925 action_processor.cc:82] ActionProcessor::ActionComplete: finished OmahaRequestAction, starting FilesystemCopierAction
Apr 11 16:56:48 node-006.cluster update_engine[925]: I0411 16:56:48.846414   925 filesystem_copier_action.cc:294] Filesystem size: 1073741824
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.194658   925 filesystem_copier_action.cc:274] Hash: E7eDMvtvGQT+lVgMJgI8dIU85NuGeyAp9T9D3LNL/Rk=
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.194696   925 action_processor.cc:82] ActionProcessor::ActionComplete: finished FilesystemCopierAction, starting KernelVerifierAction
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.239066   925 kernel_verifier_action.cc:74] Kernel size: 54815968
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.239118   925 kernel_verifier_action.cc:75] Kernel hash: DtBksHl92+MBdqLz9qk5+Z75LsMdRt+ttzZIdIQTGjY=
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.239127   925 action_processor.cc:82] ActionProcessor::ActionComplete: finished KernelVerifierAction, starting PCRPolicyPostAction
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.239135   925 action_processor.cc:82] ActionProcessor::ActionComplete: finished PCRPolicyPostAction, starting PostinstallRunnerAction
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.296525   925 postinstall_runner_action.cc:39] Failed to mount install part as ext2. Trying ext3.
Apr 11 16:56:50 node-006.cluster update_engine[925]: E0411 16:56:50.298048   925 postinstall_runner_action.cc:47] Unable to mount destination device /dev/sda3 onto /tmp/au_postint_mount.ZkTx0a
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299031   925 action_processor.cc:68] ActionProcessor::ActionComplete: PostinstallRunnerAction action failed. Aborting processing.
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299048   925 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type PostinstallRunnerAction
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299053   925 update_attempter.cc:302] Processing Done.
Apr 11 16:56:50 node-006.cluster update_engine[925]: E0411 16:56:50.299101   925 update_attempter.cc:615] Update failed.
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299106   925 payload_state.cc:97] Updating payload state for error code: 5 (kActionCodePostinstallRunnerError)
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299110   925 payload_state.cc:178] Not incrementing URL index or failure count for this error
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299192   925 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299211   925 omaha_request_action.cc:268] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299214   925 omaha_request_action.cc:269] Request: <?xml version="1.0" encoding="UTF-8"?>
Apr 11 16:56:50 node-006.cluster update_engine[925]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="scheduler" ismachine="1">
Apr 11 16:56:50 node-006.cluster update_engine[925]:     <os version="Chateau" platform="CoreOS" sp="2905.2.3_x86_64"></os>
Apr 11 16:56:50 node-006.cluster update_engine[925]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="2905.2.3" track="stable" bootid="{77feca82-1f05-46ac-abd3-3bdb2d8ce497}" oem="openstack" oemversion="0.1.2" alephversion="2765.2.2" machineid="68cf162f8fc8467ca7d8337464c863d1" machinealias="" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
Apr 11 16:56:50 node-006.cluster update_engine[925]:         <event eventtype="3" eventresult="0" errorcode="5"></event>
Apr 11 16:56:50 node-006.cluster update_engine[925]:     </app>
Apr 11 16:56:50 node-006.cluster update_engine[925]: </request>
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299217   925 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299273   925 libcurl_http_fetcher.cc:167] Setting up curl options for HTTPS
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.299377   925 libcurl_http_fetcher.cc:435] Setting up timeout source: 1 seconds.
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.401546   925 libcurl_http_fetcher.cc:248] HTTP response code: 200
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.402782   925 libcurl_http_fetcher.cc:305] Transfer completed (200), 231 bytes downloaded
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.402801   925 omaha_request_action.cc:619] Omaha request response: <?xml version="1.0" encoding="UTF-8"?>
Apr 11 16:56:50 node-006.cluster update_engine[925]: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="ok"><event status="ok"></event></app></response>
Apr 11 16:56:50 node-006.cluster update_engine[925]: E0411 16:56:50.402817   925 omaha_request_action.cc:626] HTTP reported success but Omaha reports an error.
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.402822   925 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.402825   925 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.402828   925 update_attempter.cc:302] Processing Done.
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.402834   925 update_attempter.cc:306] Error event sent.
Apr 11 16:56:50 node-006.cluster update_engine[925]: I0411 16:56:50.402843   925 update_check_scheduler.cc:74] Next update check in 46m28s

Can't make sense of the error though.

till avatar Apr 11 '24 17:04 till

Tried an airgabbed update to work around the weird nebraska issue, so I downloaded flatcar-update but that didn't work either:

node-006 core # wget "https://update.release.flatcar-linux.net/${ARCH}-usr/${VER}/flatcar_production_update.gz"
--2024-04-11 17:59:31--  https://update.release.flatcar-linux.net/amd64-usr/3815.2.1/flatcar_production_update.gz
Resolving update.release.flatcar-linux.net... 147.75.87.17
Connecting to update.release.flatcar-linux.net|147.75.87.17|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 458234665 (437M) [application/gzip]
Saving to: 'flatcar_production_update.gz'

flatcar_production_update.gz                     100%[========================================================================================================>] 437.01M   137MB/s    in 3.3s    

2024-04-11 17:59:34 (132 MB/s) - 'flatcar_production_update.gz' saved [458234665/458234665]

node-006 core # ./flatcar-update --to-version "$VER" --to-payload flatcar_production_update.gz
./flatcar-update: line 209: EXTENSIONS[@]: unbound variable

till avatar Apr 11 '24 18:04 till

The problem is that the update-engine client can't mount the btrfs partition. You can first update to LTS (GROUP=lts in /etc/flatcar/update.conf) or explicitly with flatcar-update to 3510.3.2. We have an idea how to do this automatically with Nebraska but didn't do it so far: https://github.com/flatcar/Flatcar/issues/1185

The bash error is a bug in bash 4.3, I prepared a workaround in https://github.com/flatcar/init/pull/119 which you can use to update to LTS explicitly instead of setting GROUP=lts. Note that once you run LTS it defaults to lts if you don't have a value set in /etc/flatcar/update.conf. So if you want Stable, set GROUP=stable before or after running flatcar-update.

pothos avatar Apr 12 '24 05:04 pothos

Worked! :)

till avatar Apr 12 '24 07:04 till

@pothos I am trying this on another machine that's stuck on an older release and I am hitting the bash error again:

core@node-001 ~ $ sudo ./flatcar-update -V 3510.3.2
Downloading update payloads...
./flatcar-update: line 211: EXTENSIONS[*]: unbound variable

till avatar Jun 03 '24 16:06 till

When I comment out this line, it's progressing but dies at:

core@node-001 ~ $ sudo ./flatcar-update --to-version 3510.3.2
Downloading update payloads...
Forcing update...
./flatcar-update: line 336: EXTENSIONS[*]: unbound variable
./flatcar-update: line 1: 295388 Terminated              ncat --keep-open -c "echo -en 'HTTP/1.1 200 OK\ncontent-type: text/xml\ncontent-length: $(stat --printf='%s\n' /tmp/response)\n\n'; cat /tmp/response" -l "$LISTEN_PORT_1"

Script is full of these problems.

till avatar Jun 03 '24 16:06 till

Update still worked, though had to remove a line from /etc/flatcar/update.conf (for the local server).

till avatar Jun 03 '24 16:06 till

The problem is that the update-engine client can't mount the btrfs partition. You can first update to LTS (GROUP=lts in /etc/flatcar/update.conf) or explicitly with flatcar-update to 3510.3.2. We have an idea how to do this automatically with Nebraska but didn't do it so far: #1185

The bash error is a bug in bash 4.3, I prepared a workaround in flatcar/init#119 which you can use to update to LTS explicitly instead of setting GROUP=lts. Note that once you run LTS it defaults to lts if you don't have a value set in /etc/flatcar/update.conf. So if you want Stable, set GROUP=stable before or after running flatcar-update.

Worked for me too, updating from an older version of flatcar (changed GROUP=stable to GROUP=lts, updated and rebooted, and then switched back to stable channel and updated again). Thanks

whysi avatar Jun 04 '24 15:06 whysi

@whysi what did you run? update_engine_client -update? For some reason, I had to use flatcar-update script.

till avatar Jun 04 '24 17:06 till

@till exactly, I ran update_engine_client -update . Before that I tried multiple time flatcar-update script in an "air-gapped" environment, with no success. In the update-engine logs in journalctl I had pretty much the same errors that you had (googling the exact error got me here).

Also I wanted to mention that running the "air-gapped" mode of the script tries to download the get the update-payload-key from the web...a bit of a nonsense. Anyway even copying the key manually from a non-air-gapped machine didn't solve my errors with the script that I tried before finding this issue and the resolution suggested by @pothos

whysi avatar Jun 06 '24 13:06 whysi

@dongsupark can you re-schedule this? The script still contains more of these bash errors.

till avatar Jun 12 '24 09:06 till

@till Are you still facing this issue?

sayanchowdhury avatar Jun 06 '25 13:06 sayanchowdhury

@sayanchowdhury I haven't tested, not sure what/where the bash errors were from. I think I was running this on a very old system, possibly incompatible version of bash?

till avatar Jun 06 '25 13:06 till

Currently facing this with:

NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3510.2.2
VERSION_ID=3510.2.2
BUILD_ID=2023-05-29-0939
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3510.2.2 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3510.2.2:*:*:*:*:*:*:*"

Had to sudo flatcar-update --to-version 4230.2.3

till avatar Oct 14 '25 15:10 till

Oct 14 15:28:26 node-001.luzilla update_engine[7130]: I20251014 15:28:26.474259  7130 libcurl_http_fetcher.cc:320] Transfer completed (200), 305 bytes downloaded
Oct 14 15:28:26 node-001.luzilla update_engine[7130]: I20251014 15:28:26.474316  7130 omaha_request_action.cc:617] Omaha request response: <?xml version="1.0" encoding="UTF-8"?>
Oct 14 15:28:26 node-001.luzilla update_engine[7130]: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="error-failedToRetrieveUpdatePackageInfo"><updatecheck status="error-internal"><urls></urls></updatecheck></app></response>
Oct 14 15:28:26 node-001.luzilla update_engine[7130]: E20251014 15:28:26.474337  7130 omaha_request_action.cc:629] HTTP reported success but Omaha reports an error.
Oct 14 15:28:26 node-001.luzilla update_engine[7130]: I20251014 15:28:26.474352  7130 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Oct 14 15:28:26 node-001.luzilla update_engine[7130]: I20251014 15:28:26.474365  7130 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Oct 14 15:28:26 node-001.luzilla update_engine[7130]: I20251014 15:28:26.474377  7130 update_attempter.cc:306] Processing Done.
Oct 14 15:28:26 node-001.luzilla update_engine[7130]: I20251014 15:28:26.474391  7130 update_attempter.cc:310] Error event sent.

Update engine is a bit odd, I must say. In regards to how it never surfaces errors, but says "no update available".

@t-lo @tormath1 — Can this get some attention?

till avatar Oct 14 '25 15:10 till

Hello @till! Do you know if your node was able to make multiple update attempts in this case before your intervention with sudo flatcar-update --to-version 4230.2.3? The original issue description shows a PostinstallRunnerAction action failure which is a local problem, however your latest log snippet indicates something else: the error is server-side or related to the request payload, not a local problem like filesystem mounting.

ervcz avatar Oct 15 '25 12:10 ervcz

@ervcz I've tried update_engine_client -update multiple times before I went with flatcar-update. We recently switched to a local Nebraska to better coordinate updates. Is the protocol incompatible?

I see these in the logs:

2025-10-14T15:18:55Z WRN Handle - malformed omaha request error XML syntax error on line 4: attribute name without = in element context=omaha
2025-10-14T15:18:55Z ERR process omaha request error="omaha: request is malformed: XML syntax error on line 4: attribute name without = in element" context=nebraska
2025-10-14T15:18:56Z WRN Handle - malformed omaha request error XML syntax error on line 4: attribute name without = in element context=omaha
2025-10-14T15:18:56Z ERR process omaha request error="omaha: request is malformed: XML syntax error on line 4: attribute name without = in element" context=nebraska
2025-10-14T15:19:42Z WRN Handle - malformed omaha request error XML syntax error on line 4: attribute name without = in element context=omaha
2025-10-14T15:19:42Z ERR process omaha request error="omaha: request is malformed: XML syntax error on line 4: attribute name without = in element" context=nebraska
2025-10-14T15:19:43Z WRN Handle - malformed omaha request error XML syntax error on line 4: attribute name without = in element context=omaha
2025-10-14T15:19:43Z ERR process omaha request error="omaha: request is malformed: XML syntax error on line 4: attribute name without = in element" context=nebraska
2025-10-14T15:21:08Z WRN Handle - malformed omaha request error XML syntax error on line 4: attribute name without = in element context=omaha
2025-10-14T15:21:08Z ERR process omaha request error="omaha: request is malformed: XML syntax error on line 4: attribute name without = in element" context=nebraska
2025-10-14T15:21:09Z WRN Handle - malformed omaha request error XML syntax error on line 4: attribute name without = in element context=omaha
2025-10-14T15:21:09Z ERR process omaha request error="omaha: request is malformed: XML syntax error on line 4: attribute name without = in element" context=nebraska
2025-10-14T15:22:11Z WRN Handle - malformed omaha request error XML syntax error on line 4: attribute name without = in element context=omaha
2025-10-14T15:22:11Z ERR process omaha request error="omaha: request is malformed: XML syntax error on line 4: attribute name without = in element" context=nebraska
2025-10-14T15:22:12Z WRN Handle - malformed omaha request error XML syntax error on line 4: attribute name without = in element context=omaha
2025-10-14T15:22:12Z ERR process omaha request error="omaha: request is malformed: XML syntax error on line 4: attribute name without = in element" context=nebraska

till avatar Oct 15 '25 12:10 till

Thanks for the additional information. No, the protocol had not been changed for a while.

I would have some follow-ups to better understand your situation.

I've tried update_engine_client -update multiple times before I went with flatcar-update

So your node/nodes get their updates if you check for updates manually? No scheduled update checks I assume. If you have other nodes too, did you experience the same on those as well? What Nebraska version are you using?

I see the two log snippets are almost 8 min apart, but I assume they correlate. Due to the fact that flatcar-update works and update_engine_client -update doesn't, I suspect they construct the requests differently. I will try to look into that.

ervcz avatar Oct 15 '25 13:10 ervcz

We are in the process of transitioning away from cgroupv1, so I found this in the log - would be nice if this is elevated:

Flatcar version 4187 or higher does not support cgroupv1 anymore. Aborting the update...

till avatar Oct 15 '25 14:10 till

This is another server where I am updating right now. The fix was to reboot it (since I disabled cgroupv1 already) and now the update continues.

I think it's a mix of different things of how the update can fail. It all seems to boil down to having to skim through the logs to find the right "message".

till avatar Oct 15 '25 14:10 till

We are in the process of transitioning away from cgroupv1, so I found this in the log - would be nice if this is elevated:

Flatcar version 4187 or higher does not support cgroupv1 anymore. Aborting the update...

To be fair, we've been shouting about this quite loudly for ages. It's been mentioned in practically every public meeting, and there's a big warning banner on the releases page. It's probably mentioned in other places too. It would be nice if it was more visible on the system itself, but there's not much we can do about existing deployments.

chewi avatar Oct 15 '25 15:10 chewi

The transition itself is not a biggie. We've also been tracking it internally for a long time. My issue is that none of the problems are elevated. The update client is not very verbose and the log is noisy.

till avatar Oct 15 '25 15:10 till

@till That's a good point. The post-install script which aborts the upgrade could be (much) more verbose on this, and it's not even a big change.

t-lo avatar Oct 15 '25 16:10 t-lo

Maybe all of this is feedback for the update engine rebuild? :)


I would have some follow-ups to better understand your situation.

I've tried update_engine_client -update multiple times before I went with flatcar-update

So your node/nodes get their updates if you check for updates manually? No scheduled update checks I assume. If you have other nodes too, did you experience the same on those as well? What Nebraska version are you using?

We just added Nebraska last week and are currently going through a bunch of upgrade steps to transition away from cgroupsv1 etc., deploy the update.conf, etc.. I can see that all nodes that we upgraded also register themselves in Nebraska, so that part seems to work. The Nebraska version itself is 2.12.0.

I see the two log snippets are almost 8 min apart, but I assume they correlate. Due to the fact that flatcar-update works and update_engine_client -update doesn't, I suspect they construct the requests differently. I will try to look into that.

I had forgotten this, but I think flatcar-update wraps around update_engine_client. But otherwise, I don't know if they do anything different. I think, they don't.

LMK, if I answered your questions. Or if you want me to run more tests, I have a couple more of these nodes that I need to upgrade.

till avatar Oct 15 '25 19:10 till

Maybe all of this is feedback for the update engine rebuild? :)

Definitely! There's a lower hanging fruit though of shouting (much) louder in the postinst script: https://github.com/flatcar/update_engine/blob/main/flatcar-postinst#L258

That script is run from the new OS image so we can ship it with an upcoming release, and it will yell much louder in the logs.

t-lo avatar Oct 16 '25 07:10 t-lo

In the past we also did issue announcements that appear when login in on a node via SSH. Depending on how that is implemented it could even work to inject something from the postinst script for failures? Ideally update-engine/Nebraska would allow to add additional error texts to the update error status.

pothos avatar Oct 20 '25 02:10 pothos