Flatcar may inadvertently update during image creation

Open johananl opened this issue 2 years ago • 4 comments

In https://github.com/kubernetes-sigs/image-builder/pull/701/commits/c73b6e7a28b93dd03a747a8454600838e656ff5c we've disabled Flatcar updates. However, this change leaves a time gap which still allows Flatcar to download updates and reboot during image creation, resulting in either a broken build or an image with an unexpected version.

The following is a high-level description of the Flatcar image build process for ISO-based Packer builds (e.g. QEMU, OVA):

  1. We boot Flatcar from an official ISO or base image (e.g. AMI) of a specific Flatcar release.
  2. We execute flatcar-install while passing an Ignition file and reboot.
  3. Flatcar boots from disk.
  4. Ansible is executed.
  5. An image is created from the provisioned machine and the machine is terminated.

Currently, Flatcar may inadvertently update at any point from stage 3 through stage 5. To prevent this, we need to disable updates in the Ignition config we pass to flatcar-install at stage 2.
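For illustration, here is a rough sketch of what such an Ignition config could look like. It masks update-engine.service and locksmithd.service so that neither can start when Flatcar first boots from disk. The function name `ignition_config` is hypothetical, and the spec version `3.3.0` is an assumption: it has to match what the Flatcar release being installed supports, and this is not necessarily the config the linked PRs use.

```shell
#!/bin/sh
# Sketch: print an Ignition config that masks Flatcar's update units.
# Assumptions: the function name is hypothetical and the Ignition spec
# version below is illustrative; adjust it to the target Flatcar release.
set -eu

ignition_config() {
  # A masked unit is linked to /dev/null, so it cannot be started.
  cat <<'EOF'
{
  "ignition": { "version": "3.3.0" },
  "systemd": {
    "units": [
      { "name": "update-engine.service", "mask": true },
      { "name": "locksmithd.service", "mask": true }
    ]
  }
}
EOF
}

# Print the config to stdout; redirect it to a file for flatcar-install.
ignition_config
```

Redirecting the output to a file gives something that can be passed to flatcar-install at stage 2, so the units are already masked by the time stage 3 begins.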

NOTE: I've confirmed this bug exists for OVA builds. It could also apply to AWS and Azure builds: although these builds don't boot from an ISO, there is still a phase where the temporary VM is running while Ansible executes. We should double-check that updates are disabled during that phase, too.

TODO

  • [ ] Ensure updates are disabled throughout the entire build process, as well as in the final image, for all supported providers.
    • [ ] AMI
    • [ ] Azure (SIG + VHD)
    • [x] OVA - handled by #895
    • [x] QEMU - handled by #895
    • [x] Raw - handled by #895

/kind bug /assign

johananl avatar Apr 08 '22 12:04 johananl

Fixed in https://github.com/flatcar-linux/flatcar-packer-qemu/pull/5. Once merged, we need to update the following:

https://github.com/kubernetes-sigs/image-builder/blob/2e85b15e9e75f5a194eb0c4c3f21c47918281a41/images/capi/packer/qemu/qemu-flatcar.json#L3

https://github.com/kubernetes-sigs/image-builder/blob/2e85b15e9e75f5a194eb0c4c3f21c47918281a41/images/capi/packer/raw/raw-flatcar.json#L3

johananl avatar Apr 08 '22 15:04 johananl

Following https://github.com/kubernetes-sigs/image-builder/pull/873#discussion_r866007976, it looks like the solution to this issue depends on https://github.com/kubernetes-sigs/image-builder/issues/890.

johananl avatar May 12 '22 13:05 johananl

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 06 '22 12:10 k8s-triage-robot

/remove-lifecycle stale

johananl avatar Oct 07 '22 09:10 johananl

/lifecycle stale

k8s-triage-robot avatar Jan 05 '23 09:01 k8s-triage-robot

/remove-lifecycle stale

johananl avatar Jan 05 '23 14:01 johananl

Confirmed that updates are enabled for AMIs:

core@ip-172-31-5-218 ~ $ systemctl status update-engine
● update-engine.service - Update Engine
     Loaded: loaded (/usr/lib/systemd/system/update-engine.service; disabled; vendor preset: disabled)
     Active: active (running) since Fri 2023-03-17 15:10:15 UTC; 3min 12s ago
   Main PID: 1351 (update_engine)
      Tasks: 2 (limit: 15114)
     Memory: 10.4M
        CPU: 80ms
     CGroup: /system.slice/update-engine.service
             └─1351 /usr/sbin/update_engine -foreground -logtostderr

core@ip-172-31-5-218 ~ $ systemctl status locksmithd
● locksmithd.service - Cluster reboot manager
     Loaded: loaded (/usr/lib/systemd/system/locksmithd.service; disabled; vendor preset: disabled)
     Active: active (running) since Fri 2023-03-17 15:10:15 UTC; 4min 13s ago
   Main PID: 1571 (locksmithd)
      Tasks: 6 (limit: 15114)
     Memory: 16.1M (limit: 32.0M)
        CPU: 16ms
     CGroup: /system.slice/locksmithd.service
             └─1571 /usr/lib/locksmith/locksmithd

We need to mask these units during image creation.
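As a minimal sketch of what masking could look like when run on the temporary build VM (for example from a provisioning step), assuming root privileges: the function names and the `SYSTEMCTL` override variable are hypothetical helpers added for illustration, and this is not necessarily how the eventual fix does it.

```shell
#!/bin/sh
# Sketch: mask Flatcar's update units and confirm they are masked.
# Assumptions: function names and the SYSTEMCTL override are hypothetical;
# SYSTEMCTL exists only so the commands can be dry-run (e.g. SYSTEMCTL=echo).
set -eu

UNITS="update-engine.service locksmithd.service"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

mask_update_units() {
  # --now also stops the units if they are currently running.
  # $UNITS is intentionally unquoted so each unit is a separate argument.
  $SYSTEMCTL mask --now $UNITS
}

verify_masked() {
  for unit in $UNITS; do
    # is-enabled prints "masked" but exits non-zero for masked units,
    # hence the "|| true" to keep set -e from aborting.
    state="$($SYSTEMCTL is-enabled "$unit" 2>/dev/null || true)"
    if [ "$state" != "masked" ]; then
      echo "$unit is not masked (state: $state)" >&2
      return 1
    fi
  done
}
```

Running `mask_update_units && verify_masked` as root before the image is captured should leave both units stopped and masked in the resulting image.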

johananl avatar Mar 17 '23 15:03 johananl

Fix in https://github.com/kubernetes-sigs/image-builder/pull/1150.

johananl avatar May 03 '23 12:05 johananl