spike: Investigate Firmware Patching with System Extensions in Kairos

Open ci-robbot opened this issue 6 months ago • 2 comments

From discussion in: https://spectrocloud.slack.com/archives/C056VJ4V1HP/p1747053020308779

Problem Statement

In Kairos, firmware and driver updates post-installation are challenging due to the immutable OS nature. Traditional methods like booting an ISO or mounting an ISO to run scripts are not efficient. RPMs required for updates cannot be directly installed due to immutability.

Proposed Solution

Using system-extensions (sysext) could allow for modular updates. However, current limitations in Kairos 3.1.3 and earlier versions restrict this capability. The goal is to investigate how to effectively use sys-extensions for firmware patches and driver updates, ensuring compatibility with future versions like 3.5.x.

Use Cases

  • Firmware Updates: Apply firmware updates post-installation without rebooting the system.
  • Driver Updates: Install necessary drivers for specific hardware configurations after deployment.
  • Modular Updates: Enable incremental updates to the OS without requiring a full image rebuild.

Acceptance Criteria

  • Investigation: Determine the feasibility of using sys-extensions for firmware and driver updates in Kairos.
  • End-to-End Example: Provide a working example of how to create and apply a sys-extension for firmware patches.
  • Documentation: Document the process and best practices for using sys-extensions in similar scenarios.
  • Compatibility: Ensure the solution works with Kairos 3.4.x and later versions.

Additional Considerations

  • Hardware Detection: The solution should account for hardware-specific firmware and driver needs, possibly requiring build-time hardware detection.
  • Overlaying /lib: Investigate the feasibility of overlaying the /lib directory to accommodate kernel modules and other critical components.
  • Systemd Configuration: Adjust SYSTEMD_SYSEXT_HIERARCHIES to include necessary directories for sys-extensions.

Steps to Implement

  1. Research: Explore existing documentation and community discussions on sys-extensions in Kairos.
  2. Prototype: Develop a prototype sys-extension for firmware updates, testing it on different hardware configurations.
  3. Testing: Validate the prototype's effectiveness in real-world scenarios, ensuring it meets the acceptance criteria.
  4. Documentation: Create detailed documentation for users and developers to follow.
  5. Feedback Loop: Engage with the Kairos community and maintainers to refine the solution based on feedback.

Conclusion

This issue aims to address the challenges of firmware and driver updates in Kairos by leveraging system extensions. By investigating and implementing a viable solution, we can enhance the flexibility and maintainability of Kairos in diverse deployment scenarios.

ci-robbot avatar May 14 '25 14:05 ci-robbot

e.g. of overriding mount paths:

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=systemd-sysext merge
ExecStop=systemd-sysext unmerge
Environment="SYSTEMD_SYSEXT_HIERARCHIES=/usr/:/opt/:/lib/"
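
A minimal sketch of how that override could be applied as a drop-in, assuming the stock unit is named systemd-sysext.service (as in the journal output elsewhere in this thread). write_sysext_dropin and the file name hierarchies.conf are hypothetical, and the variable itself is undocumented, so treat this as an experiment rather than a supported setup. Only the Environment= line is written, so the unit's existing ExecStart is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: writes a drop-in that sets SYSTEMD_SYSEXT_HIERARCHIES.
# $1 = drop-in directory (assumed: /etc/systemd/system/systemd-sysext.service.d)
# $2 = colon-separated hierarchy list, e.g. /usr/:/opt/:/lib/
write_sysext_dropin() {
    dir=$1
    hierarchies=$2
    mkdir -p "$dir"
    # Only the environment is overridden; ExecStart/ExecStop stay as shipped.
    printf '[Service]\nEnvironment="SYSTEMD_SYSEXT_HIERARCHIES=%s"\n' \
        "$hierarchies" > "$dir/hierarchies.conf"
}
```

After writing the file, a `systemctl daemon-reload` followed by `systemctl restart systemd-sysext.service` should pick it up.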

mudler avatar May 14 '25 15:05 mudler

Idea from planning: We can try to have a symlink from /usr to /lib (and maybe create the dir at build time so the link is not broken). If this works, we don't need to extend SYSTEMD_SYSEXT_HIERARCHIES. Otherwise we need to rely on that variable which is not officially supported nor documented (we just found it in code).

The other solution is to keep the firmware in the image (by installing it in the Dockerfile), but that makes the image big, with all the limitations that creates in UKI mode.

jimmykarily avatar May 19 '25 08:05 jimmykarily

The bot was misleading on a couple of things, like "Modular Updates: Enable incremental updates to the OS without requiring a full image rebuild". That's not part of this card. This is about extending the system at runtime with sysexts and making sure that works with other aspects of the system, not only the regular binaries: for example kernel modules (a bad example, as they all need to be signed to be loaded into the kernel) or firmware.

I think I might be able to test this with a firmware thing. I have a USB network card around that requires firmware to be installed. It's provided by linux-extra-firmware or something like that, so my plan is to:

  • build the image without the firmware packages
  • generate a sysext with the firmware packages, using the built image as the base image and the AuroraBoot helper (https://kairos.io/docs/advanced/sys-extensions/#building-system-extensions-from-a-docker-image-with-auroraboot)
  • try to load them during runtime in a deployed system
  • have the USB network card connected before and after to see if it indeed loads the needed firmware

Itxaka avatar Sep 02 '25 11:09 Itxaka

Hmm, no, I don't have the hardware to test this with. Maybe I can use some other firmware to test this, or some modules.

Itxaka avatar Sep 02 '25 12:09 Itxaka

ok I got a tp-link usb network wifi card that requires firmware.

  • Built a ubuntu 24.04 uki image with init.
  • Created a Dockerfile that just uninstalls linux-firmware from the generated image
  • Booted and installed the UKI ISO like normal, everything seems to be fine
  • Passed through the USB dongle

results in failure as it has no firmware available:

[   48.874800] usb 1-4: new high-speed USB device number 3 using xhci_hcd
[   49.000990] usb 1-4: New USB device found, idVendor=2357, idProduct=011e, bcdDevice= 2.00
[   49.000995] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   49.000996] usb 1-4: Product: 802.11ac WLAN Adapter 
[   49.000997] usb 1-4: Manufacturer: Realtek 
[   49.000998] usb 1-4: SerialNumber: 00e04c000001
[   49.065898] rtw_8821au 1-4:1.0: Direct firmware load for rtw88/rtw8821a_fw.bin failed with error -2
[   49.065904] rtw_8821au 1-4:1.0: failed to request firmware
[   49.077725] rtw_8821au 1-4:1.0: failed to load firmware
[   49.077933] rtw_8821au 1-4:1.0: failed to setup chip efuse info
[   49.078138] rtw_8821au 1-4:1.0: failed to setup chip information
[   49.078480] rtw_8821au 1-4:1.0: probe with driver rtw_8821au failed with error -22
[   49.078964] usbcore: registered new interface driver rtw_8821au

Device does not appear in the system, so the device could not be created.
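
To see which files the sysext has to provide, the failing firmware names can be pulled out of the kernel log. A small sketch, assuming the message format shown above ("Direct firmware load for ... failed with error"); missing_firmware is a hypothetical helper name, and it reads log text on stdin so it works with dmesg or journalctl -k output alike.

```shell
#!/bin/sh
# Hypothetical helper: print each firmware file the kernel failed to load,
# one per line, de-duplicated. Reads kernel log lines on stdin.
missing_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error.*/\1/p' | sort -u
}
```

Typical use would be `dmesg | missing_firmware`; the names are relative to the firmware directory.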

Now I will try to pack the firmware into a sysext and load it.

Itxaka avatar Sep 04 '25 08:09 Itxaka

Firmware sysext generated with this Dockerfile:

FROM ubu-init:uki-slim
RUN apt-get update && apt-get install -y linux-firmware

Then built the image

docker build -t firmware:latest .
[+] Building 13.3s (6/6) FINISHED                                                        docker:default
 => [internal] load build definition from Dockerfile                                               0.0s
 => => transferring dockerfile: 116B                                                               0.0s
 => [internal] load metadata for docker.io/library/ubu-init:uki-slim                               0.0s
 => [internal] load .dockerignore                                                                  0.0s
 => => transferring context: 2B                                                                    0.0s
 => CACHED [1/2] FROM docker.io/library/ubu-init:uki-slim                                          0.0s
 => [2/2] RUN apt-get update && apt-get install -y linux-firmware                                 12.0s
 => exporting to image                                                                             1.3s 
 => => exporting layers                                                                            1.3s 
 => => writing image sha256:0b77ed7e37c663e7c88dff23750fe03d41827f194724e8dcb7f90744a92f7214       0.0s 
 => => naming to docker.io/library/firmware:latest    

And used AuroraBoot to generate a signed sysext:

docker run -v "$PWD":/build/ -v $PWD/tests/assets//keys/:/keys -v /var/run/docker.sock:/var/run/docker.sock --rm -ti quay.io/kairos/auroraboot sysext --private-key=/keys/db.key --certificate=/keys/db.pem --output /build firmware firmware:latest 

This generated the following sysext:

$ systemd-dissect firmware.sysext.raw
 File Name: firmware.sysext.raw
      Size: 529.3M
 Sec. Size: 512
     Arch.: x86-64

Image Name: firmware
Image UUID: 60f29b0d-f685-4878-b529-4ef35c3f1196
 sysext R.: ID=_any
            ARCHITECTURE=x86-64

    Use As: ✗ bootable system for UEFI
            ✗ bootable system for container
            ✗ portable service
            ✗ initrd
            ✓ sysext for system
            ✓ sysext for portable service
            ✗ sysext for initrd
            ✗ confext for system
            ✗ confext for portable service
            ✗ confext for initrd

RW DESIGNATOR      PARTITION UUID                       PARTITION LABEL        FSTYPE                AR>
ro root            bb540a0b-a9d7-56ba-fdcc-646e70c4185a root-x86-64            erofs                 x8>
ro root-verity     e1d5e711-8d33-60b0-36e1-33d6b067a428 root-x86-64-verity     DM_verity_hash        x8>
ro root-verity-sig ba282899-118d-4b1c-ba8c-5e1af8e37c81 root-x86-64-verity-sig verity_hash_signature x8>

Itxaka avatar Sep 04 '25 08:09 Itxaka

installed the sysext with kairos-agent

root@localhost:~# kairos-agent sysext install file:/tmp/firmware.sysext.raw 
2025-09-04T08:54:26Z INF Kairos Agent version=v2.24.2
2025-09-04T08:54:26Z INF creating a runtime
2025-09-04T08:54:26Z INF detecting boot state
2025-09-04T08:54:26Z INF Boot Mode boot_mode=active_boot
2025-09-04T08:54:26Z INF Boot in uki mode result=true
2025-09-04T08:54:27Z INF System extension file:/tmp/firmware.sysext.raw installed

And enabled it for active:

root@localhost:~# kairos-agent sysext list
2025-09-04T08:54:51Z INF Kairos Agent version=v2.24.2
2025-09-04T08:54:51Z INF creating a runtime
2025-09-04T08:54:51Z INF detecting boot state
2025-09-04T08:54:51Z INF Boot Mode boot_mode=active_boot
2025-09-04T08:54:51Z INF Boot in uki mode result=true
2025-09-04T08:54:51Z INF action.SysExtension{
  Name: "firmware.sysext.raw",
  Location: "/var/lib/kairos/extensions/firmware.sysext.raw",
}
root@localhost:~# kairos-agent sysext enable --now --active firmware
2025-09-04T08:56:50Z INF Kairos Agent version=v2.24.2
2025-09-04T08:56:50Z INF creating a runtime
2025-09-04T08:56:50Z INF detecting boot state
2025-09-04T08:56:50Z INF Boot Mode boot_mode=active_boot
2025-09-04T08:56:50Z INF Boot in uki mode result=true
2025-09-04T08:56:50Z INF System extension firmware.sysext.raw enabled in active
2025-09-04T08:56:50Z INF System extension firmware.sysext.raw enabled in /run/extensions
2025-09-04T08:56:50Z INF System extension firmware.sysext.raw merged by systemd-sysext
root@localhost:~# systemd-sysext
HIERARCHY      EXTENSIONS SINCE                      
/usr/bin       none       -                          
/usr/include   none       -                          
/usr/lib       firmware   Thu 2025-09-04 08:56:50 UTC
/usr/local/bin none       -                          
/usr/local/lib none       -                          
/usr/sbin      none       -                          
/usr/share     firmware   Thu 2025-09-04 08:56:50 UTC
/usr/src       none   
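
Before replugging the device, it can help to confirm the merged firmware file is actually visible. A sketch, assuming the firmware lives under /usr/lib/firmware and may be stored compressed; firmware_present is a hypothetical helper, and the .zst/.xz suffix list is an assumption about how the distro packages firmware.

```shell
#!/bin/sh
# Hypothetical helper: check whether a firmware file (or a compressed
# variant of it) exists. Prints the path that was found.
# $1 = name relative to the firmware dir, e.g. rtw88/rtw8821a_fw.bin
# $2 = firmware root (assumed default: /usr/lib/firmware)
firmware_present() {
    name=$1
    root=${2:-/usr/lib/firmware}
    for suffix in "" .zst .xz; do
        if [ -e "$root/$name$suffix" ]; then
            printf '%s\n' "$root/$name$suffix"
            return 0
        fi
    done
    return 1
}
```

For the dongle in this thread that would be `firmware_present rtw88/rtw8821a_fw.bin`.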

Itxaka avatar Sep 04 '25 08:09 Itxaka

Now connecting the wifi usb card results in proper firmware loading:

[ 1753.117758] usb 1-4: new high-speed USB device number 4 using xhci_hcd
[ 1753.242436] usb 1-4: New USB device found, idVendor=2357, idProduct=011e, bcdDevice= 2.00
[ 1753.242443] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1753.242444] usb 1-4: Product: 802.11ac WLAN Adapter 
[ 1753.242446] usb 1-4: Manufacturer: Realtek 
[ 1753.242447] usb 1-4: SerialNumber: 00e04c000001
[ 1753.244128] rtw_8821au 1-4:1.0: Firmware version 42.4.0, H2C version 0
[ 1753.893615] rtw_8821au 1-4:1.0 wlxec750c0937b1: renamed from wlan0

root@localhost:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:73:e7:4c brd ff:ff:ff:ff:ff:ff
3: wlxec750c0937b1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ec:75:0c:09:37:b1 brd ff:ff:ff:ff:ff:ff

Itxaka avatar Sep 04 '25 08:09 Itxaka

Note that I did NOT need to change anything for sysext. We already add several hierarchies to the defaults, and /usr/lib is among them, so as long as the firmware package puts all the firmware there, it should work out of the box.
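
A quick way to sanity-check where a package drops its files: a sketch that tests whether a path falls under one of a colon-separated list of hierarchies (the same format as SYSTEMD_SYSEXT_HIERARCHIES). under_hierarchy is a hypothetical helper; the comparison is purely lexical and does not resolve symlinks such as /lib pointing at /usr/lib.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 is equal to, or below, one of the
# colon-separated hierarchies in $2. Lexical match only, no symlink resolution.
under_hierarchy() {
    path=$1
    list=$2
    old_ifs=$IFS
    IFS=:
    set -f                      # do not glob-expand the list entries
    for h in $list; do
        h=${h%/}                # tolerate trailing slashes like /usr/
        case "$path" in
            "$h"|"$h"/*) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 1
}
```

For example, `under_hierarchy /usr/lib/firmware/rtw88/rtw8821a_fw.bin.zst "/usr/lib:/usr/share"` succeeds, while a path under /usr/libexec does not match /usr/lib.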

Itxaka avatar Sep 04 '25 09:09 Itxaka

The only issue I can see with this is that the timing is not the best. IIRC, we enable sysext during initramfs, at the end of the process, so it won't work if you have installed the system on a disk that requires firmware, or if you need network before the initramfs phase and the network device needs firmware. I don't think that's very problematic, though.

Itxaka avatar Sep 04 '25 09:09 Itxaka

The only problem I see is that the kernel will try to load firmware for the devices ASAP and won't find it there yet. So once the firmware is available, the devices would need to be reloaded somehow. You can see the difference in time in the logs between the kernel trying to load the firmware and the sysext enabling the firmware:

Sep 04 09:19:13 localhost kernel: rtw_8821au 1-4:1.0: Direct firmware load for rtw88/rtw8821a_fw.bin failed with error -2
Sep 04 09:19:13 localhost kernel: rtw_8821au 1-4:1.0: failed to request firmware
Sep 04 09:19:13 localhost kernel: rtw_8821au 1-4:1.0: failed to load firmware
Sep 04 09:19:13 localhost kernel: rtw_8821au 1-4:1.0: failed to setup chip efuse info
Sep 04 09:19:13 localhost kernel: rtw_8821au 1-4:1.0: failed to setup chip information
Sep 04 09:19:13 localhost kernel: rtw_8821au 1-4:1.0: probe with driver rtw_8821au failed with error -22
Sep 04 09:19:13 localhost kernel: usbcore: registered new interface driver rtw_8821au

Sep 04 09:19:14 localhost systemd[1]: Starting systemd-sysext.service - Merge System Extension Images into /usr/ and /opt/...

I'm not sure if the kernel drivers/firmware can somehow be reloaded?

Itxaka avatar Sep 04 '25 09:09 Itxaka

I fixed this by unbinding and binding the device and it seemed to work:

dev=""
for d in /sys/bus/usb/devices/*-*; do
  prod=$(cat "$d/product" 2>/dev/null || true)
  manu=$(cat "$d/manufacturer" 2>/dev/null || true)
  if printf '%s %s\n' "$manu" "$prod" | grep -Ei '802\.11|wireless|wifi|WLAN' >/dev/null; then
    dev=$(basename "$d"); break
  fi
done
[ -n "$dev" ] || exit 1
echo "$dev" > /sys/bus/usb/drivers/usb/unbind
echo "$dev" > /sys/bus/usb/drivers/usb/bind

This resulted in the dongle being "reconnected" and firmware loaded

[ 1546.082502] usb 1-4: reset high-speed USB device number 3 using xhci_hcd
[ 1549.069621] rtw_8821au 1-4:1.0: Firmware version 42.4.0, H2C version 0
[ 1549.757463] rtw_8821au 1-4:1.0 wlxec750c0937b1: renamed from wlan0

So potentially you can do this as part of initramfs stages to reload and have it available.
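
The grep on product strings above can match the wrong device if several adapters are plugged in. A sketch of a stricter lookup by USB vendor/product ID, using the idVendor and idProduct files sysfs exposes per device; find_usb_by_id is a hypothetical helper, and 2357/011e are the IDs from the dmesg output earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the sysfs name (e.g. 1-4) of the first USB
# device matching the given vendor and product ID.
# $1 = vendor ID, $2 = product ID, $3 = sysfs root (default /sys/bus/usb/devices)
find_usb_by_id() {
    vendor=$1
    product=$2
    root=${3:-/sys/bus/usb/devices}
    for d in "$root"/*; do
        # Interface entries (1-4:1.0) have no ID files, so skip them.
        [ -r "$d/idVendor" ] && [ -r "$d/idProduct" ] || continue
        v=$(cat "$d/idVendor")
        p=$(cat "$d/idProduct")
        if [ "$v" = "$vendor" ] && [ "$p" = "$product" ]; then
            basename "$d"
            return 0
        fi
    done
    return 1
}
```

The result can then be fed to the unbind/bind writes, e.g. `dev=$(find_usb_by_id 2357 011e)`.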

You could also get away with a service file for systemd:

/usr/local/bin/replug-wifi.sh

#!/bin/sh
set -eu

log() {
    echo "[replug-wifi] $*" >&2
}

log "Starting replug script..."

dev=""
for d in /sys/bus/usb/devices/*-*; do
    [ -d "$d" ] || continue
    prod=$(cat "$d/product" 2>/dev/null || true)
    manu=$(cat "$d/manufacturer" 2>/dev/null || true)
    log "Checking $d (manu='$manu' prod='$prod')"
    if printf '%s %s\n' "$manu" "$prod" | grep -Ei '802\.11|wireless|wifi|WLAN' >/dev/null; then
        dev=$(basename "$d")
        log "Match found: $dev"
        break
    fi
done

if [ -z "$dev" ]; then
    log "No candidate Wi-Fi USB device found"
    exit 1
fi

if [ ! -w /sys/bus/usb/drivers/usb/unbind ]; then
    log "Cannot write to /sys/bus/usb/drivers/usb/unbind (permissions?)"
    exit 1
fi

log "Unbinding $dev..."
echo "$dev" > /sys/bus/usb/drivers/usb/unbind 2>/dev/null || log "Failed to unbind $dev"
sleep 1
log "Rebinding $dev..."
echo "$dev" > /sys/bus/usb/drivers/usb/bind 2>/dev/null || log "Failed to bind $dev"

log "Done."
exit 0

wifi.service

[Unit]
Description=Re-enumerate USB Wi-Fi when firmware appears
After=systemd-sysext.service
# Adjust if your sysext drops a different filename or path
ConditionPathExists=/usr/lib/firmware/rtw88/rtw8821a_fw.bin.zst

[Service]
Type=oneshot
ExecStart=/usr/local/bin/replug-wifi.sh

[Install]
WantedBy=multi-user.target

wifi.path (the trigger for the service!)

[Unit]
Description=Watch for Wi-Fi firmware arrival

[Path]
PathExists=/usr/lib/firmware/rtw88/rtw8821a_fw.bin.zst
Unit=wifi.service

[Install]
WantedBy=multi-user.target

Itxaka avatar Sep 04 '25 09:09 Itxaka

Fixed service and unit sorry:

/usr/local/bin/wifi-once.sh

#!/bin/sh
set -eu

fw=/usr/lib/firmware/rtw88/rtw8821a_fw.bin.zst
echo "[wifi] waiting for: $fw" >&2
while [ ! -e "$fw" ]; do sleep 1; done
echo "[wifi] firmware present; re-enumerating" >&2

# find a likely Wi-Fi USB device
dev=""
for d in /sys/bus/usb/devices/*-*; do
  [ -d "$d" ] || continue
  prod=$(cat "$d/product" 2>/dev/null || true)
  manu=$(cat "$d/manufacturer" 2>/dev/null || true)
  if printf '%s %s\n' "$manu" "$prod" | grep -Ei '802\.11|wireless|wifi|WLAN' >/dev/null; then
    dev=$(basename "$d"); echo "[wifi] candidate: $dev ($manu $prod)" >&2; break
  fi
done

if [ -n "${dev:-}" ] && [ -w /sys/bus/usb/drivers/usb/unbind ] && [ -w /sys/bus/usb/drivers/usb/bind ]; then
  echo "[wifi] unbind/bind: $dev" >&2
  echo "$dev" > /sys/bus/usb/drivers/usb/unbind 2>/dev/null || true
  sleep 1
  echo "$dev" > /sys/bus/usb/drivers/usb/bind 2>/dev/null || true
else
  echo "[wifi] no candidate or cannot write to usb core; skipping bind cycle" >&2
fi

# nudge the driver (harmless if not loaded)
modprobe -r rtw_8821au 2>/dev/null || true
modprobe rtw_8821au 2>/dev/null || true

echo "[wifi] done" >&2

/etc/systemd/system/wifi.service

[Unit]
Description=Re-enumerate USB Wi-Fi once firmware exists
After=systemd-sysext.service

[Service]
Type=oneshot
TimeoutStartSec=2min
ExecStart=/usr/local/bin/wifi-once.sh
StandardOutput=journal
StandardError=journal
Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

[Install]
WantedBy=multi-user.target

This results in the dongle being brought up once the firmware is in place:

root@localhost:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:73:e7:4c brd ff:ff:ff:ff:ff:ff
3: wlxec750c0937b1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ec:75:0c:09:37:b1 brd ff:ff:ff:ff:ff:ff
root@localhost:~# systemctl status wifi.service
○ wifi.service - Re-enumerate USB Wi-Fi once firmware exists
     Loaded: loaded (/etc/systemd/system/wifi.service; enabled; preset: enabled)
     Active: inactive (dead) since Thu 2025-09-04 10:33:50 UTC; 34s ago
    Process: 1859 ExecStart=/usr/local/bin/wifi-once.sh (code=exited, status=0/SUCCESS)
   Main PID: 1859 (code=exited, status=0/SUCCESS)
        CPU: 170ms

Sep 04 10:33:49 localhost systemd[1]: Starting wifi.service - Re-enumerate USB Wi-Fi once firmware exis>
Sep 04 10:33:49 localhost wifi-once.sh[1859]: [wifi] waiting for: /usr/lib/firmware/rtw88/rtw8821a_fw.b>
Sep 04 10:33:49 localhost wifi-once.sh[1859]: [wifi] firmware present; re-enumerating
Sep 04 10:33:49 localhost wifi-once.sh[1859]: [wifi] candidate: 1-4 (Realtek  802.11ac WLAN Adapter )
Sep 04 10:33:49 localhost wifi-once.sh[1859]: [wifi] unbind/bind: 1-4
Sep 04 10:33:50 localhost wifi-once.sh[1859]: [wifi] done
Sep 04 10:33:50 localhost systemd[1]: wifi.service: Deactivated successfully.
Sep 04 10:33:50 localhost systemd[1]: Finished wifi.service - Re-enumerate USB Wi-Fi once firmware exis>

So the same thing can be used for anything else. I wonder if GPU stuff would be different.
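
To reuse the wait step for other firmware files, the polling loop could be pulled into a small function with its own deadline instead of relying only on TimeoutStartSec. A sketch; wait_for_file is a hypothetical helper, and the 120 second default just mirrors the 2min timeout in the unit above.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until a file exists.
# $1 = file to wait for, $2 = timeout in seconds (default 120)
# Returns 0 when the file shows up, 1 when the timeout is reached.
wait_for_file() {
    file=$1
    timeout=${2:-120}
    waited=0
    while [ ! -e "$file" ]; do
        [ "$waited" -lt "$timeout" ] || return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller could then do `wait_for_file "$fw" 60 || exit 1` and log the failure instead of being killed by systemd.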

Itxaka avatar Sep 04 '25 10:09 Itxaka

Also here is the difference in ISO images between one with firmware bundled and one without:

366M  kairos-ubuntu-24.04-core-amd64-generic-v3.5.0-itxaka-uki-slim.iso
868M kairos-ubuntu-24.04-core-amd64-generic-v3.5.0-itxaka-uki.iso

The active.efi is 316 MB in the slim one.
The active.efi is 818 MB in the normal one.

Huge diff here.
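
For reference, the saving can be worked out from the sizes above with integer shell arithmetic. percent_saved is a hypothetical helper; it rounds down.

```shell
#!/bin/sh
# Hypothetical helper: percentage saved going from a full size to a slim
# size (same unit for both), rounded down to a whole percent.
percent_saved() {
    full=$1
    slim=$2
    echo $(( (full - slim) * 100 / full ))
}

percent_saved 868 366   # ISO: prints 57
percent_saved 818 316   # active.efi: prints 61
```

In both cases the absolute difference is 502M, which matches the 529.3M size of the firmware sysext image reasonably well.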

Itxaka avatar Sep 04 '25 10:09 Itxaka

I think this is mostly done. Maybe it needs some docs showing an example of how to do it with the full firmware package like I did, plus the workarounds for devices like USB plug-in stuff and so on.

Itxaka avatar Sep 04 '25 10:09 Itxaka

I figured I would add how we do this: inside the Dockerfile, after installing firmware, we remove the bits we don't need. This avoids the race condition described above and slims down the image. We have been playing with putting them in the sysext too, but most of these needed to be loaded early.

# Remove all files in /usr/lib/firmware except regulatory*, iwlwifi-ty-a0-gf-a0*, and everything under mediatek, i915, intel-ucode and intel
RUN find /usr/lib/firmware \
    ! -path "/usr/lib/firmware" \
    ! -path "/usr/lib/firmware/regulatory*" \
    ! -path "/usr/lib/firmware/iwlwifi-ty-a0-gf-a0*" \
    ! -path "/usr/lib/firmware/mediatek" \
    ! -path "/usr/lib/firmware/mediatek/*" \
    ! -path "/usr/lib/firmware/i915" \
    ! -path "/usr/lib/firmware/i915/*" \
    ! -path "/usr/lib/firmware/intel-ucode" \
    ! -path "/usr/lib/firmware/intel-ucode/*" \
    ! -path "/usr/lib/firmware/intel" \
    ! -path "/usr/lib/firmware/intel/*" \
    ! -path "/usr/lib/firmware/mediatek/*MT7922*.bin" \
    ! -path "/usr/lib/firmware/mediatek/*MT7922*.bin.zst" \
    -type f -exec rm -f {} \;
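
Before running a deletion like this in a build, a dry run that only lists the candidates can be useful. A sketch with the same keep-list, parameterized on the firmware root; list_firmware_to_prune is a hypothetical helper, and since it only looks at regular files the directory-only exclusions are not needed.

```shell
#!/bin/sh
# Hypothetical helper: list the files the pruning step would delete,
# without deleting anything.
# $1 = firmware root (default /usr/lib/firmware)
list_firmware_to_prune() {
    root=${1:-/usr/lib/firmware}
    find "$root" -type f \
        ! -path "$root/regulatory*" \
        ! -path "$root/iwlwifi-ty-a0-gf-a0*" \
        ! -path "$root/mediatek/*" \
        ! -path "$root/i915/*" \
        ! -path "$root/intel-ucode/*" \
        ! -path "$root/intel/*" \
        | sort
}
```

Piping the output through `wc -l` or `du` gives a rough idea of how much the image would shrink.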

bencorrado avatar Sep 16 '25 07:09 bencorrado