talos `talosctl upgrade --image some:image` does not re-pull the image

Open: utkuozdemir opened this issue 3 years ago • 1 comment

Bug Report

Description

1. Run `talosctl upgrade --image some:image` with an invalid installer image.
2. Fix the image and push it with the same tag: `docker push some:image`
3. Run `talosctl upgrade --image some:image` again. Talos does not re-pull the image, so the upgrade keeps failing.

We could add a flag to the upgrade command, such as `--force-pull`, to force the image to be pulled again.

Logs

172.20.0.2: [talos] upgrade request received: preserve true, staged false, force false
172.20.0.2: [talos] validating "ghcr.io/utkuozdemir/talos-installer:test-break"
172.20.0.2: machined Unknown [/machine.MachineService/Upgrade] 2.473929476s unary error validating installer image "ghcr.io/utkuozdemir/talos-installer:test-break": failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/installer": stat /bin/installer: no such file or directory: unknown (:authority=localhost;content-type=application/grpc;proxyfrom=172.20.0.2,172.20.0.3,172.20.0.4;talos-role=os:admin;user-agent=grpc-go/1.47.0)
....
....
....
172.20.0.2: [talos] upgrade request received: preserve true, staged false, force false
172.20.0.2: [talos] validating "ghcr.io/utkuozdemir/talos-installer:test-break"
172.20.0.2: machined Unknown [/machine.MachineService/Upgrade] 63.348966ms unary error validating installer image "ghcr.io/utkuozdemir/talos-installer:test-break": failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/bin/installer": stat /bin/installer: no such file or directory: unknown (:authority=localhost;content-type=application/grpc;proxyfrom=172.20.0.2,172.20.0.3,172.20.0.4;talos-role=os:admin;user-agent=grpc-go/1.47.0)

Jun 15 '22 18:06 utkuozdemir

The root cause is that the image is pulled once and cached by the system containerd in memory (on tmpfs), so later upgrade requests reuse the cached copy.

Rebooting the node clears that cache, so a reboot is enough as a workaround.

The proper fix is to always pull the image while processing the upgrade API request, but to use the cached image when running the actual upgrade.

Jun 21 '22 15:06 smira
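
The fix described in the comment above can be illustrated with a small model. This is only a sketch under stated assumptions, not the Talos implementation: `Registry`, `ImageStore`, `PullIfMissing`, `ValidateUpgrade`, and `RunUpgrade` are hypothetical names invented for illustration, the digests are made up, and a plain Go map stands in for the in-memory containerd image cache. It contrasts the reported "pull only if missing" behavior with "always pull on the API request, reuse the cache for the upgrade itself".

```go
package main

import (
	"errors"
	"fmt"
)

// Registry is a hypothetical stand-in for a remote registry: tag -> digest.
type Registry map[string]string

// ImageStore is a hypothetical stand-in for the in-memory image cache.
type ImageStore struct {
	cache map[string]string // tag -> digest of the cached image
}

// NewImageStore returns an empty cache, as after a reboot.
func NewImageStore() *ImageStore {
	return &ImageStore{cache: map[string]string{}}
}

// Pull always asks the registry and overwrites the cached entry.
func (s *ImageStore) Pull(reg Registry, ref string) (string, error) {
	digest, ok := reg[ref]
	if !ok {
		return "", fmt.Errorf("image %q not found in registry", ref)
	}
	s.cache[ref] = digest
	return digest, nil
}

// PullIfMissing models the behavior reported in the issue:
// once a tag is cached, the registry is never consulted again.
func (s *ImageStore) PullIfMissing(reg Registry, ref string) (string, error) {
	if digest, ok := s.cache[ref]; ok {
		return digest, nil
	}
	return s.Pull(reg, ref)
}

// ErrNotCached is returned when the upgrade runs without a prior pull.
var ErrNotCached = errors.New("image not cached; validate the upgrade request first")

// ValidateUpgrade models the API request phase of the proposed fix: always pull.
func ValidateUpgrade(s *ImageStore, reg Registry, ref string) (string, error) {
	return s.Pull(reg, ref)
}

// RunUpgrade models the actual upgrade: use the cached image, never the registry.
func RunUpgrade(s *ImageStore, ref string) (string, error) {
	digest, ok := s.cache[ref]
	if !ok {
		return "", ErrNotCached
	}
	return digest, nil
}

func main() {
	reg := Registry{"some:image": "sha256:broken"}
	store := NewImageStore()

	// First attempt caches the broken image.
	_, _ = store.PullIfMissing(reg, "some:image")

	// The image is fixed and pushed under the same tag.
	reg["some:image"] = "sha256:fixed"

	// Reported behavior: the stale cached image is reused.
	stale, _ := store.PullIfMissing(reg, "some:image")
	fmt.Println("pull-if-missing uses:", stale)

	// Proposed behavior: validation re-pulls, the upgrade reuses the cache.
	_, _ = ValidateUpgrade(store, reg, "some:image")
	used, _ := RunUpgrade(store, "some:image")
	fmt.Println("always-pull uses:", used)
}
```

In this model the second `PullIfMissing` call still returns the broken digest, while `ValidateUpgrade` followed by `RunUpgrade` returns the fixed one. Creating a fresh `ImageStore` corresponds to the reboot workaround.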
