
Clarification regarding Image Factory Install image

Open davralin opened this issue 9 months ago • 5 comments

Talos Node Updater will only work on nodes that have an Image Factory install image in their machine config (see Config.machine.install, Boot Assets, and Image Factory).

That is from the README. What are the limitations here?

Is it looking specifically for factory.talos.dev, or something else?

I'm experimenting with custom installer images that I have pushed to a local repo, so I'm wondering whether that is going to be an issue with this.

davralin, Mar 09 '25 17:03

It's not looking for factory.talos.dev. Indeed, I am running a private image factory hosting custom Talos builds. What it is looking for is an Image Factory URL in each node's machine config, as documented. What that means is a URL of the form <image factory domain>/installer(-secureboot)?/<schematic ID>:<version tag>. From that, it will parse/extract the schematic ID, and generate a new installer URL by updating the version tag.

For example, from one of my nodes:

> talosctl -n kantai1 get mc -oyaml | yq .spec | yq .machine.install.image
tif.etincelle.cloud/installer-secureboot/d23b1d9b4724da9b63e7c06c4f28e560c26c8161b131b058fd7dae0f46f02211:v1.9.4
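
A minimal sketch of that parse-and-retag step, in Go. It assumes the strict form `<image factory domain>/installer(-secureboot)?/<schematic ID>:<version tag>` with a 64-character hex schematic ID. The `installerRef` type, `parseInstallerRef`, `withTag`, and the `v1.9.5` target tag are hypothetical names chosen for illustration; tnu's actual parsing may differ and may accept more reference shapes than this pattern does.

```go
package main

import (
	"fmt"
	"regexp"
)

// installerRef is a hypothetical type for this sketch; it is not taken from tnu.
type installerRef struct {
	Prefix    string // registry plus installer path, e.g. "tif.etincelle.cloud/installer-secureboot"
	Schematic string // 64-character hex schematic ID
	Tag       string // version tag, e.g. "v1.9.4"
}

// refPattern matches <domain>/installer(-secureboot)?/<schematic ID>:<tag>.
var refPattern = regexp.MustCompile(`^(.+/installer(?:-secureboot)?)/([0-9a-f]{64}):([\w.-]+)$`)

// parseInstallerRef splits an installer image reference into its parts,
// or returns an error when the reference does not carry a schematic ID.
func parseInstallerRef(image string) (installerRef, error) {
	m := refPattern.FindStringSubmatch(image)
	if m == nil {
		return installerRef{}, fmt.Errorf("not an Image Factory installer reference: %q", image)
	}
	return installerRef{Prefix: m[1], Schematic: m[2], Tag: m[3]}, nil
}

// withTag rebuilds the reference with the same schematic ID and a new version tag.
func (r installerRef) withTag(tag string) string {
	return fmt.Sprintf("%s/%s:%s", r.Prefix, r.Schematic, tag)
}

func main() {
	ref, err := parseInstallerRef("tif.etincelle.cloud/installer-secureboot/d23b1d9b4724da9b63e7c06c4f28e560c26c8161b131b058fd7dae0f46f02211:v1.9.4")
	if err != nil {
		panic(err)
	}
	fmt.Println("schematic:", ref.Schematic)
	fmt.Println("current tag:", ref.Tag)
	fmt.Println("updated image:", ref.withTag("v1.9.5")) // v1.9.5 is an assumed target version
}
```

With this strict pattern, a reference such as `oci.example.com/talos/installer:1.9.4-TYPE` is rejected because it has no schematic ID path component.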

If this is too restrictive for you, I can probably add some logic to make the schematic ID support optional.

jfroy, Mar 10 '25 04:03

Right, so the way I have it now is oci.example.com/talos/installer:1.9.4-TYPE which doesn't include the schematic ID at all.

The type here is used to separate the different images, so I can have a 1.9.4 image that is the default, which includes qemu-agent, and I can have a hardware tag for building stuff specific to different hardware.

I'm just testing this locally in my homelab for now, but at work we have to use internal registries, so I have to make it work there somehow.

We haven't seen the need to include the schematic ID internally. What was your thought process for including it?

davralin, Mar 10 '25 06:03

System extensions are the way to add hardware support on Talos (and other system modifications, of course, like alternative container engines). The complete set of system extensions and kernel arguments can be represented by a single SideroLabs Image Factory schematic ID, so it made sense to adopt that instead of some alternative scheme. The Talos control plane software also adds each node's actual current schematic ID as a node annotation. I built tnu around the concept of the schematic ID as a result, because I can compare the desired schematic ID with the current schematic ID and trigger node updates on schematic ID changes as well as version changes.

Take a look at https://github.com/jfroy/talos-boot-assets?tab=readme-ov-file#running-image-factory, which gives a quick explainer of how I run my own image factory. That repo also shows how I build Talos from source, build custom extensions, and lightly modify Image Factory to offer those extensions along with the official SideroLabs ones.

jfroy, Mar 10 '25 15:03

Yes I know, we do have something that pulls the latest image from factory, and then pushes it to an internal registry.

We are currently using talos/installer:1.9.4 as the default internally, and then talos/installer:1.9.4-H100 or other specialized/targeted naming to separate the different types we need.

That means we can keep the different servers targeted to their type and then just pull a higher version, without also including the schematic ID, which we don't really need to know much about by the time we get to the 70th similar server. We can just keep pointing internal servers to the "same" tag, even if we pull from a different schematic ID upstream.

Now, I'm not a Go-person, but am I correct in thinking it picks up the installer-image from here, straight using machineConfig with go-modules straight from SideroLabs and then it finds the imageTag and schematicID in the url. Meaning if we did something weird with the url, it wouldn't matter as long as the schematicID is present, and the tag can be enumerated - so oci.example.com/packages/custom-talos/installer/<SCHEMATIC-ID>/installer:1.94 would also work great - while perhaps our current method with [...]/installer:1.94-H100 would not work?

I'm a little unsure about the meaning behind this block, but perhaps it will be better for me to just try it and see how it works.

davralin, Mar 12 '25 06:03

> Yes I know, we do have something that pulls the latest image from factory, and then pushes it to an internal registry.
>
> We are currently using talos/installer:1.9.4 as the default internally, and then talos/installer:1.9.4-H100 or other specialized/targeted naming to separate the different types we need.

I remain curious why you are implementing / designing an alternative scheme using tag suffixes to have different installer images based on the same Talos version, instead of using SideroLabs schematic IDs. Do you not pull the variants from an image factory deployment?

> That means we can keep the different servers targeted to their type and then just pull a higher version, without also including the schematic ID, which we don't really need to know much about by the time we get to the 70th similar server. We can just keep pointing internal servers to the "same" tag, even if we pull from a different schematic ID upstream.

What I understand from the above is that you have several internal registries at different URLs (e.g. machine-type1.example.com/talos/installer:v1.0.0, machine-type2.example.com/talos/installer:v1.0.0, etc.), each serving a different installer image variant under the same tags.

Or a single registry using different image namespaces (e.g. oci.example.com/talos/type1/installer:v1.0.0, oci.example.com/talos/type2/installer:v1.0.0, etc.)

> Now, I'm not a Go-person, but am I correct in thinking it picks up the installer-image from here, straight using machineConfig with go-modules straight from SideroLabs and then it finds the imageTag and schematicID in the url.

It reads the machine config for the target node using the Talos API and extracts the installer container image reference from it. It then parses that reference for the tag and schematic ID. These are considered the desired tag (i.e. version) and schematic ID.

It also reads the schematic ID node annotation (getSchematicAnnotation) and the node's Talos version using the Talos API (https://github.com/jfroy/tnu/blob/main/tnu.go#L113). Those are considered the current or actual version and schematic ID.

> Meaning if we did something weird with the url, it wouldn't matter as long as the schematicID is present, and the tag can be enumerated - so oci.example.com/packages/custom-talos/installer/<SCHEMATIC-ID>/installer:1.94 would also work great - while perhaps our current method with [...]/installer:1.94-H100 would not work?

The parsing logic will work, indeed, but the resulting "schematic ID" will not match the node's schematic ID annotation and tnu will endlessly issue node updates. It will effectively DDOS your cluster.

> I'm a little unsure about the meaning behind this block, but perhaps it will be better for me to just try it and see how it works.

That logic checks if the desired tag (i.e. version) and schematic ID match the actual or current version and schematic ID. If both match, the node is considered up to date and tnu exits cleanly. Otherwise, the node requires an update and tnu will issue the update Talos API request.
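
As a rough sketch of that check in Go (the `nodeState` type, the `needsUpdate` function, and the placeholder values are hypothetical and only illustrate the comparison; the real code in tnu.go may be structured differently):

```go
package main

import "fmt"

// nodeState is a hypothetical pair of values for this sketch: a Talos version
// and a schematic ID, used for both the desired and the actual state.
type nodeState struct {
	Version   string
	Schematic string
}

// needsUpdate reports whether the desired state differs from the actual state
// in either the version or the schematic ID.
func needsUpdate(desired, actual nodeState) bool {
	return desired.Version != actual.Version || desired.Schematic != actual.Schematic
}

func main() {
	const schematic = "d23b1d9b4724da9b63e7c06c4f28e560c26c8161b131b058fd7dae0f46f02211"
	actual := nodeState{Version: "v1.9.4", Schematic: schematic}

	// Same version and schematic: the node is up to date.
	fmt.Println(needsUpdate(nodeState{Version: "v1.9.4", Schematic: schematic}, actual))

	// Newer desired version (v1.9.5 is an assumed example): an update is required.
	fmt.Println(needsUpdate(nodeState{Version: "v1.9.5", Schematic: schematic}, actual))

	// The desired "schematic ID" was parsed from a reference that does not carry a
	// real one (placeholder value). The node keeps reporting its real schematic ID,
	// so every run decides that an update is required.
	desired := nodeState{Version: "v1.9.4", Schematic: "not-a-schematic-id"}
	for run := 1; run <= 3; run++ {
		fmt.Printf("run %d: needs update = %v\n", run, needsUpdate(desired, actual))
	}
}
```

The last case is the endless-update scenario: the desired and actual schematic IDs can never converge, so each run issues another update request.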

jfroy, Mar 12 '25 19:03