Avoid using --insecure during bootstrap

Open shellwhale opened this issue 1 year ago • 8 comments

Hello.

From my understanding, the initial configuration (the bootstrap phase) is not authenticated. This makes it a Trust On First Use (TOFU) authentication scheme, which is vulnerable to man-in-the-middle attacks.

Is there a way to embed a certificate inside of the image, before the initial configuration that happens over the network? That way we could get rid of the --insecure flag.

shellwhale avatar Oct 18 '24 07:10 shellwhale

bootstrap API requires proper certificate, initial apply-config can only use --insecure

frezbo avatar Oct 18 '24 07:10 frezbo

> bootstrap API requires proper certificate, initial apply-config can only use --insecure

Yes, that's why I'm asking: it does not feel appropriate to use an insecure authentication scheme right from the start.

Again, can't I generate a certificate myself and embed it in the image somehow, maybe as a step in factory.talos.dev or using the imager?

shellwhale avatar Oct 18 '24 08:10 shellwhale

Anyone with network access can configure the machine or impersonate it. Isn't that an issue?

shellwhale avatar Oct 18 '24 08:10 shellwhale

Here:

https://www.talos.dev/v1.8/learn-more/image-factory/#schematics
https://www.talos.dev/v1.8/talos-guides/install/boot-assets/#image-factory

I feel this is where there should be an explanation of how to set up a custom certificate at the image level, not at the network level, but I can't find one.

shellwhale avatar Oct 18 '24 08:10 shellwhale

> Is there a way to embed a certificate inside of the image, before the initial configuration that happens over the network? That way we could get rid of the --insecure flag.

If you follow console output of Talos, you would notice that it prints the fingerprint of its own certificate which you can use with --insecure apply config. This is already implemented.

You can submit machine configuration to Talos via many other methods as well, which have their own pros and cons, but if you worry about man-in-the-middle specifically, use the fingerprint shown in the console.

smira avatar Oct 18 '24 11:10 smira

@smira but isn't the CA the same for everyone?

shellwhale avatar Oct 19 '24 12:10 shellwhale

> > Is there a way to embed a certificate inside of the image, before the initial configuration that happens over the network? That way we could get rid of the --insecure flag.

> If you follow console output of Talos, you would notice that it prints the fingerprint of its own certificate which you can use with --insecure apply config. This is already implemented.

So this means the CLI client trusts server certificates signed by some certificate authority. If the signing happens at server boot time, the private key for that CA must be stored in the Talos image itself.

Is that what happens? @smira

shellwhale avatar Oct 21 '24 09:10 shellwhale

> @smira but isn't the CA the same for everyone?

There is no CA. Without a machine configuration, Talos generates a fresh self-signed certificate to run what we call the "maintenance service API". The fingerprint is printed to the console, so you can use it on the client side (talosctl) to ensure you're talking to the machine you intend to send the machine configuration to.
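
To make the fingerprint check concrete, here is a minimal Python sketch of certificate pinning: the client fetches whatever certificate the server presents, hashes it, and compares the result with the value read from the console. The helper names, the choice of a base64-encoded SHA-256 over the whole DER certificate, and port 50000 are assumptions for illustration only; the exact fingerprint format Talos prints may differ. talosctl is expected to do the equivalent check itself when given the fingerprint (to the best of my knowledge via the `--cert-fingerprint` flag of `apply-config`; confirm with `talosctl apply-config --help`).

```python
import base64
import hashlib
import hmac
import ssl


def fingerprint(cert_der: bytes) -> str:
    """Base64-encoded SHA-256 digest of a DER-encoded certificate (assumed format)."""
    return base64.b64encode(hashlib.sha256(cert_der).digest()).decode("ascii")


def matches_pinned(cert_der: bytes, pinned: str) -> bool:
    """True if the presented certificate hashes to the pinned fingerprint."""
    # compare_digest avoids leaking the position of a mismatch through timing.
    return hmac.compare_digest(fingerprint(cert_der).encode(), pinned.encode())


def fetch_cert_der(host: str, port: int = 50000) -> bytes:
    """Fetch the server's certificate without validating it against any CA."""
    # Hypothetical helper: port 50000 is assumed to be the node's API port.
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

Usage would look like `matches_pinned(fetch_cert_der("<node-ip>"), "<fingerprint from console>")`, sending the machine configuration only if it returns `True`.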

So what we're talking about here is delivering a machine configuration to a Talos node. There are several options, and they are not all equivalent, but here is the list:

  • talosctl apply-config --insecure (machine boots with no configuration, and enters maintenance mode)
  • user-data (clouds), talos.config= (metal); talos.config= can be further protected with OIDC flow
  • special filesystem with machine config (metal)
  • SideroLink, which Talos offers as a reverse tunnel with an additional layer of protection on top: in that case the maintenance service runs over the SideroLink tunnel and is never exposed over the Internet (that's how Omni works)

So there are many options, each with its own set of pros and cons, better UX or better security. There's a question of running untrusted workloads and whether workloads can access the same config source (e.g. for cloud user-data).

So there is no answer that fits all cases. It's better to ask a specific question: how do I deliver the machine configuration to the node, given the requirements that I have?

smira avatar Oct 21 '24 11:10 smira

The fingerprint on the console solves the MITM problem. But as far as I can tell, while Talos is in maintenance mode, anyone can tell it to install anything, including malware, without authentication, so MITM isn't even necessary :)

TomyLobo avatar Dec 10 '24 14:12 TomyLobo

If you are worried about someone on your network sending a malicious machine config to a node in maintenance mode, then you should use one of the alternative methods for configuring the machine, or use Omni with SideroLink, which creates a secure WireGuard connection and doesn't accept insecure connections.

rothgar avatar Apr 22 '25 04:04 rothgar