terraform-provider-talos
Improve (documentation?) of `talos_machine_configuration_apply`
Say you have unconfigured Talos nodes and want to configure them using talos_machine_configuration_apply.
You read the Talos documentation and understand that endpoint is the "entrypoint" (probably your control plane) and node is the machine you want to perform the operation on.
You might try something like this. Note that talos_machine_configuration_apply.worker.endpoint is set to the public IP of the control plane.
resource "talos_machine_secrets" "this" {}

data "talos_machine_configuration" "controlplane" {
  cluster_name     = "example-cluster"
  machine_type     = "controlplane"
  cluster_endpoint = "https://${hcloud_server.controlplane.ipv4_address}:6443"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
}

data "talos_machine_configuration" "worker" {
  cluster_name     = "example-cluster"
  machine_type     = "worker"
  cluster_endpoint = "https://${hcloud_server.controlplane.ipv4_address}:6443"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
}

data "talos_client_configuration" "this" {
  cluster_name         = "example-cluster"
  client_configuration = talos_machine_secrets.this.client_configuration
  nodes                = ["10.0.0.101", "10.0.0.102"]
  endpoints            = [hcloud_server.controlplane.ipv4_address]
}

resource "talos_machine_configuration_apply" "controlplane" {
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.controlplane.machine_configuration
  node                        = "10.0.0.101"
  endpoint                    = hcloud_server.controlplane.ipv4_address
}

resource "talos_machine_configuration_apply" "worker" {
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.worker.machine_configuration
  node                        = "10.0.0.102"
  # This is the mistake, because it's the public IP of the control plane, not the node
  endpoint = hcloud_server.controlplane.ipv4_address
}

resource "talos_machine_bootstrap" "this" {
  endpoint             = hcloud_server.controlplane.ipv4_address
  node                 = "10.0.0.101"
  client_configuration = talos_machine_secrets.this.client_configuration
}
This will successfully configure the control plane, but not the worker. When you think about it, it makes sense (we'll get to this). But before you realize this, you read the documentation, which says:
node (String) The name of the node to bootstrap
You might ask: "what do they mean by 'name'?" You also read:
endpoint (String) The endpoint of the machine to bootstrap
You might ask: "Why 'bootstrap'? Sure, I want to bootstrap, but isn't this the 'apply' resource that can also be used after bootstrapping? Was this copy-pasted?"
You might even have the situation (depending on the execution order of your talos_machine_configuration_apply) that your control plane is initialized as type "worker" and you end up reading #23. Now you don't know whether to doubt yourself or this provider 🙂
Finally, you realize that the target (worker) is not yet part of the cluster, so the control plane cannot communicate with it. Therefore, when using talos_machine_configuration_apply on an uninitialized node, endpoint needs to be the public IP of your target.
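For illustration, here is roughly what the worker resource from the example above looks like with that change applied. This is only a sketch: `hcloud_server.worker` is a hypothetical resource name (the example above only references `hcloud_server.controlplane`), and it assumes the worker's public IP is reachable from wherever Terraform runs.

```hcl
# Sketch only: hcloud_server.worker is an assumed resource name that does not
# appear in the original example.
resource "talos_machine_configuration_apply" "worker" {
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.worker.machine_configuration
  node                        = "10.0.0.102"

  # Talk to the worker directly. It has not joined the cluster yet, so the
  # control plane cannot forward the request to it.
  endpoint = hcloud_server.worker.ipv4_address
}
```

The only difference from the failing version is the value of endpoint; everything else is unchanged.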
It would be great if the documentation could be clearer about this (and maybe about other pitfalls, like how it behaves with endpoint and node of talos_client_configuration). Or maybe something can be changed in the way talos_machine_configuration_apply works.
Also, to me, it's confusing that talos_machine_bootstrap also has endpoint and node. In which scenario is node not the same as endpoint?
Wow, so that's why my changes were not being picked up by the workers. At this point it is disappointing to see how little thought was put into this provider and how barebones it feels.
Yeah, the docs for v0.6 need some more details. Other examples:
machine_configuration_apply:
- apply_mode - string: What are the options here and what is the default?
- timeouts - struct: Most of the fields in that struct just say "A string that can be parsed as a duration" instead of what the timeout is for and what the default is
cluster_kubeconfig
- timeouts - struct: Same as above
* apply_mode - string: What are the options here and what is the default?
I haven't had a use case for the apply_mode attribute yet, but I would assume it's the same as the --mode flag in talosctl apply-config:
https://www.talos.dev/v1.8/reference/cli/#options
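If that assumption holds, setting it would look something like the sketch below. The value `"auto"` is taken from the `--mode` options of talosctl; whether the provider accepts exactly the same values, with the same spelling and the same default, is an assumption here and should be checked against the provider schema. `hcloud_server.worker` is likewise a hypothetical resource name.

```hcl
# Sketch only: the apply_mode value is assumed to mirror `talosctl apply-config --mode`,
# and hcloud_server.worker is a hypothetical resource name.
resource "talos_machine_configuration_apply" "worker" {
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.worker.machine_configuration
  node                        = hcloud_server.worker.ipv4_address
  endpoint                    = hcloud_server.worker.ipv4_address

  # Assumed value; other modes may be spelled differently in the provider than on the CLI.
  apply_mode = "auto"
}
```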
Reading through this, I am still a little confused. I am using the external address for both. Still, it's the apply on the worker that does not complete for me.
talos_machine_configuration_apply.worker[0]: Still creating...
What am I missing here?
resource "talos_machine_configuration_apply" "controlplane" {
  count                       = length(hcloud_server.control_plane)
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.controlplane.machine_configuration
  node                        = hcloud_server.control_plane[count.index].ipv4_address
  endpoint                    = hcloud_server.control_plane[0].ipv4_address
}

resource "talos_machine_configuration_apply" "worker" {
  count                       = length(hcloud_server.workers)
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.worker.machine_configuration
  node                        = hcloud_server.workers[count.index].ipv4_address
  endpoint                    = hcloud_server.control_plane[0].ipv4_address
}

resource "talos_machine_bootstrap" "this" {
  depends_on = [
    hcloud_server.control_plane
  ]
  client_configuration = talos_machine_secrets.this.client_configuration
  endpoint             = hcloud_server.control_plane[0].ipv4_address
  node                 = hcloud_server.control_plane[0].ipv4_address
}

resource "talos_cluster_kubeconfig" "this" {
  depends_on = [
    talos_machine_bootstrap.this
  ]
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = hcloud_server.control_plane[0].ipv4_address
}

resource "local_file" "talosconfig" {
  depends_on = [
    talos_machine_bootstrap.this
  ]
  content  = data.talos_client_configuration.this.talos_config
  filename = "${path.module}/../../.configs/${var.cluster_name}/talosconfig"
}

resource "local_file" "kubeconfig" {
  depends_on = [
    talos_cluster_kubeconfig.this
  ]
  content  = talos_cluster_kubeconfig.this.kubeconfig_raw
  filename = "${path.module}/../../.configs/${var.cluster_name}/kubeconfig"
}
@tcurdt It looks like you're making the exact same mistake I did. I updated my example to make it a bit clearer.
resource "talos_machine_configuration_apply" "worker" {
  ...
  endpoint = hcloud_server.control_plane[0].ipv4_address
}

should be

resource "talos_machine_configuration_apply" "worker" {
  ...
  endpoint = hcloud_server.workers[count.index].ipv4_address
}
Because (the way I understand it): if endpoint is the IP of the control plane, you're sending the command to the control plane. But the control plane can't forward it to your worker, because the worker has not yet joined the cluster, so they can't communicate.
Therefore, you need to command your worker directly, which is what you achieve by setting endpoint to the worker's public IP.
I hope this helps.
@micheljung indeed it did ... thanks for pointing me at it
this is really a little weird